Commit Graph

2706 Commits

Author SHA1 Message Date
Sepherosa Ziehau
6faefea03b linuxkpi: Fix PCI BAR lazy allocation support.
FreeBSD supports lazy allocation of PCI BAR, that is, when a device
driver's attach method is invoked, even if the device's PCI BAR
address wasn't initialized, the invocation of bus_alloc_resource_any()
(the call chain: pci_alloc_resource() -> pci_alloc_multi_resource() ->
pci_reserve_map() -> pci_write_bar()) would allocate a proper address
for the PCI BAR and write this 'lazy allocated' address into the PCI
BAR.

This model works fine for native FreeBSD device drivers, but _not_ for
device drivers shared with Linux (e.g. dev/mlx5/mlx5_core/mlx5_main.c
and ofed/drivers/net/mlx4/main.c.  Both of them use
pci_request_regions(), which doesn't work properly with the PCI BAR
lazy allocation, because pci_resource_type() -> _pci_get_rle() always
returns NULL, so pci_request_regions() doesn't have the opportunity to
invoke bus_alloc_resource_any().  We now use pci_find_bar() in
pci_resource_type(), which is able to locate all available PCI BARs
even if some of them will be lazy allocated.

Submitted by:	Dexuan Cui <decui microsoft com>
Reviewed by:	hps
MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D8071
2016-09-30 05:51:11 +00:00
Hans Petter Selasky
bdff61f849 The IORESOURCE_XXX defines should resemble a bitmask while SYS_RES_XXX
are not bitmasks. Fix return value of pci_resource_flags() to reflect
this change.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-09-29 14:35:32 +00:00
Mateusz Guzik
02274c93c3 cloudabi: use fget_cap instead of hand-rolling capability read
This has a side effect of unbreaking the build after r306272.

Discussed with:		ed
2016-09-23 23:08:23 +00:00
Mariusz Zaborski
85b0f9de11 capsicum: propagate rights on accept(2)
Descriptor returned by accept(2) should inherits capabilities rights from
the listening socket.

PR:		201052
Reviewed by:	emaste, jonathan
Discussed with:	many
Differential Revision:	https://reviews.freebsd.org/D7724
2016-09-22 09:58:46 +00:00
Mark Johnston
bdaf6d6913 Regenerate syscall provider argument strings. 2016-09-22 04:50:03 +00:00
Konstantin Belousov
643f6f47fd Add PROC_TRAPCAP procctl(2) controls and global sysctl kern.trap_enocap.
Both can be used to cause processes in capability mode to receive
SIGTRAP when ENOTCAPABLE or ECAPMODE errors are returned from
syscalls.

Idea by:	emaste
Reviewed by:	oshogbo (previous version), emaste
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D7965
2016-09-21 08:23:33 +00:00
Ed Maste
df4336ddfa Catch up to sys/capability.h rename to sys/capsicum.h in r263232
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-09-19 18:44:43 +00:00
Konstantin Belousov
94ffabd5ce Regen. 2016-09-18 22:03:26 +00:00
Konstantin Belousov
84a30d33a3 Add compat32 support for capsicum.
Reviewed by:	bapt, emaste
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D7942
2016-09-18 22:03:07 +00:00
Dmitry Chagin
ede2869c4c Implement BLKSSZGET ioctl for the Linuxulator.
PR:		212700
Submitted by:	Erik Cederstrand
Reported by:	Erik Cederstrand
MFC after:	1 week
2016-09-17 08:10:01 +00:00
Mateusz Guzik
dbfe4b2673 linprocfs: garbage collect meminfo fields not present in linux
In particular memshared not only does not exist in linux, it was
extremely expensive to calculate.
2016-09-16 03:36:43 +00:00
John Baldwin
38605d7312 Remove 'cpu' and 'cpu_class' on amd64.
The 'cpu' and 'cpu_class' variables were always set to the same value
on amd64 and are legacy holdovers from i386.  Remove them entirely on
amd64.

Reviewed by:	imp, kib (older version)
Differential Revision:	https://reviews.freebsd.org/D7888
2016-09-15 17:05:54 +00:00
Brooks Davis
ef13681631 Remove a pointless translation of struct ioc_toc_header.
struct ioc_toc_header will be the same size (and thus IOREADTOCHEADER
will have the same value on all supported platforms).

Sponsored by:	DARPA, AFRL
2016-09-08 00:38:50 +00:00
Ed Schouten
102754b3cf Add missing header dependency.
This header depends on sigaltstack32 being declared.
2016-08-24 09:57:19 +00:00
Ed Schouten
79ad79d6a4 Add source files generated from the 32-bit system call table. 2016-08-21 16:02:25 +00:00
Ed Schouten
240f8c2d51 Add CPU independent code for running 32-bits CloudABI executables.
Essentially, this is a literal copy of the code in sys/compat/cloudabi64,
except that it now makes use of 32-bits datatypes and limits. In
sys/conf/files, we now need to take care to build the code in
sys/compat/cloudabi if either COMPAT_CLOUDABI32 or COMPAT_CLOUDABI64 is
turned on.

This change does not yet include any of the CPU dependent bits. Right
now I have implementations for running i386 binaries both on i386 and
x86-64, which I will send out for review separately.
2016-08-21 16:01:30 +00:00
Ed Schouten
90f4145f82 Don't forget to define __ELF_WORD_SIZE.
Without it, we only obtain the ELF types native to the system. In this
we explicitly want the 64-bit versions.
2016-08-21 15:37:49 +00:00
Ed Schouten
47cb4d7bd0 Add a utility macro for converting 64-bit pointers to native pointers.
Right now we're casting uint64_t's to native pointers. This isn't
causing any problems right now, but if we want to provide a 32-bit
compatibility layer that works on 64-bit systems as well, this will
cause problems. Casting a uint32_t to a 64-bit pointer throws a compiler
error.

Introduce a TO_PTR() macro that casts the value to uintptr_t before
casting it to a pointer.
2016-08-21 15:36:18 +00:00
Ed Schouten
4fbc90654c Move the linker script from cloudabi64/ to cloudabi/.
It turns out that it works perfectly fine for generating 32-bits vDSOs
as well. While there, get rid of the extraneous .s file extension.
2016-08-21 15:14:06 +00:00
Ed Schouten
a953f555e1 Use the right _MAX constant.
Though uio_resid is of type ssize_t, we need to take into account that
this source file contains an implementation specific to a certain
userspace pointer size. If this file provided 32-bit implementations,
this should have used INT32_MAX, even when running a 64-bit kernel.

This change has no effect, but is simply in preparation for adding
support for running 32-bit CloudABI executables.
2016-08-21 09:32:20 +00:00
Ed Schouten
384ef4841a Use memcpy() to copy 64-bit timestamps into the syscall return values.
On 32-bit platforms, our 64-bit timestamps need to be split up across
two registers. A simple assignment to td_retval[0] will cause the top 32
bits to get lost. By using memcpy(), we will automatically either use 1
or 2 registers depending on the size of register_t.
2016-08-21 07:41:11 +00:00
Ed Schouten
7a3558bed5 Regenerate system call table after r304483. 2016-08-19 17:54:06 +00:00
Ed Schouten
a5c3c5b14f Import the new automatically generated system call table for CloudABI.
Now that we've switched over to using the vDSO on CloudABI, it becomes a
lot easier for us to phase out old features. System call numbering is no
longer something that's part of the ABI. It's fully based on names. As
long as the numbering used by the kernel and the vDSO is consistent
(which it always is), it's all right.

Let's put this to the test by removing a system call (thread_tcb_set())
that's already unused for quite some time now, but was only left intact
to serve as a placeholder. Sync in the new system call table that uses
alphabetic sorting of system calls.

Obtained from:	https://github.com/NuxiNL/cloudabi
2016-08-19 17:49:35 +00:00
George V. Neville-Neil
3e7e23332f Remove the obsolete and unused openbsd_poll system call. (Phase 2)
Reported by:	brooks
Reviewed by:	brooks, jhb
Differential Revision:	https://reviews.freebsd.org/D7548
2016-08-18 10:54:39 +00:00
George V. Neville-Neil
5cba398b0c Remove unusedd and obsolete openbsd_poll system call. (Phase 1)
Reported by:	brooks
Reviewed by:	brooks,jhb
Differential Revision:	https://reviews.freebsd.org/D7548
2016-08-18 10:50:40 +00:00
Ed Schouten
93d9ebd82e Eliminate use of sys_fsync() and sys_fdatasync().
Make the kern_fsync() function public, so that it can be used by other
parts of the kernel. Fix up existing consumers to make use of it.

Requested by:	kib
2016-08-15 20:11:52 +00:00
Ed Schouten
2256eed3eb Let CloudABI use fdatasync() as well.
Now that FreeBSD supports fdatasync() natively, we can tidy up
CloudABI's equivalent system call to use that instead.
2016-08-15 19:42:21 +00:00
Konstantin Belousov
1d2537a26a Regen after r304176, fdatasync(2) addition. 2016-08-15 19:15:46 +00:00
Konstantin Belousov
295af703a0 Add an implementation of fdatasync(2).
The syscall is a trivial wrapper around new VOP_FDATASYNC(), sharing
code with fsync(2).  For all filesystems, this commit provides the
implementation which delegates the work of VOP_FDATASYNC() to
VOP_FSYNC().  This is functionally correct but not efficient.

This is not yet POSIX-compliant implementation, because it does not
ensure that queued AIO requests are completed before returning.

Reviewed by:	mckusick
Discussed with:	avg (ZFS), jhb (AIO part)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D7471
2016-08-15 19:08:51 +00:00
Ed Schouten
13b4b4df98 Provide the CloudABI vDSO to its executables.
CloudABI executables already provide support for passing in vDSOs. This
functionality is used by the emulator for OS X to inject system call
handlers. On FreeBSD, we could use it to optimize calls to
gettimeofday(), etc.

Though I don't have any plans to optimize any system calls right now,
let's go ahead and already pass in a vDSO. This will allow us to
simplify the executables, as the traditional "syscall" shims can be
removed entirely. It also means that we gain more flexibility with
regards to adding and removing system calls.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D7438
2016-08-10 21:02:41 +00:00
Bryan Drewery
c1fa440409 Regenerate after r303755.
MFC after:	3 days
X-MFC-With:	r303755
Sponsored by:	EMC / Isilon Storage Division
2016-08-04 19:15:51 +00:00
Ed Schouten
e938ebbc0c Regenerate system call tables for r303699 and r303700. 2016-08-03 06:36:45 +00:00
Ed Schouten
a813fdc6c3 mprotect(): Change prototype to comply to POSIX.
Our mprotect() function seems to take a "const void *" address to the
pages whose permissions need to be adjusted. POSIX uses "void *". Simply
stick to the POSIX one to prevent us from writing unportable code.

PR:		211423 (exp-run)
Tested by:	antoine@ (Thanks!)
2016-08-03 06:33:04 +00:00
Brooks Davis
40018b91dd Don't create pointless backups of generated files in "make sysent".
Any sensible workflow will include a revision control system from which
to restore the old files if required.  In normal usage, developers just
have to clean up the mess.

Reviewed by:	jhb
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D7353
2016-07-28 21:29:04 +00:00
Konstantin Belousov
584b675ed6 Hide the boottime and bootimebin globals, provide the getboottime(9)
and getboottimebin(9) KPI. Change consumers of boottime to use the
KPI.  The variables were renamed to avoid shadowing issues with local
variables of the same name.

Issue is that boottime* should be adjusted from tc_windup(), which
requires them to be members of the timehands structure.  As a
preparation, this commit only introduces the interface.

Some uses of boottime were found doubtful, e.g. NLM uses boottime to
identify the system boot instance.  Arguably the identity should not
change on the leap second adjustment, but the commit is about the
timekeeping code and the consumers were kept bug-to-bug compatible.

Tested by:	pho (as part of the bigger patch)
Reviewed by:	jhb (same)
Discussed with:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
X-Differential revision:	https://reviews.freebsd.org/D7302
2016-07-27 11:08:59 +00:00
Ed Schouten
f63cd251b2 Add shmatt_t.
It looks like our "struct shmid_ds::shm_nattch" deviates from the
standard in the sense that it is a signed integer, whereas POSIX
requires that it is unsigned, having a special type shmatt_t.

Patch up our native and 32-bit copies to use a new shmatt_t that is an
unsigned integer. As it's unsigned, we can relax the comparisons that
are performed on it. Leave the Linux, iBCS2, etc. copies of the
structure alone.

Reviewed by:	ngie
Differential Revision:	https://reviews.freebsd.org/D6655
2016-07-26 17:23:49 +00:00
Gleb Smirnoff
32e0ade6c4 Partially revert r257696/r257713, which have an issue with writing to user
controlled address. Restore the old code that emulated OSIOCGIFCONF in if.c.

Noticed by:	C Turt
2016-07-24 10:10:09 +00:00
Dmitry Chagin
97d06da692 Fix a copy/paste bug introduced during X86_64 Linuxulator work.
FreeBSD support NX bit on X86_64 processors out of the box, for i386 emulation
use READ_IMPLIES_EXEC flag, introduced in r302515.

While here move common part of mmap() and mprotect() code to the files in compat/linux
to reduce code dupcliation between Linuxulator's.

Reported by:    Johannes Jost Meixner, Shawn Webb

MFC after:	1 week
XMFC with:	r302515, r302516
2016-07-10 08:22:04 +00:00
Dmitry Chagin
23e8912c60 Implement Linux personality() system call mainly due to READ_IMPLIES_EXEC flag.
In Linux if this flag is set, PROT_READ implies PROT_EXEC for mmap().
Linux/i386 set this flag automatically if the binary requires executable stack.

READ_IMPLIES_EXEC flag will be used in the next Linux mmap() commit.
2016-07-10 08:15:50 +00:00
Dmitry Chagin
3a49978f45 Fix a bug introduced in r283433.
[1] Remove unneeded sockaddr conversion before kern_recvit() call as the from
argument is used to record result (the source address of the received message) only.

[2] In Linux the type of msg_namelen member of struct msghdr is signed but native
msg_namelen has a unsigned type (socklen_t). So use the proper storage to fetch fromlen
from userspace and than check the user supplied value and return EINVAL if it is less
than 0 as a Linux do.

Reported by:	Thomas Mueller <tmueller at sysgo dot com> [1]
Reviewed by:	kib@
Approved by:	re (gjb, kib)
MFC after:	3 days
2016-06-26 16:59:59 +00:00
Brooks Davis
722bc9c00b Regen post r302096 and implement svr4_pipe().
Approved by:	re (implict, fixing build)
Sponsored by:	DARPA, AFRL
2016-06-23 00:30:09 +00:00
Brooks Davis
2b4a6471f4 Declare a svr4 version of pipe() now that sys_pipe() is no more.
Approved by:	re (implicit, fixing build)
Sponsored by:	DARPA, AFRL
2016-06-23 00:29:03 +00:00
Brooks Davis
a72c64b0b6 Generate syscall tables and update pipe() implementation after r302094.
Mark the pipe() system call as COMPAT10.

As of r302092 libc uses pipe2() with a zero flags value instead of pipe().

Approved by:	re (gjb)
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D6816
2016-06-22 21:18:19 +00:00
Brooks Davis
e16e64098c Mark the pipe() system call as COMPAT10.
As of r302092 libc uses pipe2() with a zero flags value instead of pipe().

Commit with regenerated files and implementation to follow.

Approved by:	re (gjb)
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D6816
2016-06-22 21:15:59 +00:00
Konstantin Belousov
5c2cf81845 Update comments for the MD functions managing contexts for new
threads, to make it less confusing and using modern kernel terms.

Rename the functions to reflect current use of the functions, instead
of the historic KSE conventions:
  cpu_set_fork_handler -> cpu_fork_kthread_handler (for kthreads)
  cpu_set_upcall -> cpu_copy_thread (for forks)
  cpu_set_upcall_kse -> cpu_set_upcall (for new threads creation)

Reviewed by:	jhb (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Approved by:	re (hrs)
Differential revision:	https://reviews.freebsd.org/D6731
2016-06-16 12:05:44 +00:00
Mark Johnston
8de3effe5e Add a missing error check for a malloc() call in idr_get().
Submitted by:	Matt Joras <mjoras@isilon.com>
Approved by:	re (gjb)
MFC after:	1 week
Sponsored by:	EMC / Isilon Storage Division
2016-06-14 03:57:00 +00:00
Konstantin Belousov
eb00e5b783 swap_dev_info() does not require Giant, so Giant locking around
the loop in linprocfs_doswaps() is useless.
List of the registered filesystems is protected by vfsconf_sx,
not by the Giant.  Adjust linprocfs_dofilesystems() correspondingly.

Approved by:	re (delphij), des (linprocfs maintainer)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-06-12 11:13:38 +00:00
Hans Petter Selasky
d8571d3ea3 Fallback to arc4rand() in the LinuxKPI when read_random() returns
zero. This can happen for virtual machines.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-06-07 13:10:13 +00:00
Gleb Smirnoff
34e05ebe72 Fix kernel stack disclosures in the Linux and 4.3BSD compat layers.
Submitted by:	CTurt
Security:	SA-16:20
Security:	SA-16:21
2016-05-31 16:56:30 +00:00
Hans Petter Selasky
8eeb3e1773 The SCHEDULER_STOPPED() macro already contains a predict false statement.
Remove superfluous unlikely() wrapper.

Suggested by:	glebius
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-27 07:33:49 +00:00
Hans Petter Selasky
2bb46d5516 Define ATOMIC_LONG_INIT() in the LinuxKPI.
Obtained from:	kmacy @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-26 10:03:22 +00:00
Hans Petter Selasky
06ca64ecf3 Add support for runtime modifiable module parameters in the LinuxKPI.
Linux module parameters have a permissions value. If any write bits
are set we are allowed to modify the module parameter runtime. Reflect
this when creating the static SYSCTL nodes.

Sponsored by:	Mellanox Technologies
MFC after:	1 week
2016-05-26 09:04:14 +00:00
Hans Petter Selasky
707324edee Add more module parameter macros to the LinuxKPI.
Obtained from:	kmacy @
Sponsored by:	Mellanox Technologies
2016-05-26 08:47:06 +00:00
Hans Petter Selasky
0bb3dd300b Add support for boolean module parameters in the LinuxKPI.
Requested by:	kmacy @
Sponsored by:	Mellanox Technologies
2016-05-26 08:44:11 +00:00
Hans Petter Selasky
1d9b99e5e3 Implement Linux module parameters as read-only tunable SYSCTLs.
Bool module parameters are no longer supported, because there is no
equivalent in FreeBSD.

There are two macros available which control the behaviour of the
LinuxKPI module parameters:

- LINUXKPI_PARAM_PARENT allows the consumer to set the SYSCTL parent
where the modules parameters will be created.

- LINUXKPI_PARAM_PREFIX defines a parameter name prefix, which is
  added to all created module parameters.

Sponsored by:	Mellanox Technologies
MFC after:	1 week
2016-05-25 12:12:14 +00:00
Hans Petter Selasky
8571421886 Add checks for SCHEDULER_STOPPED() so that code using the LinuxKPI can
run after a panic(). This for example allows a LinuxKPI based graphics
stack to receive prints during a panic.

Obtained from:	kmacy @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-25 09:04:06 +00:00
Kevin Lo
8636496407 Add __iowrite32_copy() to the Linux kernel compatibility layer.
Reviewed by:	hselasky
2016-05-24 09:23:04 +00:00
Hans Petter Selasky
9183a497e7 Use the DROP_GIANT() and PICKUP_GIANT() macros instead of making
assumptions about how the Giant mutex is locked.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-24 07:52:53 +00:00
Hans Petter Selasky
3ce1263063 Set "current" for all PCI enumeration callbacks.
Obtained from:	kmacy @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-24 07:46:20 +00:00
Hans Petter Selasky
5a6748b2cf Use make_dev_s() instead of make_dev() to avoid race setting
"si_drv1". Convert panic() into regular error while at it.

Suggested by:	jhb @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-24 07:06:04 +00:00
Dmitry Chagin
ab610366b5 Don't leak fp in case where fo_ioctl() returns an error.
Reported by:	C Turt <ecturt@gmail.com>
MFC after:	1 week
2016-05-24 05:29:41 +00:00
Hans Petter Selasky
44701cf732 Implement "atomic_long_add_unless()" in the LinuxKPI and fix the
implementation of "atomic_long_inc_not_zero()".

Found by:	ngie @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-23 16:19:51 +00:00
Hans Petter Selasky
8b68f2509f A missing definition needed by ktime_to_ms().
Obtained from:	kmacy @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-23 13:19:20 +00:00
Hans Petter Selasky
425da8eb61 Fix some data types and add "inline" keyword for __reg_op() function.
Obtained from:	kmacy @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-23 13:18:15 +00:00
Hans Petter Selasky
83cfd83419 Implement ror32() in the LinuxKPI.
Obtained from:	kmacy @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-23 12:53:17 +00:00
Hans Petter Selasky
fb2faed84e Define more copy to/from userspace functions in the LinuxKPI.
Obtained from:	kmacy @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-23 12:52:22 +00:00
Hans Petter Selasky
aef2a67b83 Add more printf() related functions to the LinuxKPI.
Obtained from:	kmacy @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-23 12:35:07 +00:00
Hans Petter Selasky
3092e0c54c Set an invalid IRQ number when no PCI IRQ is available in the LinuxKPI.
Obtained from:	kmacy @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-23 12:13:16 +00:00
Hans Petter Selasky
7dfa8b2c4e Add more ktime related functions to the LinuxKPI.
Obtained from:	kmacy @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-23 12:10:28 +00:00
Hans Petter Selasky
08a5e6ec7f Implement "kref_put_mutex()" for the LinuxKPI.
Obtained from:	kmacy @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-23 12:06:34 +00:00
Hans Petter Selasky
aad02fb444 Add more list_xxx() functions to the LinuxKPI.
Obtained from:	kmacy @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-23 12:03:40 +00:00
Hans Petter Selasky
0f8f7f554b Make header file standalone by including definitions for needed
linux_wait_xxx() functions.

Obtained from:	kmacy @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-23 11:57:23 +00:00
Hans Petter Selasky
94a201be43 Implement "_outb()" to the LinuxKPI for i386 and amd64 only.
Obtained from:	kmacy @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-23 11:53:00 +00:00
Hans Petter Selasky
ed5f781270 Add support for "cdev_add_ext()" to the LinuxKPI.
Obtained from:	kmacy @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-23 11:50:05 +00:00
Hans Petter Selasky
299f29203a Add more GFP related defines to the LinuxKPI.
Obtained from:	kmacy @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-23 11:47:54 +00:00
Hans Petter Selasky
4a0f827906 Add support for atomic_long_inc_not_zero() to the LinuxKPI.
Obtained from:	kmacy @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-23 11:44:46 +00:00
Hans Petter Selasky
b5c541821a Add support for atomic_long_inc_not_zero() to the LinuxKPI.
Obtained from:	kmacy @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-23 11:41:35 +00:00
Dmitry Chagin
df964aa45d Convert proto family in both directions. The linux and native values for
local and inet are identical, but for inet6 values differ.

PR:		155040
Reported by:	Simon Walton
MFC after:	2 week
2016-05-22 19:08:29 +00:00
Pedro F. Giffuni
91460148b2 ndis(4): Undo unneeded workarounds in ndis' rand().
- Revert the change for seed(0) in r300384. I misunderstood the standard
and while our random() implementation in libkern may be improved, it
handles the seed(0) case fine.

Pointed out by:	bde, ache
2016-05-22 14:13:20 +00:00
Dmitry Chagin
d56e689e7d Add a missing errno translation for SO_ERROR optname.
PR:		135458
Reported by:	Stefan Schmidt @ stadtbuch.de
MFC after:	1 week
2016-05-22 12:49:08 +00:00
Dmitry Chagin
f8d72f5312 For future use move futex timeout code to the separate function and
switch to the high resolution sbintime_t.

MFC after:	1 week
2016-05-22 12:37:40 +00:00
Dmitry Chagin
a03566dd95 Due to lack the priority propagation feature replace sx by mutex. WIth this
commit NPTL tests are ends in 1 minute faster.

MFC after:	1 week
2016-05-22 12:35:50 +00:00
Dmitry Chagin
ea53658ed7 Add my copyright as I rewrote most of the futex code. Minor style(9) cleanup
while here.

MFC after:	1 week
2016-05-22 12:28:55 +00:00
Dmitry Chagin
56c4f83d2e Minor style(9) cleanup, no functional changes.
MFC after:	1 week
2016-05-22 12:26:03 +00:00
Pedro F. Giffuni
b9c52b50a7 ndis(4): adjustments for our random() specific implementation.
- Revert r300377: The implementation claims to return a value
  within the range. [1]
- Adjust the value for the case of a zero seed, whihc according
  to standards should be equivalent to a seed of value 1.

Pointed out by:	cem
2016-05-22 00:29:25 +00:00
Pedro F. Giffuni
c515200599 ndis(4): Avoid overflow.
This is a long standing problem: our random() function returns an
unsigned integer but the rand provided by ndis(4) returns an int.
Scale it down.

MFC after:	2 weeks
2016-05-21 17:52:44 +00:00
Pedro F. Giffuni
0e6d3c6d16 ndis(4): Better mimic the behavior of rand() on Windows.
In ndis(4) we expose a rand() function that was constantly reseeding
with a time depending function every time it was called. This
essentially broke the reasoning behind seeding, and rendered srand()
a no-op.

Keep it simple, just use random() and srandom() as it's meant to work.
It  would have been tempting to just go for arc4random() but we
want to mimic Microsoft, and we don't need crypto-grade randomness
here.

PR:		209616
MFC after:	2 weeks
2016-05-21 17:38:43 +00:00
Konstantin Belousov
2a339d9e3d Add implementation of robust mutexes, hopefully close enough to the
intention of the POSIX IEEE Std 1003.1TM-2008/Cor 1-2013.

A robust mutex is guaranteed to be cleared by the system upon either
thread or process owner termination while the mutex is held.  The next
mutex locker is then notified about inconsistent mutex state and can
execute (or abandon) corrective actions.

The patch mostly consists of small changes here and there, adding
neccessary checks for the inconsistent and abandoned conditions into
existing paths.  Additionally, the thread exit handler was extended to
iterate over the userspace-maintained list of owned robust mutexes,
unlocking and marking as terminated each of them.

The list of owned robust mutexes cannot be maintained atomically
synchronous with the mutex lock state (it is possible in kernel, but
is too expensive).  Instead, for the duration of lock or unlock
operation, the current mutex is remembered in a special slot that is
also checked by the kernel at thread termination.

Kernel must be aware about the per-thread location of the heads of
robust mutex lists and the current active mutex slot.  When a thread
touches a robust mutex for the first time, a new umtx op syscall is
issued which informs about location of lists heads.

The umtx sleep queues for PP and PI mutexes are split between
non-robust and robust.

Somewhat unrelated changes in the patch:
1. Style.
2. The fix for proper tdfind() call use in umtxq_sleep_pi() for shared
   pi mutexes.
3. Removal of the userspace struct pthread_mutex m_owner field.
4. The sysctl kern.ipc.umtx_vnode_persistent is added, which controls
   the lifetime of the shared mutex associated with a vnode' page.

Reviewed by:	jilles (previous version, supposedly the objection was fixed)
Discussed with:	brooks, Martin Simmons <martin@lispworks.com> (some aspects)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2016-05-17 09:56:22 +00:00
Hans Petter Selasky
0f9f74597d Only lock Giant when needed in the LinuxKPI.
Suggested by:	ngie @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-16 17:41:25 +00:00
Hans Petter Selasky
b334cdea9b Implement more Linux device related functions in the LinuxKPI. While
at it use NULL for some pointer checks.

Bump the FreeBSD version to force recompilation of all kernel modules
due to a structure size change.

Obtained from:	kmacy @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-16 09:56:48 +00:00
Hans Petter Selasky
88fa0d734c Don't dereference parent pointer when it is NULL.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-16 09:25:56 +00:00
Hans Petter Selasky
03219fba43 Properly implement "cpu_has_clflush" macro.
Suggested by:	kib, jhb
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-16 09:16:15 +00:00
Hans Petter Selasky
fdddd267d7 Handle case of class being set, but not parent when calling
device_register() in the LinuxKPI.

Obtained from:	kmacy @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-13 13:01:02 +00:00
Hans Petter Selasky
83d5d45ea2 Add more PAGE related defines to the LinuxKPI. Move the definition of
"pgprot_t" to "linux/page.h" similar to what Linux does.

Obtained from:	kmacy @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-13 12:41:21 +00:00
Hans Petter Selasky
854e1d4e6c Implement "old_encode_dev()" for the LinuxKPI.
Obtained from:	kmacy @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-13 11:51:43 +00:00
Hans Petter Selasky
374377ce91 Define _IOC_SIZE() in the LinuxKPI.
Obtained from:	kmacy @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-13 11:42:36 +00:00
Hans Petter Selasky
677a229c76 Add unlikely() statement to optimise the IS_ERR_VALUE() macro.
Obtained from:	kmacy @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-13 11:30:56 +00:00
Hans Petter Selasky
e320ac1958 Implement nsecs_to_jiffies() in the LinuxKPI and while at it
streamline the rest of the xxx_to_jiffies() functions to have a
constant 64-bit argument and use identical range checks for the
result.

Specifically preserve msecs_to_jiffies(0) returning 0. See r282743 for
further details.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-13 11:02:02 +00:00
Hans Petter Selasky
abb14a540f Add more Linux defines. Improve some existing ones.
Obtained from:	kmacy @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-13 10:10:43 +00:00
Hans Petter Selasky
04a471587e The Linux error defines should all be positive, else frequently used
error code checks might fail. ERESTART is in the BSD world defined as
-1. While at it add more Linux error codes.

Obtained from:	kmacy @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-13 09:21:22 +00:00
Hans Petter Selasky
3a8bec33ef Fix handling of IOCTLs in the LinuxKPI.
Linux requires that all IOCTL data resides in userspace. FreeBSD
always moves the main IOCTL structure into a kernel buffer before
invoking the IOCTL handler and then copies it back into userspace,
before returning. Hide this difference in the "linux_copyin()" and
"linux_copyout()" functions by remapping userspace addresses in the
range from 0x10000 to 0x20000, to the kernel IOCTL data buffer.

It is assumed that the userspace code, data and stack segments starts
no lower than memory address 0x400000, which is also stated by "man 1
ld", which means any valid userspace pointer can be passed to regular
LinuxKPI handled IOCTLs.

Bump the FreeBSD version to force recompilation of all kernel modules.

Discussed with:	kmacy @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-12 11:38:28 +00:00
Hans Petter Selasky
15c98ff2f1 Remove redundant "task_struct_set()".
This is done by the "linux_kthread_fn()".

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-12 09:11:18 +00:00
Hans Petter Selasky
464d20bcc8 Create a dummy "task_struct" on the stack which is returned by
"current" inside all LinuxKPI file operation callbacks. The "current"
is frequently used for various debug prints, printing the thread name
and thread ID for example.

Obtained from:	kmacy @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-12 09:06:54 +00:00
Hans Petter Selasky
dacb734ea8 Match Linux behaviour and iterate the IDR tree unlocked. The caller is
responsible the IDR tree stays unmodified while iterating.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-11 17:20:20 +00:00
Hans Petter Selasky
b3c89b5ad9 Return a proper error code instead of panicing when an I/O vector
having the wrong number of entries is detected.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-11 10:50:59 +00:00
Hans Petter Selasky
fd42d62378 Add more IDR and IDA related functions to the LinuxKPI.
Obtained from:	kmacy @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-11 10:40:04 +00:00
Hans Petter Selasky
f8221002a5 Factor out common code into "idr_find_layer_locked()" and fix inverted
bitmap test for free entry in "idr_replace()".

Obtained from:	kmacy @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-11 10:35:15 +00:00
Hans Petter Selasky
5d35d77707 Add missing destruction of mutex.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-11 10:06:58 +00:00
Hans Petter Selasky
8457719578 Add more atomic LinuxKPI functions.
Obtained from:	kmacy @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-11 07:58:43 +00:00
Hans Petter Selasky
f2dbb750f4 Implement ioremap_wt() and use that in the MEMREMAP_WT case for i386
and amd64.

Suggested by:	cem @
Discussed with:	kmacy @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-10 17:51:17 +00:00
Hans Petter Selasky
684a5fef01 Add more LinuxKPI I/O functions.
Obtained from:	kmacy @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-10 12:04:57 +00:00
Hans Petter Selasky
7652bc32f7 Use function macros when possible to avoid stray substitutions.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-10 11:39:36 +00:00
Hans Petter Selasky
f2f5b1337e Add missing semicolon and properly wrap macro argument.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-10 11:34:22 +00:00
Hans Petter Selasky
c7d81c66df Allow the argument for the cpu_to_xxxp() and xxx_to_cpup() macros to
point to a constant.

Obtained from:	kmacy @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-10 11:31:00 +00:00
Hans Petter Selasky
0754e66c54 Fix file polling bug.
Ensure the actual poll result is returned by the "linux_file_poll()"
function instead of zero which means no data is available.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2016-05-09 11:52:57 +00:00
Pedro F. Giffuni
1ce4275dd2 sys/compat/linux*: spelling fixes.
Mostly on comments but there are some user-visible messages as well.

MFC after: 2 weeks
2016-04-30 00:53:10 +00:00
Pedro F. Giffuni
72ffecf147 ndis: spelling fixes in comments.
No functional change.
2016-04-30 00:35:46 +00:00
Pedro F. Giffuni
2bede1b82a x86bios: spelling fix in a comment.
No functional change.
2016-04-30 00:34:04 +00:00
Pedro F. Giffuni
5cb9d60e30 x86bios_alloc(): Unsign a counter.
The value can't even be signed so we can avoid the signed vs. unsigned
comparison.

Reviewed by:	jkim
2016-04-29 20:22:10 +00:00
Pedro F. Giffuni
5fe2c518bd ndis(4): it's rather unrealistic to expect a size_t here.
int was actually OK, and u_int is more than enough.
2016-04-28 03:19:53 +00:00
Pedro F. Giffuni
9119df34df ndis(4): unsign some indexes to prevent overflows.
The "len" parameter is uint32_t, indexing it with an int may
end up in a signed integer overflow.

strlen(3) returns an integer of size_t so the corresponding index should
have that size.

MFC after:	1 week
2016-04-28 01:58:56 +00:00
Conrad Meyer
aa90aec270 osd(9): Change array pointer to array pointer type from void*
This is a minor follow-up to r297422, prompted by a Coverity warning.  (It's
not a real defect, just a code smell.)  OSD slot array reservations are an
array of pointers (void **) but were cast to void* and back unnecessarily.
Keep the correct type from reservation to use.

osd.9 is updated to match, along with a few trivial igor fixes.

Reported by:	Coverity
CID:		1353811
Sponsored by:	EMC / Isilon Storage Division
2016-04-26 19:57:35 +00:00
Pedro F. Giffuni
55e0987aea sys: extend use of the howmany() macro when available.
We have a howmany() macro in the <sys/param.h> header that is
convenient to re-use as it makes things easier to read.
2016-04-26 15:38:17 +00:00
Jamie Gritton
d56cf22d22 linux_map_osrel doesn't need to be checked in linux_prison_set,
since it already was in linux_prison_check.
2016-04-25 06:08:45 +00:00
Dmitry Chagin
6c960bf07f Allow to build svr4 module with SYSV support separatelly from the kernel build.
PR:		208464
Reported by:	Kristoffer Eriksson
MFC after:	2 week
2016-04-23 20:31:18 +00:00
Dmitry Chagin
9f8621b1d5 Fix streams and svr4 module dependency. Both modules are complaining about
undefined symbol svr4_delete_socket which was moved from streams to the svr4 module
in r160558 that created a two-way dependency between them.

PR:		208464
Submitted by:	Kristoffer Eriksson
Reported by:	Kristoffer Eriksson
MFC after:	2 week
2016-04-23 20:29:55 +00:00
Pedro F. Giffuni
b66bb393f2 Cleanup redundant parenthesis from existing howmany()/roundup() macro uses. 2016-04-22 16:57:42 +00:00
Conrad Meyer
8d340432aa linprocfs_doproclimits: Initialize error return before use
Reported by:	Coverity
CID:		1354623
Sponsored by:	EMC / Isilon Storage Division
2016-04-20 01:03:06 +00:00
Conrad Meyer
e78adba3fe linprocfs: Don't print uninitialized values
Reported by:	Coverity
CID:		1354624
Sponsored by:	EMC / Isilon Storage Division
2016-04-20 01:00:13 +00:00
Pedro F. Giffuni
02abd40029 kernel: use our nitems() macro when it is available through param.h.
No functional change, only trivial cases are done in this sweep,

Discussed in:	freebsd-current
2016-04-19 23:48:27 +00:00
Pedro F. Giffuni
500ed14d6e compat/linux: for pointers replace 0 with NULL.
plvc is a pointer, no functional change.

Found with devel/coccinelle.
2016-04-15 16:21:13 +00:00
Pedro F. Giffuni
74b8d63dcc Cleanup unnecessary semicolons from the kernel.
Found with devel/coccinelle.
2016-04-10 23:07:00 +00:00
Dmitry Chagin
5743aa47f5 More complete implementation of /proc/self/limits.
Fix the way the code accesses process limits struct - pointed out by mjg@.

PR:		207386
Reviewed by:	no objection form des@
MFC after:	3 weeks
2016-04-10 07:11:29 +00:00
Ed Schouten
ab83575070 Make CloudABI's way of doing TLS more friendly to userspace emulators.
We're currently seeing how hard it would be to run CloudABI binaries on
operating systems cannot be modified easily (Windows, Mac OS X). The
idea is that we want to just run them without any sandboxing. Now
that CloudABI executables are PIE, this is already a bit easier, but TLS
is still problematic:

- CloudABI executables want to write to the %fs, which typically
  requires extra system calls by the emulator every time it needs to
  switch between CloudABI's and its own TLS.

- If CloudABI executables overwrite the %fs base unconditionally, it
  also becomes harder for the emulator to store a backup of the old
  value of %fs. To solve this, let's no longer overwrite %fs, but just
  %fs:0.

As CloudABI's C library does not use a TCB, this space can now be used
by an emulator to keep track of its internal state. The executable can
now safely overwrite %fs:0, as long as it makes sure that the TCB is
copied over to the new TLS area.

Ensure that there is an initial TLS area set up when the process starts,
only containing a bogus TCB. We don't really care about its contents on
FreeBSD.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D5836
2016-04-06 11:11:31 +00:00
Pedro F. Giffuni
ae26eab161 Fix indentation oops. 2016-04-03 14:40:54 +00:00
Dmitry Chagin
8bc21bafba Move Linux specific times tests up to guarantee the values are defined.
CID:		1305178
Submitted by:	pfg@
MFC after:	1 week
2016-04-03 06:33:16 +00:00
Sepherosa Ziehau
1ea448225c tcp/lro: Change SLIST to LIST, so that removing an entry is O(1)
This is kinda critical to the performance when the CPU is slow and
network bandwidth is high, e.g. in the hypervisor.

Reviewed by:	rrs, gallatin, Dexuan Cui <decui microsoft com>
Sponsored by:	Microsoft OSTC
Differential Revision:	https://reviews.freebsd.org/D5765
2016-04-01 06:43:05 +00:00
Ed Schouten
4a8b3b18cc Make Position Independent Executables work for CloudABI.
- Set BI_CAN_EXEC_DYN, so we can execute ET_DYN ELF files in addition to
  regular ET_EXECs.
- Provide an AT_BASE entry in the auxiliary vector, so the executable
  knows at which address it got loaded and can apply relocations.
2016-03-31 18:52:00 +00:00
Ed Schouten
2054309b6a Regenerate system call table after r297468. 2016-03-31 18:50:52 +00:00
Ed Schouten
38526a2cf1 Sync in the latest CloudABI system call definitions.
Some time ago I made a change to merge together the memory scope
definitions used by mmap (MAP_{PRIVATE,SHARED}) and lock objects
(PTHREAD_PROCESS_{PRIVATE,SHARED}). Though that sounded pretty smart
back then, it's backfiring. In the case of mmap it's used with other
flags in a bitmask, but for locking it's an enumeration. As our plan is
to automatically generate bindings for other languages, that looks a bit
sloppy.

Change all of the locking functions to use separate flags instead.

Obtained from:	https://github.com/NuxiNL/cloudabi
2016-03-31 18:50:06 +00:00
Navdeep Parhar
b03384114d Add wait_event_interruptible_timeout to linuxkpi.
Submitted by:	Krishnamraju Eraparaju @ Chelsio
Reviewed by:	hselasky@
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D5776
2016-03-31 17:11:58 +00:00
Hans Petter Selasky
9ad5ce9d01 Fix bugs in currently unused bit searching loop.
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2016-03-31 06:19:15 +00:00
Jamie Gritton
7ab25e3d18 Use osd_reserve / osd_jail_set_reserved, which is known to succeed.
Also don't work around nonexistent osd_register failure.
2016-03-30 17:05:04 +00:00
Gleb Smirnoff
9c64cfe56c The sendfile(2) allows to send extra data from userspace before the file
data (headers).  Historically the size of the headers was not checked
against the socket buffer space.  Application could easily overcommit the
socket buffer space.

With the new sendfile (r293439) the problem remained, but a KASSERT was
inserted that checked that amount of data written to the socket matches
its space.  In case when size of headers is bigger that socket space,
KASSERT fires.  Without INVARIANTS the new sendfile won't panic, but
would report incorrect amount of bytes sent.

o With this change, the headers copyin is moved down into the cycle, after
  the sbspace() check.  The uio size is trimmed by socket space there,
  which fixes the overcommit problem and its consequences.
o The compatibility handling for FreeBSD 4 sendfile headers API is pushed
  up the stack to syscall wrappers.  This required a copy and paste of the
  code, but in turn this allowed to remove extra stack carried parameter
  from fo_sendfile_t, and embrace entire compat code into #ifdef.  If in
  future we got more fo_sendfile_t function, the copy and paste level would
  even reduce.

Reviewed by:	emax, gallatin, Maxim Dounin <mdounin mdounin.ru>
Tested by:	Vitalij Satanivskij <satan ukr.net>
Sponsored by:	Netflix
2016-03-29 19:57:11 +00:00
Dmitry Chagin
7c5982000d Revert r297310 as the SOL_XXX are equal to the IPPROTO_XX except SOL_SOCKET.
Pointed out by:	ae@
2016-03-27 10:09:10 +00:00
Dmitry Chagin
c826fcfe22 iConvert Linux SOL_IPV6 level.
MFC after:	1 week
2016-03-27 08:12:01 +00:00
Dmitry Chagin
e667ee63f6 Whitespaces and style(9) fix. No functional changes.
MFC after:	1 week
2016-03-27 08:10:20 +00:00
Dmitry Chagin
09806d8e3e When write(2) on eventfd object fails with the error EAGAIN do not return
the number of bytes written.

MFC after:	1 week
2016-03-26 19:16:53 +00:00
Dmitry Chagin
2bb14e7541 Implement O_NONBLOCK flag via fcntl(F_SETFL) for eventfd object.
MFC after:	1 week
2016-03-26 19:15:23 +00:00
Ed Schouten
3cf50041ef Regenerate system call table after r297247. 2016-03-24 21:49:39 +00:00
Ed Schouten
1f3bbfd875 Replace the CloudABI system call table by a machine generated version.
The type definitions and constants that were used by COMPAT_CLOUDABI64
are a literal copy of some headers stored inside of CloudABI's C
library, cloudlibc. What is annoying is that we can't make use of
cloudlibc's system call list, as the format is completely different and
doesn't provide enough information. It had to be synced in manually.

We recently decided to solve this (and some other problems) by moving
the ABI definitions into a separate file:

	https://github.com/NuxiNL/cloudabi/blob/master/cloudabi.txt

This file is processed by a pile of Python scripts to generate the
header files like before, documentation (markdown), but in our case more
importantly: a FreeBSD system call table.

This change discards the old files in sys/contrib/cloudabi and replaces
them by the latest copies, which requires some minor changes here and
there. Because cloudabi.txt also enforces consistent names of the system
call arguments, we have to patch up a small number of system call
implementations to use the new argument names.

The new header files can also be included directly in FreeBSD kernel
space without needing any includes/defines, so we can now remove
cloudabi_syscalldefs.h and cloudabi64_syscalldefs.h. Patch up the
sources to include the definitions directly from sys/contrib/cloudabi
instead.
2016-03-24 21:47:15 +00:00
Dmitry Chagin
2ad0231309 Check bsd_to_linux_statfs() return value. Forgotten in r297070.
MFC after:	1 week
2016-03-20 19:06:21 +00:00
Dmitry Chagin
525c9796c3 Return EOVERFLOW in case when actual statfs values are large enough and
not fit into 32 bit fileds of a Linux struct statfs.

PR:		181012
MFC after:	1 week
2016-03-20 18:31:30 +00:00
Dmitry Chagin
7958a34cb5 Whitespaces, style(9) fixes. No functional changes.
MFC after:	1 week
2016-03-20 14:06:27 +00:00
Dmitry Chagin
99546279d6 Implement fstatfs64 system call.
PR:		181012
Submitted by:	John Wehle
MFC after:	1 week
2016-03-20 13:21:20 +00:00
Dmitry Chagin
4525bb829f Rework r296543:
1. Limit secs to INT32_MAX / 2 to avoid errors from kern_setitimer().
   Assert that kern_setitimer() returns 0.
   Remove bogus cast of secs.
   Fix style(9) issues.

2. Increment the return value if the remaining tv_usec value more than 500000 as a Linux does.

Pointed out by: [1] Bruce Evans

MFC after:	1 week
2016-03-20 11:40:52 +00:00
Justin Hibbits
da1b038af9 Use uintmax_t (typedef'd to rman_res_t type) for rman ranges.
On some architectures, u_long isn't large enough for resource definitions.
Particularly, powerpc and arm allow 36-bit (or larger) physical addresses, but
type `long' is only 32-bit.  This extends rman's resources to uintmax_t.  With
this change, any resource can feasibly be placed anywhere in physical memory
(within the constraints of the driver).

Why uintmax_t and not something machine dependent, or uint64_t?  Though it's
possible for uintmax_t to grow, it's highly unlikely it will become 128-bit on
32-bit architectures.  64-bit architectures should have plenty of RAM to absorb
the increase on resource sizes if and when this occurs, and the number of
resources on memory-constrained systems should be sufficiently small as to not
pose a drastic overhead.  That being said, uintmax_t was chosen for source
clarity.  If it's specified as uint64_t, all printf()-like calls would either
need casts to uintmax_t, or be littered with PRI*64 macros.  Casts to uintmax_t
aren't horrible, but it would also bake into the API for
resource_list_print_type() either a hidden assumption that entries get cast to
uintmax_t for printing, or these calls would need the PRI*64 macros.  Since
source code is meant to be read more often than written, I chose the clearest
path of simply using uintmax_t.

Tested on a PowerPC p5020-based board, which places all device resources in
0xfxxxxxxxx, and has 8GB RAM.
Regression tested on qemu-system-i386
Regression tested on qemu-system-mips (malta profile)

Tested PAE and devinfo on virtualbox (live CD)

Special thanks to bz for his testing on ARM.

Reviewed By: bz, jhb (previous)
Relnotes:	Yes
Sponsored by:	Alex Perez/Inertial Computing
Differential Revision: https://reviews.freebsd.org/D4544
2016-03-18 01:28:41 +00:00
John Baldwin
823590d40e Regen. 2016-03-12 22:55:07 +00:00
John Baldwin
8d91aced32 Regen. 2016-03-09 19:06:46 +00:00
John Baldwin
399e8c1773 Simplify AIO initialization now that it is standard.
- Mark AIO system calls as STD and remove the helpers to dynamically
  register them.
- Use COMPAT6 for the old system calls with the older sigevent instead of
  an 'o' prefix.
- Simplify the POSIX configuration to note that AIO is always available.
- Handle AIO in the default VOP_PATHCONF instead of special casing it in
  the pathconf() system call.  fpathconf() is still hackish.
- Remove freebsd32_aio_cancel() as it just called the native one directly.

Reviewed by:	kib
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D5589
2016-03-09 19:05:11 +00:00
Andrey V. Elsukov
86a9058b01 Add support for IPPROTO_IPV6 socket layer for getsockopt/setsockopt calls.
Also add mapping for several options from RFC 3493 and 3542.

Reviewed by:	dchagin
Tested by:	Joe Love <joe at getsomwhere dot net>
MFC after:	2 weeks
2016-03-09 09:12:40 +00:00
Dmitry Chagin
a87488d1e4 Better english.
Submitted by:	Kevin P. Neal
MFC after:	1 week
2016-03-08 19:40:01 +00:00
Dmitry Chagin
649ca5e9dc Put a commit message from r296502 about Linux alarm() system call
behaviour to the source.

Suggested by:	emaste@

MFC after:	1 week
2016-03-08 19:20:57 +00:00
Dmitry Chagin
91f514e413 Does not leak fp. While here remove bogus cast of fp->f_data.
MFC after:	1 week
2016-03-08 15:55:43 +00:00
Dmitry Chagin
fc4b98fb88 Linux accept() system call return EOPNOTSUPP errno instead of EINVAL
for UDP sockets.

MFC after:	1 week
2016-03-08 15:15:34 +00:00
Dmitry Chagin
15c3b371e2 According to POSIX and Linux implementation the alarm() system call
is always successfull.
So, ignore any errors and return 0 as a Linux do.

XXX. Unlike POSIX, Linux in case when the invalid seconds value specified
always return 0, so in that case Linux does not return proper remining time.

MFC after:	1 week
2016-03-08 15:12:49 +00:00
Dmitry Chagin
9f4e66afb9 Link the newly created process to the corresponding parent as
if CLONE_PARENT is set, then the parent of the new process will
be the same as that of the calling process.

MFC after:	1 week
2016-03-08 15:08:22 +00:00
Hans Petter Selasky
cb19abd277 Run the LinuxKPI PCI shutdown handler free of the Giant mutex.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-03-07 14:35:31 +00:00
Hans Petter Selasky
510ebed7be Add more functions to the LinuxKPI.
Define strnicmp as a function macro instead of a regular macro while
at it.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-03-03 09:56:04 +00:00
Mark Johnston
0acf5d0bfd Improve error handling for posix_fallocate(2) and posix_fadvise(2).
- Set td_errno so that ktrace and dtrace can obtain the syscall error
  number in the usual way.
- Pass negative error numbers directly to the syscall layer, as they're
  not intended to be returned to userland.

Reviewed by:	kib
Sponsored by:	EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D5425
2016-02-25 19:58:23 +00:00
Ed Schouten
c0af8d16d8 Call cap_rights_init() properly.
Even though or'ing the individual rights works in this specific case, it
may not work in general. Pass them in as varargs.
2016-02-24 10:54:26 +00:00
Ed Schouten
70907712be Make handling of mmap()'s prot argument more strict.
- Make the system call fail if prot contains bits other than read, write
  and exec.
- Similar to OpenBSD's W^X, don't allow write and exec to be set at the
  same time. I'd like to see for now what happens if we enforce this
  policy unconditionally. If it turns out that this is far too strict,
  we'll loosen this requirement.
2016-02-23 09:22:00 +00:00
Svatopluk Kraus
35a0bc1260 As <machine/vmparam.h> is included from <vm/vm_param.h>, there is no
need to include it explicitly when <vm/vm_param.h> is already included.

Suggested by:	alc
Reviewed by:	alc
Differential Revision:	https://reviews.freebsd.org/D5379
2016-02-22 09:08:04 +00:00
Svatopluk Kraus
a1e1814d76 As <machine/pmap.h> is included from <vm/pmap.h>, there is no need to
include it explicitly when <vm/pmap.h> is already included.

Reviewed by:	alc, kib
Differential Revision:	https://reviews.freebsd.org/D5373
2016-02-22 09:02:20 +00:00
Dag-Erling Smørgrav
f4d6a773f8 Implement /proc/$$/limits.
PR:		207386
Submitted by:	Szymon Śliwa <knight.erraunt@gmail.com>
MFC after:	3 weeks
2016-02-21 14:56:05 +00:00
Jung-uk Kim
c2a9e596ed Silence VPS-Studio errors (V512). These buffer underflows are intentional. 2016-02-18 19:37:39 +00:00
Konstantin Belousov
db57c70a5b Rename P_KTHREAD struct proc p_flag to P_KPROC.
I left as is an apparent bug in ntoskrnl_var.h:AT_PASSIVE_LEVEL()
definition.

Suggested by:	jhb
Sponsored by:	The FreeBSD Foundation
2016-02-09 16:30:16 +00:00
Mateusz Guzik
813361c140 fork: plug a use after free of the returned process
fork1 required its callers to pass a pointer to struct proc * which would
be set to the new process (if any). procdesc and racct manipulation also
used said pointer.

However, the process could have exited prior to do_fork return and be
automatically reaped, thus making this a use-after-free.

Fix the problem by letting callers indicate whether they want the pid or
the struct proc, return the process in stopped state for the latter case.

Reviewed by:	kib
2016-02-04 04:25:30 +00:00
Mateusz Guzik
33fd9b9a2b fork: pass arguments to fork1 in a dedicated structure
Suggested by:	kib
2016-02-04 04:22:18 +00:00
Hans Petter Selasky
fe68f570d4 Update and add various macros to the LinuxKPI and resolve a macro
redefinition issue in the cxgb driver.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
Reviewed by:	np @
2016-01-26 15:26:35 +00:00
Hans Petter Selasky
c7c96d1093 LinuxKPI list updates:
- Add some new hlist macros.
- Update existing hlist macros removing the need for a temporary
  iteration variable.
- Properly define the RCU hlist macros to be SMP safe with regard
  to RCU.
- Safe list macro arguments by adding a pair of parentheses.
- Prefix the _list_add() and _list_splice() functions with "linux"
  to reflect they are LinuxKPI internal functions.

Obtained from:	Linux
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-01-26 15:12:31 +00:00
Hans Petter Selasky
e6ef991e5e Implement ether_addr_equal(), ether_addr_equal_64bits() and
random_ether_addr() for the LinuxKPI.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-01-26 14:36:16 +00:00
Hans Petter Selasky
f15ffb5e63 Implement is_vlan_dev() and vlan_dev_vlan_id() for the LinuxKPI.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-01-26 14:33:20 +00:00
Hans Petter Selasky
e28297940b Implement bitmap_weight() and bitmap_equal() for the LinuxKPI.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-01-26 14:31:20 +00:00
Hans Petter Selasky
d211177315 Add more network related macros and functions to the LinuxKPI.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-01-26 14:29:50 +00:00
Hans Petter Selasky
4a19cc98b3 Add definition for the NETDEV_CHANGE event and tidy up the LinuxKPI
notifier header file a bit while at it.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-01-26 14:27:00 +00:00
Hans Petter Selasky
9f34efb9f4 Define __get_user() and __put_user() for the LinuxKPI.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-01-26 14:21:30 +00:00
Hans Petter Selasky
a65ef21558 Add more LinuxKPI PCI related functions and defines.
Removed comments deriving from Linux.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-01-26 14:20:25 +00:00
Hans Petter Selasky
f919b7a664 Implement 64-bit atomic operations for the LinuxKPI.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-01-21 17:56:23 +00:00
Hans Petter Selasky
29cbb3bef2 LinuxKPI atomic fixes:
- Fix implementation of atomic_add_unless(). The atomic_cmpset_int()
  function returns a boolean and not the previous value of the atomic
  variable.
- The atomic counters should be signed according to Linux.
- Some minor cosmetics and styling while at it.

Reviewed by:	alfred @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-01-21 17:52:55 +00:00
Hans Petter Selasky
1f6112d50a Use function macro instead of non-function macro to reduce chance of
incorrect expansion.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-01-21 17:36:06 +00:00
Hans Petter Selasky
2d1bee654a Implement idr_preload(), idr_preload_end(), idr_alloc() and
idr_alloc_cyclic() in the LinuxKPI. Bump the FreeBSD version to
force recompilation of all KLDs due to IDR structure size change.

MFC after:	2 weeks
Sponsored by:	Mellanox Technologies
2016-01-21 14:57:45 +00:00
John Baldwin
e23cd1b923 Initialize vm_page_prot to VM_MEMATTR_DEFAULT instead of 0.
If a driver's Linux mmap callback passed vm_page_prot through unchanged,
then linux_dev_mmap_single() would try to apply whatever VM_MEMATTR_xxx
value 0 is to the mapping.  On x86, VM_MEMATTR_DEFAULT is the PAT value
for write-back (WB) which is 6, while 0 maps to the PAT value for
uncacheable (UC).  Thus, any mmap request that did not explicitly set
page_prot was tried to map memory as UC triggering the warning in
sg_pager_getpages().

Tested by:	np
Reported by:	Krishnamraju Eraparaju @ Chelsio
MFC after:	3 days
Sponsored by:	Chelsio Communications
2016-01-20 00:14:34 +00:00
Dmitry Chagin
67968b35a0 Prevent double free of control in common sendmsg path as sosend
already freeing it.
2016-01-17 19:28:13 +00:00
Hans Petter Selasky
7d1333938f Implement support for PCI suspend, resume and shutdown events in the
LinuxKPI. Fix a few spaces to tabs. Bump the FreeBSD version to force
recompilation of existing KMODs.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-01-15 11:18:58 +00:00
Gleb Smirnoff
c8358c6e0d Call crextend() before copying old credentials to the new credentials
and replace crcopysafe by crcopy as crcopysafe is is not intended to be
safe in a threaded environment, it drops PROC_LOCK() in while() that
can lead to unexpected results, such as overwrite kernel memory.

In my POV crcopysafe() needs special attention. For now I do not see
any problems with this function, but who knows.

Submitted by:	dchagin
Found by:	trinity
Security:	SA-16:04.linux
2016-01-14 10:16:25 +00:00
Gleb Smirnoff
037f750877 Change linux get_robust_list system call to match actual linux one.
The set_robust_list system call request the kernel to record the head
of the list of robust futexes owned by the calling thread. The head
argument is the list head to record.
The get_robust_list system call should return the head of the robust
list of the thread whose thread id is specified in pid argument.
The list head should be stored in the location pointed to by head
argument.

In contrast, our implemenattion of get_robust_list system call copies
the known portion of memory pointed by recorded in set_robust_list
system call pointer to the head of the robust list to the location
pointed by head argument.

So, it is possible for a local attacker to read portions of kernel
memory, which may result in a privilege escalation.

Submitted by:	mjg
Security:	SA-16:03.linux
2016-01-14 10:13:58 +00:00
Dmitry Chagin
6437b8e7d9 Unlock process lock when return error from getrobustlist call and add
an forgotten dtrace probe when return the same error.

MFC after:	3 days
XMFC with:	r292743
2016-01-10 07:36:43 +00:00
Dmitry Chagin
038c720553 Implement vsyscall hack. Prior to 2.13 glibc uses vsyscall
instead of vdso. An upcoming linux_base-c6 needs it.

Differential Revision:  https://reviews.freebsd.org/D1090

Reviewed by:	kib, trasz
MFC after:	1 week
2016-01-09 20:18:53 +00:00
Hans Petter Selasky
0c510167fb LinuxKPI style changes:
- Properly prefix internal functions with "linux_" instead of only a
  single underscore to avoid future namespace collisions.
- Make some functions global instead of inline to ease debugging and
  to avoid unnecessary code duplication.
- Remove no longer existing kthread_create() function's prototype.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-01-08 10:04:19 +00:00
Hans Petter Selasky
e10c4cc0a4 Implement RCU mechanism using shared exclusive locks.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-01-05 12:22:45 +00:00
Hans Petter Selasky
b648035313 Handle when filedescriptors are closed before initialized. An early
fdclose() call can cause fget_unlocked() to fail.

Found by:	mjg @
MFC after:	1 week
Reviewed by:	Mark Block <markb@mellanox.com>
Sponsored by:	Mellanox Technologies
Differential Revision:	https://reviews.freebsd.org/D4351
2015-12-31 14:47:45 +00:00
Hans Petter Selasky
06204f8e25 Minor LinuxKPI code cleanup:
- Declare some static functions in linux_compat.c instead if inside
  various header files.
- Prefix FreeBSD local functions in the LinuxKPI with "linux_" to
  avoid symbol name conflicts in the future and to make debugging
  easier.
- Make the "struct kobj_ktype" declaractions constant to shave off a
  few bytes from the data segment.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2015-12-31 12:30:19 +00:00
Hans Petter Selasky
337cb9f04c Make the kobject refcounting compliant with Linux. Refcounting on the
parent kobject cannot be factored out and must be done by the kobject
consumers.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2015-12-31 11:27:36 +00:00
Hans Petter Selasky
260194052e Reduce memory consumption when allocating kobject strings in the
LinuxKPI. Compute string length before allocating memory instead of
using fixed size allocations. Make kobject_set_name_vargs() global
instead of inline to save some bytes when compiling.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2015-12-28 18:20:05 +00:00
Dmitry Chagin
bfb5568a3c Return EINVAL in case of incorrect sigev_signo value specified instead of panicing. 2015-12-26 09:09:49 +00:00
Dmitry Chagin
6e5549717a Do not allow access to emuldata for non Linux processes.
Pointed out by:	mjg@
Security:	https://admbugs.freebsd.org/show_bug.cgi?id=679
2015-12-26 09:04:47 +00:00
Hans Petter Selasky
c4e58b4efe Implement drain_workqueue() function.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2015-12-21 12:20:02 +00:00
Hans Petter Selasky
9782763db2 In the zero delay case in queue_delayed_work() use the return value
from taskqueue_enqueue() instead of reading "ta_pending" unlocked and
also ensure the callout is stopped before proceeding.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2015-12-21 12:13:03 +00:00
Hans Petter Selasky
55d445d317 Minor workqueue cleanup:
- Make some functions global instead of inline to ease debugging.
- Fix some minor style issues.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2015-12-21 11:58:59 +00:00
Hans Petter Selasky
c094330345 Implement sleepable RCU mechanism using shared exclusive locks.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2015-12-21 11:03:12 +00:00
Hans Petter Selasky
cee21041cf Implement ACCESS_ONCE(), WRITE_ONCE() and READ_ONCE().
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2015-12-21 10:56:38 +00:00
Mark Johnston
3616095801 Fix style issues around existing SDT probes.
- Use SDT_PROBE<N>() instead of SDT_PROBE(). This has no functional effect
  at the moment, but will be needed for some future changes.
- Don't hardcode the module component of the probe identifier. This is
  set automatically by the SDT framework.

MFC after:	1 week
2015-12-16 23:39:27 +00:00
Hans Petter Selasky
f837e46d16 Add some structures and defines which will be used when decoding small
form factor, SFF, standards compliant ethernet EEPROMs.

MFC after:	1 week
Obtained from:	Linux
Sponsored by:	Mellanox Technologies
2015-12-03 12:51:54 +00:00
Hans Petter Selasky
9ce5ab9ce6 Remove incorrect defines. The proper version of these macros is
defined in linux/etherdevice.h.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2015-12-03 11:45:12 +00:00
Hans Petter Selasky
52ba05767f Add more functions and types to the LinuxKPI.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2015-11-30 09:24:12 +00:00
Konstantin Belousov
724f4b62b0 Remove sv_prepsyscall, sv_sigsize and sv_sigtbl members of the struct
sysent.

sv_prepsyscall is unused.

sv_sigsize and sv_sigtbl translate signal number from the FreeBSD
namespace into the ABI domain.  It is only utilized on i386 for iBCS2
binaries.  The issue with this approach is that signals for iBCS2 were
delivered with the FreeBSD signal frame layout, which does not follow
iBCS2.  The same note is true for any other potential user if
sv_sigtbl.  In other words, if ABI needs signal number translation, it
really needs custom sv_sendsig method instead.

Sponsored by:	The FreeBSD Foundation
2015-11-28 08:49:07 +00:00
Konstantin Belousov
5e27d79314 Split kerne timekeep ABI structure vdso_sv_tk out of the struct
sysentvec.  This allows the timekeep data to be shared between similar
ABIs which cannot share sysentvec.

Make the timekeep_push_vdso() tick callback to the timekeep structures
instead of sysentvecs.  If several sysentvec share the vdso_sv_tk
structure, we would update the userspace data several times on each
tick, without the change.

Only allocate vdso_sv_tk in the exec_sysvec_init() sysinit when
sysentvec is marked with the new SV_TIMEKEEP flag.  This saves
allocation and update of unneeded vdso_sv_tk for ABIs which do not
provide userspace gettimeofday yet, which are PowerPCs arches right
now.

Make vdso_sv_tk allocator public, namely split out and export
alloc_sv_tk() and alloc_sv_tk_compat32().  ABIs which share timekeep
data now can allocate it manually and share as appropriate.

Requested by:	nwhitehorn
Tested by:	nwhitehorn, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-11-23 07:09:35 +00:00
Hans Petter Selasky
f727a767e9 Add assert and note about the size of "unsigned long" inside the
LinuxKPI for the future.

Sponsored by:	Mellanox Technologies
2015-11-13 09:00:39 +00:00
Hans Petter Selasky
86845417d1 Build fixes:
- Add some missing I/O functions for non-i386 and amd64 platforms.
- Stub ioremap() to NULL using a macro to ensure non-existing memory
  attributes are not referred when they do not exist.
- Add more header files to linux/list.h to resolve driver compilation
  issues on Sparc64 and PowerPC platforms.

Sponsored by:	Mellanox Technologies
2015-11-12 09:18:22 +00:00
Conrad Meyer
3b383c9ede linuxkpi/sysfs.h: Cast arg2 through intptr_t to avoid GCC warning
The code compiles fine under Clang, but GCC on PPC is less permissive about
integer and pointer sizes.  (An intmax_t is clearly *large enough* to hold a
pointer value.)

Another follow-up to r290475.

Reported by:	jhibbits
Sponsored by:	EMC / Isilon Storage Division
2015-11-09 16:50:42 +00:00
Hans Petter Selasky
8e7baabc9f Make all the LinuxKPI include files compile standalone.
Sponsored by:	Mellanox Technologies
2015-11-03 12:37:55 +00:00
Hans Petter Selasky
8d59ecb214 Finish process of moving the LinuxKPI module into the default kernel build.
- Move all files related to the LinuxKPI into sys/compat/linuxkpi and
  its subfolders.
- Update sys/conf/files and some Makefiles to use new file locations.
- Added description of COMPAT_LINUXKPI to sys/conf/NOTES which in turn
  adds the LinuxKPI to all LINT builds.
- The LinuxKPI can be added to the kernel by setting the
  COMPAT_LINUXKPI option. The OFED kernel option no longer builds the
  LinuxKPI into the kernel. This was done to keep the build rules for
  the LinuxKPI in sys/conf/files simple.
- Extend the LinuxKPI module to include support for USB by moving the
  Linux USB compat from usb.ko to linuxkpi.ko.
- Bump the FreeBSD_version.
- A universe kernel build has been done.

Reviewed by:	np @ (cxgb and cxgbe related changes only)
Sponsored by:	Mellanox Technologies
2015-10-29 08:28:39 +00:00
Konstantin Belousov
cd0a26c53f Fix build for the KTR-enabled kernels.
Sponsored by:	The FreeBSD Foundation
2015-10-23 11:41:55 +00:00
Ed Schouten
b78ef4bd86 Refactoring: move out generic bits from cloudabi64_sysvec.c.
In order to make it easier to support CloudABI on ARM64, move out all of
the bits from the AMD64 cloudabi_sysvec.c into a new file
cloudabi_module.c that would otherwise remain identical. This reduces
the AMD64 specific code to just ~160 lines.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D3974
2015-10-22 09:07:53 +00:00
Ed Schouten
808d980506 Properly format pointer size independent CloudABI system calls.
CloudABI has approximately 50 system calls that do not depend on the
pointer size of the system. As the ABI is pretty compact, it takes
little effort to each truss(8) the formatting rules for these system
calls. Start off by formatting pointer size independent system calls.

Changes:

- Make it possible to include the CloudABI system call definitions in
  FreeBSD userspace builds. Add ${root}/sys to the truss(8) Makefile so
  we can pull in <compat/cloudabi/cloudabi_syscalldefs.h>.
- Refactoring: patch up amd64-cloudabi64.c to use the CLOUDABI_*
  constants instead of rolling our own table.
- Add table entries for all of the system calls.
- Add new generic formatting types (UInt, IntArray) that we'll be using
  to format unsigned integers and arrays of integers.
- Add CloudABI specific formatting types.

Approved by:	jhb
Differential Revision:	https://reviews.freebsd.org/D3836
2015-10-08 05:27:45 +00:00
Bryan Drewery
a730673058 Remove redundant RFFPWAIT/vfork(2) handling in Linux fork(2) and clone(2) wrappers.
r161611 added some of the code from sys_vfork() directly into the Linux
module wrappers since they use RFSTOPPED.  In r232240, the RFFPWAIT handling
was moved to syscallret(), thus this code in the Linux module is no longer
needed as it will be called later.

This also allows the Linux wrappers to benefit from the fix in r275616 for
threads not getting suspended if their vforked child is stopped while they
wait on them.

Reviewed by:	jhb, kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D3828
2015-10-07 19:10:38 +00:00
Andriy Gapon
2f2f522b5d save some bytes by using more concise SDT_PROBE<n> instead of SDT_PROBE
SDT_PROBE requires 5 parameters whereas SDT_PROBE<n> requires n parameters
where n is typically smaller than 5.

Perhaps SDT_PROBE should be made a private implementation detail.

MFC after:	20 days
2015-09-28 12:14:16 +00:00
Edward Tomasz Napierala
089d32934a Fixes a panic triggered by threaded Linux applications when running
with RACCT/RCTL enabled.

Reviewed by:	ngie@, ed@
Tested by:	Larry Rosenman <ler@lerctr.org>
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D3470
2015-09-02 14:04:13 +00:00
Ed Schouten
bc1ace0b96 Decompose linkat()/renameat() rights to source and target.
To make it easier to understand how Capsicum interacts with linkat() and
renameat(), rename the rights to CAP_{LINK,RENAME}AT_{SOURCE,TARGET}.

This also addresses a shortcoming in Capsicum, where it isn't possible
to disable linking to files stored in a directory. Creating hardlinks
essentially makes it possible to access files with additional rights.

Reviewed by:	rwatson, wblock
Differential Revision:	https://reviews.freebsd.org/D3411
2015-08-27 15:16:41 +00:00
Ed Schouten
edcf7fbf59 Don't forget to invoke pre_execve() and post_execve().
CloudABI's proc_exec() was implemented before r282708 introduced
pre_execve() and post_execve(). Sync up by adding these missing calls.
2015-08-17 13:07:12 +00:00
Ed Schouten
fbb624e76f Add the last remaining system calls: send() and recv().
There is still one TODO item for these calls: add file descriptor
passing. The data structures are already prepared for this. It's just
the translation that's missing.

Obtained from:	http://github.com/NuxiNL/freebsd
2015-08-12 17:42:20 +00:00
Ed Schouten
2c20fbe43a Use CAP_EVENT instead of CAP_PDWAIT.
The cloudlibc pdwait() function ends up using FreeBSD's kqueue() in
combination with EVFILT_PROCDESC. This depends on CAP_EVENT -- not
CAP_PDWAIT.

Obtained from:	https://github.com/NuxiNL/freebsd
2015-08-12 11:07:03 +00:00
Ed Schouten
18528470cb Make blocking CloudABI futex operations work.
Blocking on locks and condition variables can be accomplished by polling
and using the special filters CONDVAR, LOCK_RDLOCK and LOCK_WRLOCK.

For now it wouldn't make sense to implement this functionality into
kqueue() itself, for the reason that they are CloudABI specific and
would require us to resize 'struct kevent' to hold all of the parameters
of interest.

Add a bandaid to the CloudABI poll system call to call into the futex
code directly if it detects specific combinations of events that are
used by the C library.

Obtained from:	https://github.com/NuxiNL/freebsd
2015-08-12 08:41:48 +00:00
Ed Schouten
322e16e87e Make poll() and kqueue() on CloudABI work.
This change implements two functions, cloudabi64_kevent_copyin() and
cloudabi64_kevent_copyout(), that convert CloudABI structures to
FreeBSD's struct kevent. CloudABI uses two structures: subscription_t
and event_t. The former is used for input, whereas the latter is used
for output. Unlike struct kevent, fields aren't overloaded for multiple
purposes or for separate event types.

For poll() we call into the newly introduced kern_kevent_anonymous()
function that allows us to poll without a file descriptor. This function
is not only used by poll(), but also by functions such as
sleep() and clock_nanosleep().

Reviewed by:	jmg
Obtained from:	https://github.com/NuxiNL/freebsd
Differential Revision:	https://reviews.freebsd.org/D3308
2015-08-12 07:59:00 +00:00
Ed Schouten
55a224afa2 Fall back to O_RDONLY -- not O_WRONLY.
If CloudABI processes open files with a set of requested rights that do
not match any of the privileges granted by O_RDONLY, O_WRONLY or O_RDWR,
we'd better fall back to O_RDONLY -- not O_WRONLY.
2015-08-11 14:08:46 +00:00
Ed Schouten
9d9123a80d Properly convert the error number to CloudABI's indexing.
We currently return FreeBSD's errno value directly, which is of course
not correct.
2015-08-11 14:07:04 +00:00
Ed Schouten
65c17fe451 Make cap_rights_limit() work for CloudABI processes.
Call into the recently introduced kern_cap_rights_limit() function to
restrict rights.
2015-08-11 08:44:19 +00:00
Ed Schouten
0f85ff377b Add file_open(): the underlying system call of openat().
CloudABI purely operates on file descriptor rights (CAP_*). File
descriptor access modes (O_ACCMODE) are emulated on top of rights.

Instead of accepting the traditional flags argument, file_open() copies
in an fdstat_t object that contains the initial rights the descriptor
should have, but also file descriptor flags that should persist after
opening (APPEND, NONBLOCK, *SYNC). Only flags that don't persist (EXCL,
TRUNC, CREAT, DIRECTORY) are passed in as an argument.

file_open() first converts the rights, the persistent flags and the
non-persistent flags to fflags. It then calls into vn_open(). If
successful, it installs the file descriptor with the requested
rights, trimming off rights that don't apply to the type of
the file that has been opened.

Unlike kern_openat(), this function does not support /dev/fd/*. I can't
think of a reason why we need to support this for CloudABI.

Obtained from:	https://github.com/NuxiNL/freebsd
Differential Revision:	https://reviews.freebsd.org/D3235
2015-08-06 06:47:28 +00:00
Ed Schouten
aaf53ab2aa Correct the previous commit: remove the DECLARE_MODULE().
It looks like a MODULE_VERSION() can also appear on its own -- there is
no need to use explicitly use DECLARE_MODULE(). Looking at other
modules, this seems common practice.
2015-08-05 16:53:49 +00:00
Ed Schouten
b6efa27589 Add DECLARE_MODULE() to the "cloudabi" kernel module.
This kernel module does not require any explicit initialization, but a
module declaration is needed to let the "cloudabi64" kernel module
automatically pull this in.

Obtained from:	https://github.com/NuxiNL/freebsd
2015-08-05 16:45:47 +00:00
Ed Schouten
36310bcd1d Make fcntl(F_SETFL) work.
The stat_put() system call can be used to modify file descriptor
attributes, such as flags, but also Capsicum permission bits. Support
for changing Capsicum bits will be added as soon as its dependent
changes have been pushed through code review.

Obtained from:	https://github.com/NuxiNL/freebsd
2015-08-05 16:15:43 +00:00
Ed Schouten
2412ae2b8e Regenerate the system call table. 2015-08-05 13:10:13 +00:00
Ed Schouten
2837d9ed43 Import the latest CloudABI system call definitions and table.
We're going to need these for next code I'm going to send out for
review: support for poll() and kqueue() on CloudABI.
2015-08-05 13:09:46 +00:00
Ed Schouten
db1c8ee585 Add the remaining pointer size independent CloudABI socket system calls.
CloudABI uses a structure called cloudabi_sockstat_t. Think of it as
'struct stat' for sockets. It is used by functions such as
getsockname(), getpeername(), some of the getsockopt() values, etc.

This change implements the sock_stat_get() system call that returns a
copy of this structure. The accept() system call should also return a
full copy of this structure eventually, but for now we're only
interested in the peer address. Add a TODO() to make sure this is
patched up later on.

Differential Revision:	https://reviews.freebsd.org/D3218
2015-08-05 08:18:05 +00:00
Ed Schouten
4958fab8cd Allow the creation of polling descriptors (kqueues) on CloudABI. 2015-08-05 07:37:06 +00:00
Ed Schouten
a2034cc98a Allow the creation of kqueues with a restricted set of Capsicum rights.
On CloudABI we want to create file descriptors with just the minimal set
of Capsicum rights in place. The reason for this is that it makes it
easier to obtain uniform behaviour across different operating systems.

By explicitly whitelisting the operations, we can return consistent
error codes, but also prevent applications from depending OS-specific
behaviour.

Extend kern_kqueue() to take an additional struct filecaps that is
passed on to falloc_caps(). Update the existing consumers to pass in
NULL.

Differential Revision:	https://reviews.freebsd.org/D3259
2015-08-05 07:36:50 +00:00
Ed Schouten
0c0964844e Let the CloudABI futex code use umtx_keys.
The CloudABI kernel still passes all of the cloudlibc unit tests.

Reviewed by:	vangyzen
Differential Revision:	https://reviews.freebsd.org/D3286
2015-08-04 06:02:03 +00:00
Ed Schouten
f52c3dd415 Allow CloudABI processes to create shared memory objects.
Summary:
Use the newly created `kern_shm_open()` function to create objects with
just the rights that are actually needed.

Reviewers: jhb, kib

Subscribers: imp

Differential Revision: https://reviews.freebsd.org/D3260
2015-08-01 07:51:48 +00:00
Ed Schouten
367a13f905 Limit rights on process descriptors.
On CloudABI, the rights bits returned by cap_rights_get() match up with
the operations that you can actually perform on the file descriptor.

Limiting the rights is good, because it makes it easier to get uniform
behaviour across different operating systems. If process descriptors on
FreeBSD would suddenly gain support for any new file operation, this
wouldn't become exposed to CloudABI processes without first extending
the rights.

Extend fork1() to gain a 'struct filecaps' argument that allows you to
construct process descriptors with custom rights. Use this in
cloudabi_sys_proc_fork() to limit the rights to just fstat() and
pdwait().

Obtained from:	https://github.com/NuxiNL/freebsd
2015-07-31 10:21:58 +00:00