Commit Graph

2706 Commits

Author SHA1 Message Date
Hans Petter Selasky
3a8bec33ef Fix handling of IOCTLs in the LinuxKPI.
Linux requires that all IOCTL data resides in userspace. FreeBSD
always moves the main IOCTL structure into a kernel buffer before
invoking the IOCTL handler and then copies it back into userspace,
before returning. Hide this difference in the "linux_copyin()" and
"linux_copyout()" functions by remapping userspace addresses in the
range from 0x10000 to 0x20000, to the kernel IOCTL data buffer.

It is assumed that the userspace code, data and stack segments start
no lower than memory address 0x400000, which is also stated by "man 1
ld", which means any valid userspace pointer can be passed to regular
LinuxKPI handled IOCTLs.

Bump the FreeBSD version to force recompilation of all kernel modules.

Discussed with:	kmacy @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-12 11:38:28 +00:00
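
The address remapping described in 3a8bec33ef can be sketched in plain C as follows. This is only an illustration under stated assumptions: `sketch_copyin()`, `struct sketch_ioctl_ctx` and the `SKETCH_IOCTL_*` constants are hypothetical names, and `memcpy()` stands in for the real userspace copy that the kernel would perform.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Window quoted in the commit message: 0x10000 up to 0x20000. */
#define SKETCH_IOCTL_BASE 0x10000UL
#define SKETCH_IOCTL_END  0x20000UL

/* Hypothetical stand-in for the kernel buffer holding the IOCTL data. */
struct sketch_ioctl_ctx {
	unsigned char *kbuf;
	size_t kbuf_len;
};

/*
 * Copy "len" bytes from "uaddr" into "dst". Addresses inside the magic
 * window are redirected to the kernel IOCTL buffer; anything else is
 * treated as an ordinary pointer (the real code would copy from userspace).
 */
int
sketch_copyin(const struct sketch_ioctl_ctx *ctx, uintptr_t uaddr,
    void *dst, size_t len)
{
	if (uaddr >= SKETCH_IOCTL_BASE && uaddr < SKETCH_IOCTL_END) {
		size_t off = uaddr - SKETCH_IOCTL_BASE;

		/* Reject reads that would run past the kernel buffer. */
		if (off > ctx->kbuf_len || len > ctx->kbuf_len - off)
			return (EFAULT);
		memcpy(dst, ctx->kbuf + off, len);
		return (0);
	}
	memcpy(dst, (const void *)uaddr, len);
	return (0);
}
```

Because real userspace mappings are assumed to start at 0x400000 or above, the window cannot collide with a valid userspace pointer.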
Hans Petter Selasky
15c98ff2f1 Remove redundant "task_struct_set()".
This is done by the "linux_kthread_fn()".

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-12 09:11:18 +00:00
Hans Petter Selasky
464d20bcc8 Create a dummy "task_struct" on the stack which is returned by
"current" inside all LinuxKPI file operation callbacks. The "current"
is frequently used for various debug prints, printing the thread name
and thread ID for example.

Obtained from:	kmacy @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-12 09:06:54 +00:00
Hans Petter Selasky
dacb734ea8 Match Linux behaviour and iterate the IDR tree unlocked. The caller is
responsible for ensuring the IDR tree stays unmodified while iterating.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-11 17:20:20 +00:00
Hans Petter Selasky
b3c89b5ad9 Return a proper error code instead of panicking when an I/O vector
having the wrong number of entries is detected.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-11 10:50:59 +00:00
Hans Petter Selasky
fd42d62378 Add more IDR and IDA related functions to the LinuxKPI.
Obtained from:	kmacy @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-11 10:40:04 +00:00
Hans Petter Selasky
f8221002a5 Factor out common code into "idr_find_layer_locked()" and fix inverted
bitmap test for free entry in "idr_replace()".

Obtained from:	kmacy @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-11 10:35:15 +00:00
Hans Petter Selasky
5d35d77707 Add missing destruction of mutex.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-11 10:06:58 +00:00
Hans Petter Selasky
8457719578 Add more atomic LinuxKPI functions.
Obtained from:	kmacy @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-11 07:58:43 +00:00
Hans Petter Selasky
f2dbb750f4 Implement ioremap_wt() and use that in the MEMREMAP_WT case for i386
and amd64.

Suggested by:	cem @
Discussed with:	kmacy @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-10 17:51:17 +00:00
Hans Petter Selasky
684a5fef01 Add more LinuxKPI I/O functions.
Obtained from:	kmacy @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-10 12:04:57 +00:00
Hans Petter Selasky
7652bc32f7 Use function macros when possible to avoid stray substitutions.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-10 11:39:36 +00:00
Hans Petter Selasky
f2f5b1337e Add missing semicolon and properly wrap macro argument.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-10 11:34:22 +00:00
Hans Petter Selasky
c7d81c66df Allow the argument for the cpu_to_xxxp() and xxx_to_cpup() macros to
point to a constant.

Obtained from:	kmacy @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-05-10 11:31:00 +00:00
Hans Petter Selasky
0754e66c54 Fix file polling bug.
Ensure the actual poll result is returned by the "linux_file_poll()"
function instead of zero which means no data is available.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2016-05-09 11:52:57 +00:00
Pedro F. Giffuni
1ce4275dd2 sys/compat/linux*: spelling fixes.
Mostly on comments but there are some user-visible messages as well.

MFC after: 2 weeks
2016-04-30 00:53:10 +00:00
Pedro F. Giffuni
72ffecf147 ndis: spelling fixes in comments.
No functional change.
2016-04-30 00:35:46 +00:00
Pedro F. Giffuni
2bede1b82a x86bios: spelling fix in a comment.
No functional change.
2016-04-30 00:34:04 +00:00
Pedro F. Giffuni
5cb9d60e30 x86bios_alloc(): Unsign a counter.
The value can't even be signed so we can avoid the signed vs. unsigned
comparison.

Reviewed by:	jkim
2016-04-29 20:22:10 +00:00
Pedro F. Giffuni
5fe2c518bd ndis(4): it's rather unrealistic to expect a size_t here.
int was actually OK, and u_int is more than enough.
2016-04-28 03:19:53 +00:00
Pedro F. Giffuni
9119df34df ndis(4): unsign some indexes to prevent overflows.
The "len" parameter is uint32_t, indexing it with an int may
end up in a signed integer overflow.

strlen(3) returns an integer of size_t so the corresponding index should
have that size.

MFC after:	1 week
2016-04-28 01:58:56 +00:00
Conrad Meyer
aa90aec270 osd(9): Change array pointer to array pointer type from void*
This is a minor follow-up to r297422, prompted by a Coverity warning.  (It's
not a real defect, just a code smell.)  OSD slot array reservations are an
array of pointers (void **) but were cast to void* and back unnecessarily.
Keep the correct type from reservation to use.

osd.9 is updated to match, along with a few trivial igor fixes.

Reported by:	Coverity
CID:		1353811
Sponsored by:	EMC / Isilon Storage Division
2016-04-26 19:57:35 +00:00
Pedro F. Giffuni
55e0987aea sys: extend use of the howmany() macro when available.
We have a howmany() macro in the <sys/param.h> header that is
convenient to re-use as it makes things easier to read.
2016-04-26 15:38:17 +00:00
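
As a small illustration of what 55e0987aea refers to, the sketch below uses the `howmany()` definition from `<sys/param.h>` (guarded so it also builds where the header does not provide it). The helper `sketch_pages_needed()` is a hypothetical name used only for this example.

```c
#include <stddef.h>

/* Same definition as in <sys/param.h>; guarded against redefinition. */
#ifndef howmany
#define howmany(x, y)	(((x) + ((y) - 1)) / (y))
#endif

/* Number of page_size-sized pages needed to hold nbytes (rounds up). */
size_t
sketch_pages_needed(size_t nbytes, size_t page_size)
{
	return (howmany(nbytes, page_size));
}
```

The macro replaces the open-coded `(n + size - 1) / size` idiom, which is the readability gain the commit message mentions.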
Jamie Gritton
d56cf22d22 linux_map_osrel doesn't need to be checked in linux_prison_set,
since it already was in linux_prison_check.
2016-04-25 06:08:45 +00:00
Dmitry Chagin
6c960bf07f Allow building the svr4 module with SYSV support separately from the kernel build.
PR:		208464
Reported by:	Kristoffer Eriksson
MFC after:	2 weeks
2016-04-23 20:31:18 +00:00
Dmitry Chagin
9f8621b1d5 Fix streams and svr4 module dependency. Both modules are complaining about
undefined symbol svr4_delete_socket which was moved from streams to the svr4 module
in r160558 that created a two-way dependency between them.

PR:		208464
Submitted by:	Kristoffer Eriksson
Reported by:	Kristoffer Eriksson
MFC after:	2 weeks
2016-04-23 20:29:55 +00:00
Pedro F. Giffuni
b66bb393f2 Cleanup redundant parentheses from existing howmany()/roundup() macro uses. 2016-04-22 16:57:42 +00:00
Conrad Meyer
8d340432aa linprocfs_doproclimits: Initialize error return before use
Reported by:	Coverity
CID:		1354623
Sponsored by:	EMC / Isilon Storage Division
2016-04-20 01:03:06 +00:00
Conrad Meyer
e78adba3fe linprocfs: Don't print uninitialized values
Reported by:	Coverity
CID:		1354624
Sponsored by:	EMC / Isilon Storage Division
2016-04-20 01:00:13 +00:00
Pedro F. Giffuni
02abd40029 kernel: use our nitems() macro when it is available through param.h.
No functional change, only trivial cases are done in this sweep.

Discussed in:	freebsd-current
2016-04-19 23:48:27 +00:00
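
The kind of trivial conversion 02abd40029 describes might look like the sketch below. The `nitems()` definition matches the one in `<sys/param.h>` and is guarded for portability; the table and `sketch_table_sum()` are hypothetical and exist only for this example.

```c
#include <stddef.h>

/* Same definition as in <sys/param.h>; guarded against redefinition. */
#ifndef nitems
#define nitems(x)	(sizeof((x)) / sizeof((x)[0]))
#endif

static const int sketch_table[] = { 3, 5, 7, 11 };

/* Sum the table without hard-coding its element count. */
int
sketch_table_sum(void)
{
	size_t i;
	int sum = 0;

	for (i = 0; i < nitems(sketch_table); i++)
		sum += sketch_table[i];
	return (sum);
}
```

Note that `nitems()` is only meaningful on true arrays, not on pointers.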
Pedro F. Giffuni
500ed14d6e compat/linux: for pointers replace 0 with NULL.
plvc is a pointer, no functional change.

Found with devel/coccinelle.
2016-04-15 16:21:13 +00:00
Pedro F. Giffuni
74b8d63dcc Cleanup unnecessary semicolons from the kernel.
Found with devel/coccinelle.
2016-04-10 23:07:00 +00:00
Dmitry Chagin
5743aa47f5 More complete implementation of /proc/self/limits.
Fix the way the code accesses process limits struct - pointed out by mjg@.

PR:		207386
Reviewed by:	no objection from des@
MFC after:	3 weeks
2016-04-10 07:11:29 +00:00
Ed Schouten
ab83575070 Make CloudABI's way of doing TLS more friendly to userspace emulators.
We're currently seeing how hard it would be to run CloudABI binaries on
operating systems that cannot be modified easily (Windows, Mac OS X). The
idea is that we want to just run them without any sandboxing. Now
that CloudABI executables are PIE, this is already a bit easier, but TLS
is still problematic:

- CloudABI executables want to write to the %fs, which typically
  requires extra system calls by the emulator every time it needs to
  switch between CloudABI's and its own TLS.

- If CloudABI executables overwrite the %fs base unconditionally, it
  also becomes harder for the emulator to store a backup of the old
  value of %fs. To solve this, let's no longer overwrite %fs, but just
  %fs:0.

As CloudABI's C library does not use a TCB, this space can now be used
by an emulator to keep track of its internal state. The executable can
now safely overwrite %fs:0, as long as it makes sure that the TCB is
copied over to the new TLS area.

Ensure that there is an initial TLS area set up when the process starts,
only containing a bogus TCB. We don't really care about its contents on
FreeBSD.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D5836
2016-04-06 11:11:31 +00:00
Pedro F. Giffuni
ae26eab161 Fix indentation oops. 2016-04-03 14:40:54 +00:00
Dmitry Chagin
8bc21bafba Move Linux specific times tests up to guarantee the values are defined.
CID:		1305178
Submitted by:	pfg@
MFC after:	1 week
2016-04-03 06:33:16 +00:00
Sepherosa Ziehau
1ea448225c tcp/lro: Change SLIST to LIST, so that removing an entry is O(1)
This is kinda critical to the performance when the CPU is slow and
network bandwidth is high, e.g. in the hypervisor.

Reviewed by:	rrs, gallatin, Dexuan Cui <decui microsoft com>
Sponsored by:	Microsoft OSTC
Differential Revision:	https://reviews.freebsd.org/D5765
2016-04-01 06:43:05 +00:00
Ed Schouten
4a8b3b18cc Make Position Independent Executables work for CloudABI.
- Set BI_CAN_EXEC_DYN, so we can execute ET_DYN ELF files in addition to
  regular ET_EXECs.
- Provide an AT_BASE entry in the auxiliary vector, so the executable
  knows at which address it got loaded and can apply relocations.
2016-03-31 18:52:00 +00:00
Ed Schouten
2054309b6a Regenerate system call table after r297468. 2016-03-31 18:50:52 +00:00
Ed Schouten
38526a2cf1 Sync in the latest CloudABI system call definitions.
Some time ago I made a change to merge together the memory scope
definitions used by mmap (MAP_{PRIVATE,SHARED}) and lock objects
(PTHREAD_PROCESS_{PRIVATE,SHARED}). Though that sounded pretty smart
back then, it's backfiring. In the case of mmap it's used with other
flags in a bitmask, but for locking it's an enumeration. As our plan is
to automatically generate bindings for other languages, that looks a bit
sloppy.

Change all of the locking functions to use separate flags instead.

Obtained from:	https://github.com/NuxiNL/cloudabi
2016-03-31 18:50:06 +00:00
Navdeep Parhar
b03384114d Add wait_event_interruptible_timeout to linuxkpi.
Submitted by:	Krishnamraju Eraparaju @ Chelsio
Reviewed by:	hselasky@
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D5776
2016-03-31 17:11:58 +00:00
Hans Petter Selasky
9ad5ce9d01 Fix bugs in currently unused bit searching loop.
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2016-03-31 06:19:15 +00:00
Jamie Gritton
7ab25e3d18 Use osd_reserve / osd_jail_set_reserved, which is known to succeed.
Also don't work around nonexistent osd_register failure.
2016-03-30 17:05:04 +00:00
Gleb Smirnoff
9c64cfe56c The sendfile(2) allows to send extra data from userspace before the file
data (headers).  Historically the size of the headers was not checked
against the socket buffer space.  Application could easily overcommit the
socket buffer space.

With the new sendfile (r293439) the problem remained, but a KASSERT was
inserted that checked that amount of data written to the socket matches
its space.  In case when size of headers is bigger than socket space,
KASSERT fires.  Without INVARIANTS the new sendfile won't panic, but
would report incorrect amount of bytes sent.

o With this change, the headers copyin is moved down into the cycle, after
  the sbspace() check.  The uio size is trimmed by socket space there,
  which fixes the overcommit problem and its consequences.
o The compatibility handling for FreeBSD 4 sendfile headers API is pushed
  up the stack to syscall wrappers.  This required a copy and paste of the
  code, but in turn this allowed to remove extra stack carried parameter
  from fo_sendfile_t, and embrace entire compat code into #ifdef.  If in
  future we got more fo_sendfile_t function, the copy and paste level would
  even reduce.

Reviewed by:	emax, gallatin, Maxim Dounin <mdounin mdounin.ru>
Tested by:	Vitalij Satanivskij <satan ukr.net>
Sponsored by:	Netflix
2016-03-29 19:57:11 +00:00
Dmitry Chagin
7c5982000d Revert r297310 as the SOL_XXX are equal to the IPPROTO_XX except SOL_SOCKET.
Pointed out by:	ae@
2016-03-27 10:09:10 +00:00
Dmitry Chagin
c826fcfe22 Convert Linux SOL_IPV6 level.
MFC after:	1 week
2016-03-27 08:12:01 +00:00
Dmitry Chagin
e667ee63f6 Whitespaces and style(9) fix. No functional changes.
MFC after:	1 week
2016-03-27 08:10:20 +00:00
Dmitry Chagin
09806d8e3e When write(2) on eventfd object fails with the error EAGAIN do not return
the number of bytes written.

MFC after:	1 week
2016-03-26 19:16:53 +00:00
Dmitry Chagin
2bb14e7541 Implement O_NONBLOCK flag via fcntl(F_SETFL) for eventfd object.
MFC after:	1 week
2016-03-26 19:15:23 +00:00
Ed Schouten
3cf50041ef Regenerate system call table after r297247. 2016-03-24 21:49:39 +00:00
Ed Schouten
1f3bbfd875 Replace the CloudABI system call table by a machine generated version.
The type definitions and constants that were used by COMPAT_CLOUDABI64
are a literal copy of some headers stored inside of CloudABI's C
library, cloudlibc. What is annoying is that we can't make use of
cloudlibc's system call list, as the format is completely different and
doesn't provide enough information. It had to be synced in manually.

We recently decided to solve this (and some other problems) by moving
the ABI definitions into a separate file:

	https://github.com/NuxiNL/cloudabi/blob/master/cloudabi.txt

This file is processed by a pile of Python scripts to generate the
header files like before, documentation (markdown), but in our case more
importantly: a FreeBSD system call table.

This change discards the old files in sys/contrib/cloudabi and replaces
them by the latest copies, which requires some minor changes here and
there. Because cloudabi.txt also enforces consistent names of the system
call arguments, we have to patch up a small number of system call
implementations to use the new argument names.

The new header files can also be included directly in FreeBSD kernel
space without needing any includes/defines, so we can now remove
cloudabi_syscalldefs.h and cloudabi64_syscalldefs.h. Patch up the
sources to include the definitions directly from sys/contrib/cloudabi
instead.
2016-03-24 21:47:15 +00:00
Dmitry Chagin
2ad0231309 Check bsd_to_linux_statfs() return value. Forgotten in r297070.
MFC after:	1 week
2016-03-20 19:06:21 +00:00
Dmitry Chagin
525c9796c3 Return EOVERFLOW when the actual statfs values are too large to
fit into the 32-bit fields of a Linux struct statfs.

PR:		181012
MFC after:	1 week
2016-03-20 18:31:30 +00:00
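
The overflow check described in 525c9796c3 can be sketched as below. The structures and `sketch_to_statfs32()` are hypothetical, and unsigned 32-bit fields are assumed purely for illustration; the real Linux struct statfs layout and the actual conversion routine differ.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical wide (native) and narrow (32-bit Linux ABI) layouts. */
struct sketch_statfs64 {
	uint64_t f_blocks;
	uint64_t f_bfree;
	uint64_t f_files;
};

struct sketch_statfs32 {
	uint32_t f_blocks;
	uint32_t f_bfree;
	uint32_t f_files;
};

/* Narrow the values, refusing with EOVERFLOW if any would be truncated. */
int
sketch_to_statfs32(const struct sketch_statfs64 *in,
    struct sketch_statfs32 *out)
{
	if (in->f_blocks > UINT32_MAX || in->f_bfree > UINT32_MAX ||
	    in->f_files > UINT32_MAX)
		return (EOVERFLOW);
	out->f_blocks = (uint32_t)in->f_blocks;
	out->f_bfree = (uint32_t)in->f_bfree;
	out->f_files = (uint32_t)in->f_files;
	return (0);
}
```

Callers then have to check the return value, which is what the follow-up commit 2ad0231309 adds.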
Dmitry Chagin
7958a34cb5 Whitespaces, style(9) fixes. No functional changes.
MFC after:	1 week
2016-03-20 14:06:27 +00:00
Dmitry Chagin
99546279d6 Implement fstatfs64 system call.
PR:		181012
Submitted by:	John Wehle
MFC after:	1 week
2016-03-20 13:21:20 +00:00
Dmitry Chagin
4525bb829f Rework r296543:
1. Limit secs to INT32_MAX / 2 to avoid errors from kern_setitimer().
   Assert that kern_setitimer() returns 0.
   Remove bogus cast of secs.
   Fix style(9) issues.

2. Increment the return value if the remaining tv_usec value is more than 500000, as Linux does.

Pointed out by: [1] Bruce Evans

MFC after:	1 week
2016-03-20 11:40:52 +00:00
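
The two points of 4525bb829f can be sketched as follows. Both helpers are hypothetical names, and the strict "more than 500000" comparison simply follows the wording of the commit message; the committed source may use a different boundary.

```c
#include <stdint.h>

/* Point 1: clamp the requested seconds so the timer call cannot fail. */
unsigned int
sketch_clamp_alarm_secs(unsigned int secs)
{
	if (secs > INT32_MAX / 2)
		secs = INT32_MAX / 2;
	return (secs);
}

/* Point 2: round the remaining time up when the fraction exceeds 0.5 s. */
unsigned int
sketch_alarm_remaining(long tv_sec, long tv_usec)
{
	unsigned int remaining = (unsigned int)tv_sec;

	if (tv_usec > 500000)
		remaining++;
	return (remaining);
}
```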
Justin Hibbits
da1b038af9 Use uintmax_t (typedef'd to rman_res_t type) for rman ranges.
On some architectures, u_long isn't large enough for resource definitions.
Particularly, powerpc and arm allow 36-bit (or larger) physical addresses, but
type `long' is only 32-bit.  This extends rman's resources to uintmax_t.  With
this change, any resource can feasibly be placed anywhere in physical memory
(within the constraints of the driver).

Why uintmax_t and not something machine dependent, or uint64_t?  Though it's
possible for uintmax_t to grow, it's highly unlikely it will become 128-bit on
32-bit architectures.  64-bit architectures should have plenty of RAM to absorb
the increase on resource sizes if and when this occurs, and the number of
resources on memory-constrained systems should be sufficiently small as to not
pose a drastic overhead.  That being said, uintmax_t was chosen for source
clarity.  If it's specified as uint64_t, all printf()-like calls would either
need casts to uintmax_t, or be littered with PRI*64 macros.  Casts to uintmax_t
aren't horrible, but it would also bake into the API for
resource_list_print_type() either a hidden assumption that entries get cast to
uintmax_t for printing, or these calls would need the PRI*64 macros.  Since
source code is meant to be read more often than written, I chose the clearest
path of simply using uintmax_t.

Tested on a PowerPC p5020-based board, which places all device resources in
0xfxxxxxxxx, and has 8GB RAM.
Regression tested on qemu-system-i386
Regression tested on qemu-system-mips (malta profile)

Tested PAE and devinfo on virtualbox (live CD)

Special thanks to bz for his testing on ARM.

Reviewed By: bz, jhb (previous)
Relnotes:	Yes
Sponsored by:	Alex Perez/Inertial Computing
Differential Revision: https://reviews.freebsd.org/D4544
2016-03-18 01:28:41 +00:00
John Baldwin
823590d40e Regen. 2016-03-12 22:55:07 +00:00
John Baldwin
8d91aced32 Regen. 2016-03-09 19:06:46 +00:00
John Baldwin
399e8c1773 Simplify AIO initialization now that it is standard.
- Mark AIO system calls as STD and remove the helpers to dynamically
  register them.
- Use COMPAT6 for the old system calls with the older sigevent instead of
  an 'o' prefix.
- Simplify the POSIX configuration to note that AIO is always available.
- Handle AIO in the default VOP_PATHCONF instead of special casing it in
  the pathconf() system call.  fpathconf() is still hackish.
- Remove freebsd32_aio_cancel() as it just called the native one directly.

Reviewed by:	kib
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D5589
2016-03-09 19:05:11 +00:00
Andrey V. Elsukov
86a9058b01 Add support for IPPROTO_IPV6 socket layer for getsockopt/setsockopt calls.
Also add mapping for several options from RFC 3493 and 3542.

Reviewed by:	dchagin
Tested by:	Joe Love <joe at getsomwhere dot net>
MFC after:	2 weeks
2016-03-09 09:12:40 +00:00
Dmitry Chagin
a87488d1e4 Better english.
Submitted by:	Kevin P. Neal
MFC after:	1 week
2016-03-08 19:40:01 +00:00
Dmitry Chagin
649ca5e9dc Put a commit message from r296502 about Linux alarm() system call
behaviour to the source.

Suggested by:	emaste@

MFC after:	1 week
2016-03-08 19:20:57 +00:00
Dmitry Chagin
91f514e413 Do not leak fp. While here remove bogus cast of fp->f_data.
MFC after:	1 week
2016-03-08 15:55:43 +00:00
Dmitry Chagin
fc4b98fb88 Linux accept() system call returns EOPNOTSUPP errno instead of EINVAL
for UDP sockets.

MFC after:	1 week
2016-03-08 15:15:34 +00:00
Dmitry Chagin
15c3b371e2 According to POSIX and the Linux implementation, the alarm() system call
is always successful.
So, ignore any errors and return 0 as Linux does.

XXX. Unlike POSIX, Linux always returns 0 when an invalid seconds value is
specified, so in that case Linux does not return the proper remaining time.

MFC after:	1 week
2016-03-08 15:12:49 +00:00
Dmitry Chagin
9f4e66afb9 Link the newly created process to the corresponding parent as
if CLONE_PARENT is set, then the parent of the new process will
be the same as that of the calling process.

MFC after:	1 week
2016-03-08 15:08:22 +00:00
Hans Petter Selasky
cb19abd277 Run the LinuxKPI PCI shutdown handler free of the Giant mutex.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-03-07 14:35:31 +00:00
Hans Petter Selasky
510ebed7be Add more functions to the LinuxKPI.
Define strnicmp as a function macro instead of a regular macro while
at it.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-03-03 09:56:04 +00:00
Mark Johnston
0acf5d0bfd Improve error handling for posix_fallocate(2) and posix_fadvise(2).
- Set td_errno so that ktrace and dtrace can obtain the syscall error
  number in the usual way.
- Pass negative error numbers directly to the syscall layer, as they're
  not intended to be returned to userland.

Reviewed by:	kib
Sponsored by:	EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D5425
2016-02-25 19:58:23 +00:00
Ed Schouten
c0af8d16d8 Call cap_rights_init() properly.
Even though or'ing the individual rights works in this specific case, it
may not work in general. Pass them in as varargs.
2016-02-24 10:54:26 +00:00
Ed Schouten
70907712be Make handling of mmap()'s prot argument more strict.
- Make the system call fail if prot contains bits other than read, write
  and exec.
- Similar to OpenBSD's W^X, don't allow write and exec to be set at the
  same time. I'd like to see for now what happens if we enforce this
  policy unconditionally. If it turns out that this is far too strict,
  we'll loosen this requirement.
2016-02-23 09:22:00 +00:00
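
The two checks listed in 70907712be can be sketched as a single predicate. The `SKETCH_PROT_*` bit values and `sketch_prot_allowed()` are hypothetical; the actual CloudABI constants and the error returned by the system call are not shown in the commit message.

```c
#include <stdbool.h>

/* Hypothetical protection bits, chosen only for this illustration. */
#define SKETCH_PROT_READ	0x1u
#define SKETCH_PROT_WRITE	0x2u
#define SKETCH_PROT_EXEC	0x4u

/* Return true only if prot passes both checks from the commit message. */
bool
sketch_prot_allowed(unsigned int prot)
{
	const unsigned int known =
	    SKETCH_PROT_READ | SKETCH_PROT_WRITE | SKETCH_PROT_EXEC;

	/* Check 1: no bits other than read, write and exec. */
	if ((prot & ~known) != 0)
		return (false);
	/* Check 2: W^X, write and exec may not be combined. */
	if ((prot & SKETCH_PROT_WRITE) != 0 && (prot & SKETCH_PROT_EXEC) != 0)
		return (false);
	return (true);
}
```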
Svatopluk Kraus
35a0bc1260 As <machine/vmparam.h> is included from <vm/vm_param.h>, there is no
need to include it explicitly when <vm/vm_param.h> is already included.

Suggested by:	alc
Reviewed by:	alc
Differential Revision:	https://reviews.freebsd.org/D5379
2016-02-22 09:08:04 +00:00
Svatopluk Kraus
a1e1814d76 As <machine/pmap.h> is included from <vm/pmap.h>, there is no need to
include it explicitly when <vm/pmap.h> is already included.

Reviewed by:	alc, kib
Differential Revision:	https://reviews.freebsd.org/D5373
2016-02-22 09:02:20 +00:00
Dag-Erling Smørgrav
f4d6a773f8 Implement /proc/$$/limits.
PR:		207386
Submitted by:	Szymon Śliwa <knight.erraunt@gmail.com>
MFC after:	3 weeks
2016-02-21 14:56:05 +00:00
Jung-uk Kim
c2a9e596ed Silence PVS-Studio errors (V512). These buffer underflows are intentional. 2016-02-18 19:37:39 +00:00
Konstantin Belousov
db57c70a5b Rename P_KTHREAD struct proc p_flag to P_KPROC.
I left as is an apparent bug in ntoskrnl_var.h:AT_PASSIVE_LEVEL()
definition.

Suggested by:	jhb
Sponsored by:	The FreeBSD Foundation
2016-02-09 16:30:16 +00:00
Mateusz Guzik
813361c140 fork: plug a use after free of the returned process
fork1 required its callers to pass a pointer to struct proc * which would
be set to the new process (if any). procdesc and racct manipulation also
used said pointer.

However, the process could have exited prior to do_fork return and be
automatically reaped, thus making this a use-after-free.

Fix the problem by letting callers indicate whether they want the pid or
the struct proc, return the process in stopped state for the latter case.

Reviewed by:	kib
2016-02-04 04:25:30 +00:00
Mateusz Guzik
33fd9b9a2b fork: pass arguments to fork1 in a dedicated structure
Suggested by:	kib
2016-02-04 04:22:18 +00:00
Hans Petter Selasky
fe68f570d4 Update and add various macros to the LinuxKPI and resolve a macro
redefinition issue in the cxgb driver.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
Reviewed by:	np @
2016-01-26 15:26:35 +00:00
Hans Petter Selasky
c7c96d1093 LinuxKPI list updates:
- Add some new hlist macros.
- Update existing hlist macros removing the need for a temporary
  iteration variable.
- Properly define the RCU hlist macros to be SMP safe with regard
  to RCU.
- Make list macro arguments safe by adding a pair of parentheses.
- Prefix the _list_add() and _list_splice() functions with "linux"
  to reflect they are LinuxKPI internal functions.

Obtained from:	Linux
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-01-26 15:12:31 +00:00
Hans Petter Selasky
e6ef991e5e Implement ether_addr_equal(), ether_addr_equal_64bits() and
random_ether_addr() for the LinuxKPI.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-01-26 14:36:16 +00:00
Hans Petter Selasky
f15ffb5e63 Implement is_vlan_dev() and vlan_dev_vlan_id() for the LinuxKPI.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-01-26 14:33:20 +00:00
Hans Petter Selasky
e28297940b Implement bitmap_weight() and bitmap_equal() for the LinuxKPI.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-01-26 14:31:20 +00:00
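
For readers unfamiliar with the two helpers named in e28297940b, a naive bit-by-bit sketch of their semantics follows. The `sketch_` functions are hypothetical and are not the LinuxKPI implementation, which would normally work a word at a time.

```c
#include <limits.h>
#include <stdbool.h>

#define SKETCH_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)

/* Test bit i of the bitmap. */
static bool
sketch_test_bit(const unsigned long *map, unsigned int i)
{
	return ((map[i / SKETCH_BITS_PER_LONG] >>
	    (i % SKETCH_BITS_PER_LONG)) & 1UL) != 0;
}

/* Count the set bits among the first nbits bits. */
unsigned int
sketch_bitmap_weight(const unsigned long *map, unsigned int nbits)
{
	unsigned int i, weight = 0;

	for (i = 0; i < nbits; i++)
		if (sketch_test_bit(map, i))
			weight++;
	return (weight);
}

/* Compare the first nbits bits of two bitmaps. */
bool
sketch_bitmap_equal(const unsigned long *a, const unsigned long *b,
    unsigned int nbits)
{
	unsigned int i;

	for (i = 0; i < nbits; i++)
		if (sketch_test_bit(a, i) != sketch_test_bit(b, i))
			return (false);
	return (true);
}
```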
Hans Petter Selasky
d211177315 Add more network related macros and functions to the LinuxKPI.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-01-26 14:29:50 +00:00
Hans Petter Selasky
4a19cc98b3 Add definition for the NETDEV_CHANGE event and tidy up the LinuxKPI
notifier header file a bit while at it.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-01-26 14:27:00 +00:00
Hans Petter Selasky
9f34efb9f4 Define __get_user() and __put_user() for the LinuxKPI.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-01-26 14:21:30 +00:00
Hans Petter Selasky
a65ef21558 Add more LinuxKPI PCI related functions and defines.
Removed comments deriving from Linux.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-01-26 14:20:25 +00:00
Hans Petter Selasky
f919b7a664 Implement 64-bit atomic operations for the LinuxKPI.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-01-21 17:56:23 +00:00
Hans Petter Selasky
29cbb3bef2 LinuxKPI atomic fixes:
- Fix implementation of atomic_add_unless(). The atomic_cmpset_int()
  function returns a boolean and not the previous value of the atomic
  variable.
- The atomic counters should be signed according to Linux.
- Some minor cosmetics and styling while at it.

Reviewed by:	alfred @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-01-21 17:52:55 +00:00
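
The pitfall fixed in 29cbb3bef2 is treating a boolean compare-and-set result as if it were the previous value. The sketch below shows the intended loop using C11 `<stdatomic.h>` rather than FreeBSD's `atomic_cmpset_int()`; `sketch_atomic_add_unless()` is a hypothetical name, not the LinuxKPI function.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Add "a" to *v unless *v currently equals "u".
 * Returns true if the addition was performed.
 */
bool
sketch_atomic_add_unless(atomic_int *v, int a, int u)
{
	int c = atomic_load(v);

	while (c != u) {
		/*
		 * The compare-exchange returns a boolean success flag.
		 * On failure it refreshes "c" with the current value,
		 * so the loop re-checks against "u" before retrying.
		 */
		if (atomic_compare_exchange_weak(v, &c, c + a))
			return (true);
	}
	return (false);
}
```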
Hans Petter Selasky
1f6112d50a Use function macro instead of non-function macro to reduce chance of
incorrect expansion.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-01-21 17:36:06 +00:00
Hans Petter Selasky
2d1bee654a Implement idr_preload(), idr_preload_end(), idr_alloc() and
idr_alloc_cyclic() in the LinuxKPI. Bump the FreeBSD version to
force recompilation of all KLDs due to IDR structure size change.

MFC after:	2 weeks
Sponsored by:	Mellanox Technologies
2016-01-21 14:57:45 +00:00
John Baldwin
e23cd1b923 Initialize vm_page_prot to VM_MEMATTR_DEFAULT instead of 0.
If a driver's Linux mmap callback passed vm_page_prot through unchanged,
then linux_dev_mmap_single() would try to apply whatever VM_MEMATTR_xxx
value 0 is to the mapping.  On x86, VM_MEMATTR_DEFAULT is the PAT value
for write-back (WB) which is 6, while 0 maps to the PAT value for
uncacheable (UC).  Thus, any mmap request that did not explicitly set
page_prot tried to map memory as UC, triggering the warning in
sg_pager_getpages().

Tested by:	np
Reported by:	Krishnamraju Eraparaju @ Chelsio
MFC after:	3 days
Sponsored by:	Chelsio Communications
2016-01-20 00:14:34 +00:00
Dmitry Chagin
67968b35a0 Prevent double free of control in common sendmsg path as sosend
already frees it.
2016-01-17 19:28:13 +00:00
Hans Petter Selasky
7d1333938f Implement support for PCI suspend, resume and shutdown events in the
LinuxKPI. Fix a few spaces to tabs. Bump the FreeBSD version to force
recompilation of existing KMODs.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-01-15 11:18:58 +00:00
Gleb Smirnoff
c8358c6e0d Call crextend() before copying old credentials to the new credentials
and replace crcopysafe by crcopy as crcopysafe is not intended to be
safe in a threaded environment, it drops PROC_LOCK() in while() that
can lead to unexpected results, such as overwriting kernel memory.

In my POV crcopysafe() needs special attention. For now I do not see
any problems with this function, but who knows.

Submitted by:	dchagin
Found by:	trinity
Security:	SA-16:04.linux
2016-01-14 10:16:25 +00:00
Gleb Smirnoff
037f750877 Change linux get_robust_list system call to match actual linux one.
The set_robust_list system call requests the kernel to record the head
of the list of robust futexes owned by the calling thread. The head
argument is the list head to record.
The get_robust_list system call should return the head of the robust
list of the thread whose thread id is specified in pid argument.
The list head should be stored in the location pointed to by head
argument.

In contrast, our implementation of the get_robust_list system call copies
the known portion of memory pointed to by the pointer recorded by the
set_robust_list system call (the head of the robust list) to the location
pointed to by the head argument.

So, it is possible for a local attacker to read portions of kernel
memory, which may result in a privilege escalation.

Submitted by:	mjg
Security:	SA-16:03.linux
2016-01-14 10:13:58 +00:00
Dmitry Chagin
6437b8e7d9 Unlock the process lock when returning an error from the getrobustlist
call and add a forgotten dtrace probe when returning the same error.

MFC after:	3 days
XMFC with:	r292743
2016-01-10 07:36:43 +00:00
Dmitry Chagin
038c720553 Implement vsyscall hack. Prior to 2.13 glibc uses vsyscall
instead of vdso. An upcoming linux_base-c6 needs it.

Differential Revision:  https://reviews.freebsd.org/D1090

Reviewed by:	kib, trasz
MFC after:	1 week
2016-01-09 20:18:53 +00:00
Hans Petter Selasky
0c510167fb LinuxKPI style changes:
- Properly prefix internal functions with "linux_" instead of only a
  single underscore to avoid future namespace collisions.
- Make some functions global instead of inline to ease debugging and
  to avoid unnecessary code duplication.
- Remove no longer existing kthread_create() function's prototype.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-01-08 10:04:19 +00:00
Hans Petter Selasky
e10c4cc0a4 Implement RCU mechanism using shared exclusive locks.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-01-05 12:22:45 +00:00
Hans Petter Selasky
b648035313 Handle file descriptors being closed before they are initialized. An early
fdclose() call can cause fget_unlocked() to fail.

Found by:	mjg @
MFC after:	1 week
Reviewed by:	Mark Block <markb@mellanox.com>
Sponsored by:	Mellanox Technologies
Differential Revision:	https://reviews.freebsd.org/D4351
2015-12-31 14:47:45 +00:00
Hans Petter Selasky
06204f8e25 Minor LinuxKPI code cleanup:
- Declare some static functions in linux_compat.c instead of inside
  various header files.
- Prefix FreeBSD local functions in the LinuxKPI with "linux_" to
  avoid symbol name conflicts in the future and to make debugging
  easier.
- Make the "struct kobj_ktype" declarations constant to shave off a
  few bytes from the data segment.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2015-12-31 12:30:19 +00:00
Hans Petter Selasky
337cb9f04c Make the kobject refcounting compliant with Linux. Refcounting on the
parent kobject cannot be factored out and must be done by the kobject
consumers.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2015-12-31 11:27:36 +00:00
Hans Petter Selasky
260194052e Reduce memory consumption when allocating kobject strings in the
LinuxKPI. Compute string length before allocating memory instead of
using fixed size allocations. Make kobject_set_name_vargs() global
instead of inline to save some bytes when compiling.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2015-12-28 18:20:05 +00:00
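
The "compute string length before allocating" approach from 260194052e can be sketched in userspace C with `vsnprintf()`. The function `sketch_alloc_name()` is hypothetical; the kernel code uses its own allocator rather than `malloc()`.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Format a name into a buffer sized exactly for the result.
 * Returns NULL on formatting or allocation failure; the caller frees.
 */
char *
sketch_alloc_name(const char *fmt, ...)
{
	va_list ap, ap2;
	char *name;
	int len;

	va_start(ap, fmt);
	va_copy(ap2, ap);

	/* First pass: measure only, nothing is written. */
	len = vsnprintf(NULL, 0, fmt, ap);
	va_end(ap);
	if (len < 0) {
		va_end(ap2);
		return (NULL);
	}

	/* Second pass: allocate len + 1 bytes and format for real. */
	name = malloc((size_t)len + 1);
	if (name != NULL)
		vsnprintf(name, (size_t)len + 1, fmt, ap2);
	va_end(ap2);
	return (name);
}
```

Compared with a fixed-size allocation, short names waste no memory and long names are not truncated.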
Dmitry Chagin
bfb5568a3c Return EINVAL in case of incorrect sigev_signo value specified instead of panicking. 2015-12-26 09:09:49 +00:00
Dmitry Chagin
6e5549717a Do not allow access to emuldata for non Linux processes.
Pointed out by:	mjg@
Security:	https://admbugs.freebsd.org/show_bug.cgi?id=679
2015-12-26 09:04:47 +00:00
Hans Petter Selasky
c4e58b4efe Implement drain_workqueue() function.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2015-12-21 12:20:02 +00:00
Hans Petter Selasky
9782763db2 In the zero delay case in queue_delayed_work() use the return value
from taskqueue_enqueue() instead of reading "ta_pending" unlocked and
also ensure the callout is stopped before proceeding.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2015-12-21 12:13:03 +00:00
Hans Petter Selasky
55d445d317 Minor workqueue cleanup:
- Make some functions global instead of inline to ease debugging.
- Fix some minor style issues.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2015-12-21 11:58:59 +00:00
Hans Petter Selasky
c094330345 Implement sleepable RCU mechanism using shared exclusive locks.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2015-12-21 11:03:12 +00:00
Hans Petter Selasky
cee21041cf Implement ACCESS_ONCE(), WRITE_ONCE() and READ_ONCE().
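
One common way to express such macros is a volatile-qualified access, sketched below. This is not taken from the commit: the definitions assume the GCC/Clang __typeof__ extension, omit any compiler barriers a real implementation may add, and sketch_flag and sketch_publish() are hypothetical names used only to exercise the macros.

```c
/*
 * Sketch only: force a single, non-elided access to an object by going
 * through a volatile-qualified pointer. Requires the GCC/Clang
 * __typeof__ extension.
 */
#define	ACCESS_ONCE(x)		(*(volatile __typeof__(x) *)&(x))
#define	READ_ONCE(x)		ACCESS_ONCE(x)
#define	WRITE_ONCE(x, v)	do { ACCESS_ONCE(x) = (v); } while (0)

/* Hypothetical shared variable and accessor used for illustration. */
static int sketch_flag;

int
sketch_publish(int v)
{
	/* The store and the load are each performed exactly once. */
	WRITE_ONCE(sketch_flag, v);
	return (READ_ONCE(sketch_flag));
}
```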
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2015-12-21 10:56:38 +00:00
Mark Johnston
3616095801 Fix style issues around existing SDT probes.
- Use SDT_PROBE<N>() instead of SDT_PROBE(). This has no functional effect
  at the moment, but will be needed for some future changes.
- Don't hardcode the module component of the probe identifier. This is
  set automatically by the SDT framework.

MFC after:	1 week
2015-12-16 23:39:27 +00:00
Hans Petter Selasky
f837e46d16 Add some structures and defines which will be used when decoding small
form factor, SFF, standards compliant ethernet EEPROMs.

MFC after:	1 week
Obtained from:	Linux
Sponsored by:	Mellanox Technologies
2015-12-03 12:51:54 +00:00
Hans Petter Selasky
9ce5ab9ce6 Remove incorrect defines. The proper version of these macros is
defined in linux/etherdevice.h.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2015-12-03 11:45:12 +00:00
Hans Petter Selasky
52ba05767f Add more functions and types to the LinuxKPI.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2015-11-30 09:24:12 +00:00
Konstantin Belousov
724f4b62b0 Remove sv_prepsyscall, sv_sigsize and sv_sigtbl members of the struct
sysent.

sv_prepsyscall is unused.

sv_sigsize and sv_sigtbl translate signal number from the FreeBSD
namespace into the ABI domain.  It is only utilized on i386 for iBCS2
binaries.  The issue with this approach is that signals for iBCS2 were
delivered with the FreeBSD signal frame layout, which does not follow
iBCS2.  The same note is true for any other potential user of
sv_sigtbl.  In other words, if ABI needs signal number translation, it
really needs custom sv_sendsig method instead.

Sponsored by:	The FreeBSD Foundation
2015-11-28 08:49:07 +00:00
Konstantin Belousov
5e27d79314 Split kernel timekeep ABI structure vdso_sv_tk out of the struct
sysentvec.  This allows the timekeep data to be shared between similar
ABIs which cannot share sysentvec.

Make the timekeep_push_vdso() tick callback to the timekeep structures
instead of sysentvecs.  If several sysentvec share the vdso_sv_tk
structure, we would update the userspace data several times on each
tick, without the change.

Only allocate vdso_sv_tk in the exec_sysvec_init() sysinit when
sysentvec is marked with the new SV_TIMEKEEP flag.  This saves
allocation and update of unneeded vdso_sv_tk for ABIs which do not
provide userspace gettimeofday yet, which are PowerPC arches right
now.

Make vdso_sv_tk allocator public, namely split out and export
alloc_sv_tk() and alloc_sv_tk_compat32().  ABIs which share timekeep
data now can allocate it manually and share as appropriate.

Requested by:	nwhitehorn
Tested by:	nwhitehorn, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-11-23 07:09:35 +00:00
Hans Petter Selasky
f727a767e9 Add assert and note about the size of "unsigned long" inside the
LinuxKPI for the future.

Sponsored by:	Mellanox Technologies
2015-11-13 09:00:39 +00:00
Hans Petter Selasky
86845417d1 Build fixes:
- Add some missing I/O functions for non-i386 and amd64 platforms.
- Stub ioremap() to NULL using a macro to ensure non-existing memory
  attributes are not referred to when they do not exist.
- Add more header files to linux/list.h to resolve driver compilation
  issues on Sparc64 and PowerPC platforms.

Sponsored by:	Mellanox Technologies
2015-11-12 09:18:22 +00:00
Conrad Meyer
3b383c9ede linuxkpi/sysfs.h: Cast arg2 through intptr_t to avoid GCC warning
The code compiles fine under Clang, but GCC on PPC is less permissive about
integer and pointer sizes.  (An intmax_t is clearly *large enough* to hold a
pointer value.)
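
The cast pattern described above can be sketched as follows. The function names are hypothetical and this is not the sysfs.h code. It only shows the intermediate intptr_t step that avoids a direct conversion between a pointer and an integer of a different size.

```c
#include <stdint.h>

/*
 * Sketch: convert an intmax_t argument to a pointer by way of intptr_t.
 * On a platform where intmax_t is wider than a pointer, a direct
 * (void *)arg2 cast can trigger a size-mismatch warning.
 */
void *
sketch_arg_to_pointer(intmax_t arg2)
{
	return ((void *)(intptr_t)arg2);
}

/* The reverse direction, again going through intptr_t. */
intmax_t
sketch_pointer_to_arg(void *p)
{
	return ((intmax_t)(intptr_t)p);
}
```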

Another follow-up to r290475.

Reported by:	jhibbits
Sponsored by:	EMC / Isilon Storage Division
2015-11-09 16:50:42 +00:00
Hans Petter Selasky
8e7baabc9f Make all the LinuxKPI include files compile standalone.
Sponsored by:	Mellanox Technologies
2015-11-03 12:37:55 +00:00
Hans Petter Selasky
8d59ecb214 Finish process of moving the LinuxKPI module into the default kernel build.
- Move all files related to the LinuxKPI into sys/compat/linuxkpi and
  its subfolders.
- Update sys/conf/files and some Makefiles to use new file locations.
- Added description of COMPAT_LINUXKPI to sys/conf/NOTES which in turn
  adds the LinuxKPI to all LINT builds.
- The LinuxKPI can be added to the kernel by setting the
  COMPAT_LINUXKPI option. The OFED kernel option no longer builds the
  LinuxKPI into the kernel. This was done to keep the build rules for
  the LinuxKPI in sys/conf/files simple.
- Extend the LinuxKPI module to include support for USB by moving the
  Linux USB compat from usb.ko to linuxkpi.ko.
- Bump the FreeBSD_version.
- A universe kernel build has been done.

Reviewed by:	np @ (cxgb and cxgbe related changes only)
Sponsored by:	Mellanox Technologies
2015-10-29 08:28:39 +00:00
Konstantin Belousov
cd0a26c53f Fix build for the KTR-enabled kernels.
Sponsored by:	The FreeBSD Foundation
2015-10-23 11:41:55 +00:00
Ed Schouten
b78ef4bd86 Refactoring: move out generic bits from cloudabi64_sysvec.c.
In order to make it easier to support CloudABI on ARM64, move out all of
the bits from the AMD64 cloudabi_sysvec.c into a new file
cloudabi_module.c that would otherwise remain identical. This reduces
the AMD64 specific code to just ~160 lines.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D3974
2015-10-22 09:07:53 +00:00
Ed Schouten
808d980506 Properly format pointer size independent CloudABI system calls.
CloudABI has approximately 50 system calls that do not depend on the
pointer size of the system. As the ABI is pretty compact, it takes
little effort to teach truss(8) the formatting rules for these system
calls. Start off by formatting pointer size independent system calls.

Changes:

- Make it possible to include the CloudABI system call definitions in
  FreeBSD userspace builds. Add ${root}/sys to the truss(8) Makefile so
  we can pull in <compat/cloudabi/cloudabi_syscalldefs.h>.
- Refactoring: patch up amd64-cloudabi64.c to use the CLOUDABI_*
  constants instead of rolling our own table.
- Add table entries for all of the system calls.
- Add new generic formatting types (UInt, IntArray) that we'll be using
  to format unsigned integers and arrays of integers.
- Add CloudABI specific formatting types.

Approved by:	jhb
Differential Revision:	https://reviews.freebsd.org/D3836
2015-10-08 05:27:45 +00:00
Bryan Drewery
a730673058 Remove redundant RFFPWAIT/vfork(2) handling in Linux fork(2) and clone(2) wrappers.
r161611 added some of the code from sys_vfork() directly into the Linux
module wrappers since they use RFSTOPPED.  In r232240, the RFFPWAIT handling
was moved to syscallret(), thus this code in the Linux module is no longer
needed as it will be called later.

This also allows the Linux wrappers to benefit from the fix in r275616 for
threads not getting suspended if their vforked child is stopped while they
wait on them.

Reviewed by:	jhb, kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D3828
2015-10-07 19:10:38 +00:00
Andriy Gapon
2f2f522b5d save some bytes by using more concise SDT_PROBE<n> instead of SDT_PROBE
SDT_PROBE requires 5 parameters whereas SDT_PROBE<n> requires n parameters
where n is typically smaller than 5.

Perhaps SDT_PROBE should be made a private implementation detail.

MFC after:	20 days
2015-09-28 12:14:16 +00:00
Edward Tomasz Napierala
089d32934a Fixes a panic triggered by threaded Linux applications when running
with RACCT/RCTL enabled.

Reviewed by:	ngie@, ed@
Tested by:	Larry Rosenman <ler@lerctr.org>
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D3470
2015-09-02 14:04:13 +00:00
Ed Schouten
bc1ace0b96 Decompose linkat()/renameat() rights to source and target.
To make it easier to understand how Capsicum interacts with linkat() and
renameat(), rename the rights to CAP_{LINK,RENAME}AT_{SOURCE,TARGET}.

This also addresses a shortcoming in Capsicum, where it isn't possible
to disable linking to files stored in a directory. Creating hardlinks
essentially makes it possible to access files with additional rights.

Reviewed by:	rwatson, wblock
Differential Revision:	https://reviews.freebsd.org/D3411
2015-08-27 15:16:41 +00:00
Ed Schouten
edcf7fbf59 Don't forget to invoke pre_execve() and post_execve().
CloudABI's proc_exec() was implemented before r282708 introduced
pre_execve() and post_execve(). Sync up by adding these missing calls.
2015-08-17 13:07:12 +00:00
Ed Schouten
fbb624e76f Add the last remaining system calls: send() and recv().
There is still one TODO item for these calls: add file descriptor
passing. The data structures are already prepared for this. It's just
the translation that's missing.

Obtained from:	http://github.com/NuxiNL/freebsd
2015-08-12 17:42:20 +00:00
Ed Schouten
2c20fbe43a Use CAP_EVENT instead of CAP_PDWAIT.
The cloudlibc pdwait() function ends up using FreeBSD's kqueue() in
combination with EVFILT_PROCDESC. This depends on CAP_EVENT -- not
CAP_PDWAIT.

Obtained from:	https://github.com/NuxiNL/freebsd
2015-08-12 11:07:03 +00:00
Ed Schouten
18528470cb Make blocking CloudABI futex operations work.
Blocking on locks and condition variables can be accomplished by polling
and using the special filters CONDVAR, LOCK_RDLOCK and LOCK_WRLOCK.

For now it wouldn't make sense to implement this functionality into
kqueue() itself, for the reason that they are CloudABI specific and
would require us to resize 'struct kevent' to hold all of the parameters
of interest.

Add a bandaid to the CloudABI poll system call to call into the futex
code directly if it detects specific combinations of events that are
used by the C library.

Obtained from:	https://github.com/NuxiNL/freebsd
2015-08-12 08:41:48 +00:00
Ed Schouten
322e16e87e Make poll() and kqueue() on CloudABI work.
This change implements two functions, cloudabi64_kevent_copyin() and
cloudabi64_kevent_copyout(), that convert CloudABI structures to
FreeBSD's struct kevent. CloudABI uses two structures: subscription_t
and event_t. The former is used for input, whereas the latter is used
for output. Unlike struct kevent, fields aren't overloaded for multiple
purposes or for separate event types.

For poll() we call into the newly introduced kern_kevent_anonymous()
function that allows us to poll without a file descriptor. This function
is not only used by poll(), but also by functions such as
sleep() and clock_nanosleep().

Reviewed by:	jmg
Obtained from:	https://github.com/NuxiNL/freebsd
Differential Revision:	https://reviews.freebsd.org/D3308
2015-08-12 07:59:00 +00:00
Ed Schouten
55a224afa2 Fall back to O_RDONLY -- not O_WRONLY.
If CloudABI processes open files with a set of requested rights that do
not match any of the privileges granted by O_RDONLY, O_WRONLY or O_RDWR,
we'd better fall back to O_RDONLY -- not O_WRONLY.
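
A minimal sketch of the fallback rule, assuming a simplified model in which the requested rights reduce to a "read" bit and a "write" bit. The SKETCH_RIGHT_* constants and sketch_rights_to_accmode() are hypothetical and do not correspond to the actual CloudABI or Capsicum definitions.

```c
#include <fcntl.h>

/* Hypothetical, simplified rights bits used only for this sketch. */
#define	SKETCH_RIGHT_READ	0x1u
#define	SKETCH_RIGHT_WRITE	0x2u

/*
 * Pick an open(2) access mode from a set of requested rights. When the
 * rights call for neither reading nor writing, fall back to O_RDONLY
 * rather than O_WRONLY.
 */
int
sketch_rights_to_accmode(unsigned int rights)
{
	int can_read = (rights & SKETCH_RIGHT_READ) != 0;
	int can_write = (rights & SKETCH_RIGHT_WRITE) != 0;

	if (can_read && can_write)
		return (O_RDWR);
	if (can_write)
		return (O_WRONLY);
	/* Read-only, or no match at all: the least privileged choice. */
	return (O_RDONLY);
}
```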
2015-08-11 14:08:46 +00:00
Ed Schouten
9d9123a80d Properly convert the error number to CloudABI's indexing.
We currently return FreeBSD's errno value directly, which is of course
not correct.
2015-08-11 14:07:04 +00:00
Ed Schouten
65c17fe451 Make cap_rights_limit() work for CloudABI processes.
Call into the recently introduced kern_cap_rights_limit() function to
restrict rights.
2015-08-11 08:44:19 +00:00
Ed Schouten
0f85ff377b Add file_open(): the underlying system call of openat().
CloudABI purely operates on file descriptor rights (CAP_*). File
descriptor access modes (O_ACCMODE) are emulated on top of rights.

Instead of accepting the traditional flags argument, file_open() copies
in an fdstat_t object that contains the initial rights the descriptor
should have, but also file descriptor flags that should persist after
opening (APPEND, NONBLOCK, *SYNC). Only flags that don't persist (EXCL,
TRUNC, CREAT, DIRECTORY) are passed in as an argument.

file_open() first converts the rights, the persistent flags and the
non-persistent flags to fflags. It then calls into vn_open(). If
successful, it installs the file descriptor with the requested
rights, trimming off rights that don't apply to the type of
the file that has been opened.

Unlike kern_openat(), this function does not support /dev/fd/*. I can't
think of a reason why we need to support this for CloudABI.

Obtained from:	https://github.com/NuxiNL/freebsd
Differential Revision:	https://reviews.freebsd.org/D3235
2015-08-06 06:47:28 +00:00
Ed Schouten
aaf53ab2aa Correct the previous commit: remove the DECLARE_MODULE().
It looks like a MODULE_VERSION() can also appear on its own -- there is
no need to explicitly use DECLARE_MODULE(). Looking at other
modules, this seems common practice.
2015-08-05 16:53:49 +00:00
Ed Schouten
b6efa27589 Add DECLARE_MODULE() to the "cloudabi" kernel module.
This kernel module does not require any explicit initialization, but a
module declaration is needed to let the "cloudabi64" kernel module
automatically pull this in.

Obtained from:	https://github.com/NuxiNL/freebsd
2015-08-05 16:45:47 +00:00
Ed Schouten
36310bcd1d Make fcntl(F_SETFL) work.
The stat_put() system call can be used to modify file descriptor
attributes, such as flags, but also Capsicum permission bits. Support
for changing Capsicum bits will be added as soon as its dependent
changes have been pushed through code review.

Obtained from:	https://github.com/NuxiNL/freebsd
2015-08-05 16:15:43 +00:00
Ed Schouten
2412ae2b8e Regenerate the system call table. 2015-08-05 13:10:13 +00:00
Ed Schouten
2837d9ed43 Import the latest CloudABI system call definitions and table.
We're going to need these for next code I'm going to send out for
review: support for poll() and kqueue() on CloudABI.
2015-08-05 13:09:46 +00:00
Ed Schouten
db1c8ee585 Add the remaining pointer size independent CloudABI socket system calls.
CloudABI uses a structure called cloudabi_sockstat_t. Think of it as
'struct stat' for sockets. It is used by functions such as
getsockname(), getpeername(), some of the getsockopt() values, etc.

This change implements the sock_stat_get() system call that returns a
copy of this structure. The accept() system call should also return a
full copy of this structure eventually, but for now we're only
interested in the peer address. Add a TODO() to make sure this is
patched up later on.

Differential Revision:	https://reviews.freebsd.org/D3218
2015-08-05 08:18:05 +00:00
Ed Schouten
4958fab8cd Allow the creation of polling descriptors (kqueues) on CloudABI. 2015-08-05 07:37:06 +00:00
Ed Schouten
a2034cc98a Allow the creation of kqueues with a restricted set of Capsicum rights.
On CloudABI we want to create file descriptors with just the minimal set
of Capsicum rights in place. The reason for this is that it makes it
easier to obtain uniform behaviour across different operating systems.

By explicitly whitelisting the operations, we can return consistent
error codes, but also prevent applications from depending on OS-specific
behaviour.

Extend kern_kqueue() to take an additional struct filecaps that is
passed on to falloc_caps(). Update the existing consumers to pass in
NULL.

Differential Revision:	https://reviews.freebsd.org/D3259
2015-08-05 07:36:50 +00:00
Ed Schouten
0c0964844e Let the CloudABI futex code use umtx_keys.
The CloudABI kernel still passes all of the cloudlibc unit tests.

Reviewed by:	vangyzen
Differential Revision:	https://reviews.freebsd.org/D3286
2015-08-04 06:02:03 +00:00
Ed Schouten
f52c3dd415 Allow CloudABI processes to create shared memory objects.
Summary:
Use the newly created `kern_shm_open()` function to create objects with
just the rights that are actually needed.

Reviewers: jhb, kib

Subscribers: imp

Differential Revision: https://reviews.freebsd.org/D3260
2015-08-01 07:51:48 +00:00
Ed Schouten
367a13f905 Limit rights on process descriptors.
On CloudABI, the rights bits returned by cap_rights_get() match up with
the operations that you can actually perform on the file descriptor.

Limiting the rights is good, because it makes it easier to get uniform
behaviour across different operating systems. If process descriptors on
FreeBSD would suddenly gain support for any new file operation, this
wouldn't become exposed to CloudABI processes without first extending
the rights.

Extend fork1() to gain a 'struct filecaps' argument that allows you to
construct process descriptors with custom rights. Use this in
cloudabi_sys_proc_fork() to limit the rights to just fstat() and
pdwait().

Obtained from:	https://github.com/NuxiNL/freebsd
2015-07-31 10:21:58 +00:00
Ed Schouten
8328babdd0 Make pipes in CloudABI work.
Summary:
Pipes in CloudABI are unidirectional. The reason for this is that
CloudABI attempts to provide a uniform runtime environment across
different flavours of UNIX.

Instead of implementing a custom pipe that is unidirectional, we can
simply reuse Capsicum permission bits to support this. This is nice,
because CloudABI already attempts to restrict permission bits to
correspond with the operations that apply to a certain file descriptor.

Replace kern_pipe() and kern_pipe2() by a single kern_pipe() that takes
a pair of filecaps. These filecaps are passed to the newly introduced
falloc_caps() function that creates the descriptors with rights in
place.

Test Plan:
CloudABI pipes seem to be created with proper rights in place:

https://github.com/NuxiNL/cloudlibc/blob/master/src/libc/unistd/pipe_test.c#L44

Reviewers: jilles, mjg

Reviewed By: mjg

Subscribers: imp

Differential Revision: https://reviews.freebsd.org/D3236
2015-07-29 17:18:27 +00:00
Ed Schouten
9d2332c9ee Split up Capsicum to CloudABI rights conversion into two separate routines.
CloudABI's openat() ensures that files are opened with the smallest set
of relevant rights. For example, when opening a FIFO, unrelated rights
like CAP_RECV are automatically removed. To remove unrelated rights, we
can just reuse the code for this that was already present in the rights
conversion function.
2015-07-29 12:42:45 +00:00
Ed Schouten
3720b82fa8 Implement CloudABI's readdir().
Summary:
CloudABI's readdir() system call could be thought of as a mixture
between FreeBSD's getdents(2) and pread(). Instead of using the file
descriptor offset, userspace provides a 64-bit cloudabi_dircookie_t
to continue reading at a given point. CLOUDABI_DIRCOOKIE_START, having
value 0, can be used to return entries at the start of the directory.

The file descriptor offset is not used to store the cookie for the
reason that in a file descriptor centric environment, it would make
sense to allow concurrent use of a single file descriptor.

The remaining space returned by the system call should be filled with a
partially truncated copy of the next entry. The advantage of doing this
is that it gracefully deals with long filenames. If the C library
provides a buffer that is too small to hold a single entry, it can still
extract the directory entry header, meaning that it can retry the read
with a larger buffer or skip it using the cookie.
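
The cookie-and-truncation behaviour described above can be sketched as follows. This is a userspace model, not the kernel implementation: struct sketch_dirent, its field names, and sketch_readdir() are hypothetical, the directory is modelled as an array of names, and the cookie is simply an index into that array.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical entry header; the real CloudABI layout may differ. */
struct sketch_dirent {
	uint64_t d_next;	/* cookie of the entry following this one */
	uint32_t d_namlen;	/* full name length, even if truncated */
};

/* Copy up to 'space' bytes of 'src' and report how many were copied. */
static size_t
copy_truncated(char *dst, size_t space, const void *src, size_t len)
{
	size_t n = len < space ? len : space;

	memcpy(dst, src, n);
	return (n);
}

/*
 * Fill 'buf' with entries starting at 'cookie'. The last entry may be
 * cut off mid-header or mid-name so that the buffer is used completely.
 * Returns the number of bytes written.
 */
size_t
sketch_readdir(const char *const *names, size_t nnames, uint64_t cookie,
    void *buf, size_t buflen)
{
	char *out = buf;
	size_t used = 0;
	uint64_t i;

	for (i = cookie; i < nnames && used < buflen; i++) {
		struct sketch_dirent de;

		memset(&de, 0, sizeof(de));
		de.d_next = i + 1;
		de.d_namlen = (uint32_t)strlen(names[i]);
		used += copy_truncated(out + used, buflen - used, &de,
		    sizeof(de));
		used += copy_truncated(out + used, buflen - used, names[i],
		    de.d_namlen);
	}
	return (used);
}
```

A caller that sees a complete header but a short name can compare d_namlen with the bytes remaining and either retry with a larger buffer or continue from d_next.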

Test Plan:
This implementation passes the cloudlibc unit tests at:

	https://github.com/NuxiNL/cloudlibc/tree/master/src/libc/dirent

Reviewers: marcel, kib

Reviewed By: kib

Subscribers: imp

Differential Revision: https://reviews.freebsd.org/D3226
2015-07-29 06:31:44 +00:00
Ed Schouten
1d96fd8d9f Implement file attribute modification system calls for CloudABI.
CloudABI uses a system call interface to modify file attributes that is
more similar to KPI's/FUSE, namely where a stat structure is passed back
to the kernel, together with a bitmask of attributes that should be
changed. This would allow us to update any set of attributes atomically.

That said, I'd rather not go as far as to actually implement it that
way, as it would require us to duplicate more code than strictly needed.
Let's just stick to the combinations that are actually used by
cloudlibc.

Obtained from:	https://github.com/NuxiNL/freebsd
2015-07-28 12:57:19 +00:00
Ed Schouten
29515a68a5 Implement directory and FIFO creation.
The file_create() system call can be used to create files of a given
type. Right now it can only be used to create directories and FIFOs. As
CloudABI does not expose filesystem permissions, this system call lacks
a mode argument. Simply use 0777 or 0666 depending on the file type.
2015-07-28 06:50:47 +00:00
Ed Schouten
cec575201a Make fstat() and friends work.
Summary:
CloudABI provides access to two different stat structures:

- fdstat, containing file descriptor level status: oflags, file
  descriptor type and Capsicum rights, used by cap_rights_get(),
  fcntl(F_GETFL), getsockopt(SO_TYPE).
- filestat, containing your regular file status: timestamps, inode
  number, used by fstat().

Unlike FreeBSD's stat::st_mode, CloudABI file descriptor types don't
have overloaded meanings (e.g., returning S_ISCHR() for kqueues). Add a
utility function to extract the type of a file descriptor accurately.

CloudABI does not work with O_ACCMODEs. File descriptors have two sets
of Capsicum-style rights: rights that apply to the file descriptor
itself ('base') and rights that apply to any new file descriptors
yielded through openat() ('inheriting'). Though not perfect, we can
pretty safely decompose Capsicum rights to such a pair. This is done in
convert_capabilities().

Test Plan: Tests for these system calls are fairly extensive in cloudlibc.

Reviewers: jonathan, mjg, #manpages

Reviewed By: mjg

Subscribers: imp

Differential Revision: https://reviews.freebsd.org/D3171
2015-07-28 06:36:49 +00:00
Ed Schouten
af7e75f59d Add a futex implementation for CloudABI.
Summary:
CloudABI provides two different types of futex objects: read-write locks
and condition variables. There is no need to provide separate support
for once objects and thread joining, as these are efficiently simulated
by blocking on a read-write lock. Mutexes simply use read-write locks.

Condition variables always have a lock object associated to them. They
always know to which lock a thread needs to be migrated if woken up.
This allows us to implement requeueing. A broadcast on a condition
variable will never cause multiple threads to be woken up at once. They
will be woken up iteratively.

This implementation still has lots of room for improvement. Locking is
coarse and right now we use linked lists to store all of the locks and
condition variables, instead of using a hash table. The primary goal of
this implementation was to behave correctly. Performance will be
improved as we go.

Test Plan:
This futex implementation has been in use for the last couple of months
and seems to work pretty well. All of the cloudlibc and libc++ unit
tests seem to pass.

Reviewers: dchagin, kib, vangyzen

Subscribers: imp

Differential Revision: https://reviews.freebsd.org/D3148
2015-07-27 10:07:29 +00:00
Ed Schouten
533c8a29da Regenerate system call table. 2015-07-27 10:04:28 +00:00
Ed Schouten
f4c06d124f Sync in latest upstream system call definitions.
Futex object scopes have been renamed from using their own constants to
simply reusing the existing CLOUDABI_MAP_{PRIVATE,SHARED} flags, as they
are more accurate in this context.
2015-07-27 10:04:06 +00:00
Ed Schouten
4615998165 Implement the basic system calls that operate on pathnames.
Summary:
Unlike FreeBSD, CloudABI does not use null terminated strings for its
pathnames. Introduce a function called copyin_path() that can be used by
all of the filesystem system calls that use pathnames. This change
already implements the system calls that don't depend on any additional
functionality (e.g., conversion of struct stat).
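
As a rough userspace illustration of handling pathnames that arrive as a pointer plus a length, a helper along these lines could produce the NUL-terminated string that the rest of the system expects. This is a sketch under assumptions, not the copyin_path() from this change: sketch_copy_path(), SKETCH_PATH_MAX, the choice of error codes, and the rejection of embedded NUL bytes are all hypothetical here.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical upper bound on the path length accepted by the sketch. */
#define	SKETCH_PATH_MAX	1024

/*
 * Turn a length-delimited path into a freshly allocated, NUL-terminated
 * string. Returns 0 and stores the string in *out on success, or an
 * errno value on failure.
 */
int
sketch_copy_path(const char *src, size_t len, char **out)
{
	char *path;

	if (len >= SKETCH_PATH_MAX)
		return (ENAMETOOLONG);
	/* Assumption: a NUL inside the path would silently shorten it. */
	if (memchr(src, '\0', len) != NULL)
		return (EINVAL);

	path = malloc(len + 1);
	if (path == NULL)
		return (ENOMEM);
	memcpy(path, src, len);
	path[len] = '\0';
	*out = path;
	return (0);
}
```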

Also implement the socket system calls that operate on pathnames, namely
the ones used by the C library functions bindat() and connectat(). These
don't receive a 'struct sockaddr_un', but just the pathname, meaning
they could be implemented in such a way that they don't depend on the
size of sun_path. For now, just use the existing interfaces.

Add a missing #include to cloudabi_syscalldefs.h to get this code to
build, as one of its macros depends on UINT64_C().

Test Plan:
These implementations have already been tested in the CloudABI branch on
GitHub. They pass all of the tests.

Reviewers: kib, pjd

Subscribers: imp

Differential Revision: https://reviews.freebsd.org/D3097
2015-07-24 07:46:02 +00:00
Ed Schouten
fef97e09d9 Allow us to create UNIX sockets and socketpairs in CloudABI processes. 2015-07-23 13:52:53 +00:00
Ed Schouten
c989441af6 Regenerate system call table. 2015-07-22 10:05:46 +00:00
Ed Schouten
73dcd7db56 Import upstream changes to the system call definitions.
Support has been added for providing the scope of a futex operation,
whether the futex is local to the process or shared between processes.
2015-07-22 10:04:53 +00:00
Ed Schouten
072cb63ddc Make clock_gettime() and clock_getres() work for CloudABI programs.
Though the standard C library uses a 'struct timespec' using a 64-bit
'time_t', there is no need to use such a type at the system call level.
CloudABI uses a simple 64-bit unsigned timestamp in nanoseconds. This is
sufficient to express any time value from 1970 to 2554.
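
The range follows from the arithmetic: 2^64 nanoseconds is roughly 584 years, and 1970 + 584 = 2554. A conversion from 'struct timespec' to such a timestamp might look like the sketch below. sketch_timespec_to_ns() is a hypothetical helper, and rejecting out-of-range input with -1 is an assumption made here for illustration, not necessarily what cloudabi_convert_timespec() does.

```c
#include <stdint.h>
#include <time.h>

#define	SKETCH_NSEC_PER_SEC	UINT64_C(1000000000)

/*
 * Convert a timespec to an unsigned 64-bit nanosecond count.
 * Returns 0 on success, or -1 if the value is negative, malformed, or
 * does not fit in 64 bits.
 */
int
sketch_timespec_to_ns(const struct timespec *ts, uint64_t *out)
{
	uint64_t sec, nsec;

	if (ts->tv_sec < 0 || ts->tv_nsec < 0 || ts->tv_nsec >= 1000000000L)
		return (-1);
	sec = (uint64_t)ts->tv_sec;
	nsec = (uint64_t)ts->tv_nsec;

	/* sec * 1e9 + nsec must not exceed UINT64_MAX. */
	if (sec > (UINT64_MAX - nsec) / SKETCH_NSEC_PER_SEC)
		return (-1);
	*out = sec * SKETCH_NSEC_PER_SEC + nsec;
	return (0);
}
```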

The CloudABI low-level interface also supports fetching timestamp values
with a lower precision. Instead of overloading the clock ID argument for
this purpose, the system call provides a precision argument that may be
used to specify the maximum slack. The current system call
implementation does not use this information, but it's good to already
have this available.

Expose cloudabi_convert_timespec(), as we're going to need this for
fstat() as well.

Obtained from:	https://github.com/NuxiNL/freebsd
2015-07-21 15:08:13 +00:00
Ed Schouten
21d30b29d5 Make thread creation work for CloudABI processes.
Summary:
Remove the stub system call that was put in place during the system call
import and replace it by a target-dependent version stored in sys/amd64.
Initialize the thread in a way similar to cpu_set_upcall_kse(). We
provide the entry point with two arguments: the thread ID and the
argument pointer.

Test Plan:
Thread creation still seems to work, both for FreeBSD and CloudABI
binaries.

Reviewers: dchagin, mjg, kib

Reviewed By: kib

Subscribers: imp

Differential Revision: https://reviews.freebsd.org/D3110
2015-07-21 12:47:15 +00:00
Ed Schouten
62c31cffae Make forking of CloudABI processes work.
Just like FreeBSD+Capsicum, CloudABI uses process descriptors. Return
the file descriptor number to the parent process.

To the child process we return a special value for the file
descriptor number (CLOUDABI_PROCESS_CHILD). We also return the thread ID
of the new thread in the copied process, so the threading library can
reinitialize itself.

Obtained from:	https://github.com/NuxiNL/freebsd
2015-07-20 13:46:22 +00:00
Marcelo Araujo
f19e47d691 Add support to the jail framework to be able to mount linsysfs(5) and
linprocfs(5).

Differential Revision:	D2846
Submitted by:		Nikolai Lifanov <lifanov@mail.lifanov.com>
Reviewed by:		jamie
2015-07-19 08:52:35 +00:00
Konstantin Belousov
b4490c6e93 The si_status field of the siginfo_t, provided by the waitid(2) and
SIGCHLD signal, should keep full 32 bits of the status passed to the
_exit(2).

Split the combined p_xstat of the struct proc into the separate exit
status p_xexit for normal process exit, and signalled termination
information p_xsig.  Kernel-visible macro KW_EXITCODE() reconstructs
old p_xstat from p_xexit and p_xsig.  p_xexit contains the complete status
and is copied out into si_status.
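
To illustrate why the split matters, the sketch below rebuilds a traditional wait(2)-style status word from the two separate fields. The macro and function names are hypothetical and the bit layout is the conventional "exit code in bits 8-15, signal in the low bits" encoding, assumed here for illustration rather than quoted from the change.

```c
/* Conventional wait status encoding, assumed for this sketch. */
#define	SKETCH_W_EXITCODE(ret, sig)	((ret) << 8 | (sig))

/*
 * Rebuild a legacy combined status from the full exit value and the
 * terminating signal. Only the low 8 bits of the exit value survive,
 * which is why the full 32-bit value has to be kept separately.
 */
int
sketch_legacy_xstat(int xexit, int xsig)
{
	return (SKETCH_W_EXITCODE(xexit & 0xff, xsig));
}
```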

Requested by:	Joerg Schilling
Reviewed by:	jilles (previous version), pho
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2015-07-18 09:02:50 +00:00
Ed Schouten
6256e57ba9 Implement CloudABI memory management system calls.
Add support for the <sys/mman.h> functions by wrapping around our own
implementations. There are no kern_*() variants of these system calls,
but we also don't need them in this case. It is sufficient to just call
into the sys_*() functions.

Differential Revision:	https://reviews.freebsd.org/D3033
Reviewed by:		brooks
2015-07-17 09:00:38 +00:00
Ed Schouten
6e5fcd99df Add a sysentvec for CloudABI on x86-64.
Summary:
For CloudABI we need to put two things on the stack of new processes:
the argument data (a binary blob; not strings) and a startup data
structure. The startup data structure contains interesting things such
as a pointer to the ELF program header, the thread ID of the initial
thread, a stack smashing protection canary, and a pointer to the
argument data.

Fetching system call arguments and setting the return value is similar
to FreeBSD. The only differences are that system call 0 does not exist
and that we call into cloudabi_convert_errno() to convert the error
code. We also need this function in a couple of other places, so we'd
better reuse it here.

Reviewers: dchagin, kib

Reviewed By: kib

Subscribers: imp

Differential Revision: https://reviews.freebsd.org/D3098
2015-07-16 18:24:06 +00:00
Ed Schouten
457f7e23b1 Implement CloudABI's exec() call.
Summary:
In a runtime that is purely based on capability-based security, there is
a strong emphasis on how programs start their execution. We need to make
sure that we execute a new program with an exact set of file
descriptors, ensuring that credentials are not leaked into the process
accidentally.

Providing the right file descriptors is just half the problem. There
also needs to be a framework in place that gives meaning to these file
descriptors. How does a CloudABI mail server know which of the file
descriptors corresponds to the socket that receives incoming emails?
Furthermore, how will this mail server acquire its configuration
parameters, as it cannot open a configuration file from a global path on
disk?

CloudABI solves this problem by replacing traditional string command
line arguments by a tree-like data structure consisting of scalars,
sequences and mappings (similar to YAML/JSON). In this structure, file
descriptors are treated as a first-class citizen. When calling exec(),
file descriptors are passed on to the new executable if and only if they
are referenced from this tree structure. See the cloudabi-run(1) man
page for more details and examples (sysutils/cloudabi-utils).

Fortunately, the kernel does not need to care about this tree structure
at all. The C library is responsible for serializing and deserializing,
but also for extracting the list of referenced file descriptors. The
system call only receives a copy of the serialized data and a layout of
what the new file descriptor table should look like:

    int proc_exec(int execfd, const void *data, size_t datalen, const int *fds,
              size_t fdslen);

This change introduces a set of fd*_remapped() functions:

- fdcopy_remapped() pulls a copy of a file descriptor table, remapping
  all of the file descriptors according to the provided mapping table.
- fdinstall_remapped() replaces the file descriptor table of the process
  by the copy created by fdcopy_remapped().
- fdescfree_remapped() frees the table in case we aborted before
  fdinstall_remapped().
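
The remapping step can be modelled in userspace as building a new table whose slot i holds whatever the old table held at fds[i]. The sketch below is an illustration only: the descriptor table is reduced to an array of integers, SKETCH_FD_CLOSED and sketch_fdcopy_remapped() are hypothetical, and the real functions operate on kernel file descriptor tables with reference counting.

```c
#include <stddef.h>
#include <stdlib.h>

/* Marker for an unused slot in the modelled descriptor table. */
#define	SKETCH_FD_CLOSED	(-1)

/*
 * Build a new table of 'fdslen' slots where slot i refers to the object
 * found at table[fds[i]]. Returns NULL if any requested descriptor is
 * out of range or closed; the caller frees the result with free().
 */
int *
sketch_fdcopy_remapped(const int *table, size_t tablelen, const int *fds,
    size_t fdslen)
{
	int *newtable;
	size_t i;

	newtable = malloc((fdslen > 0 ? fdslen : 1) * sizeof(*newtable));
	if (newtable == NULL)
		return (NULL);
	for (i = 0; i < fdslen; i++) {
		if (fds[i] < 0 || (size_t)fds[i] >= tablelen ||
		    table[fds[i]] == SKETCH_FD_CLOSED) {
			/* Abort: discard the partially built copy. */
			free(newtable);
			return (NULL);
		}
		newtable[i] = table[fds[i]];
	}
	return (newtable);
}
```

Descriptors that are not listed in the mapping simply do not appear in the new table, which matches the "if and only if they are referenced" rule described earlier in the message.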

We then add a function exec_copyin_data_fds() that builds on top of these
functions. It copies in the data and constructs a new remapped file
descriptor. This is used by cloudabi_sys_proc_exec().

Test Plan:
cloudabi-run(1) is capable of spawning processes successfully, providing
it data and file descriptors. procstat -f seems to confirm all is good.
Regular FreeBSD processes also work properly.

Reviewers: kib, mjg

Reviewed By: mjg

Subscribers: imp

Differential Revision: https://reviews.freebsd.org/D3079
2015-07-16 07:05:42 +00:00
Ed Schouten
952c6e1010 Implement the trivial socket system calls: shutdown() and listen(). 2015-07-15 11:27:34 +00:00
Ed Schouten
4fa92fb538 Make posix_fallocate() and posix_fadvise() work.
We can map these system calls directly to the FreeBSD counterparts. The
other filesystem related system calls will be sent out for review
separately, as they are a bit more complex to get right.
2015-07-15 09:14:06 +00:00
Ed Schouten
707d98fe2f Implement the CloudABI random_get() system call.
The random_get() system call works similarly to getentropy()/getrandom()
on OpenBSD/Linux. It fills a buffer with random data.

This change introduces a new function, read_random_uio(), that is used
to implement read() on the random devices. We can call into this
function from within the CloudABI compatibility layer.

Approved by:	secteam
Reviewed by:	jmg, markm, wblock
Obtained from:	https://github.com/NuxiNL/freebsd
Differential Revision:	https://reviews.freebsd.org/D3053
2015-07-14 18:45:15 +00:00
Ed Schouten
460ac6370a Regenerate system call table for r285540. 2015-07-14 15:12:24 +00:00
Ed Schouten
1eb7c7cae3 Implement thread_tcb_set() and thread_yield().
The first system call is used to set the user TLS address. Right now
this system call is invoked by the C library for both the initial thread
and additional threads unconditionally, but in the future we'll only
call this if the architecture does not support this. On recent x86-64
CPUs we could use the WRFSBASE instruction.

This system call was erroneously placed in sys/compat/cloudabi64, even
though it does not depend on any pointer size dependent datastructure.
Move it to the right place.

Obtained from:	https://github.com/NuxiNL/freebsd
2015-07-14 15:11:50 +00:00
Ed Schouten
03744d7c8d Implement {,p}{read,write}{,v}().
Add a routine similar to copyinuio() and freebsd32_copyinuio() that
copies in CloudABI's struct iovecs. These are then translated into
FreeBSD format and placed in a 'struct uio', so we can call into the
kern_*() functions.

Obtained from:	https://github.com/NuxiNL/freebsd
2015-07-14 14:33:21 +00:00
Ed Schouten
f9675092b8 Let proc_raise() call into pksignal() directly.
Summary:
As discussed with kib@ in response to r285404, don't call into
kern_sigaction() within proc_raise() to reset the signal to the default
action before delivery. We'd better do that during image execution.

Change the code to simply use pksignal(), so we don't waste cycles on
functions like pfind() to look up the currently running process itself.

Test Plan:
This change has also been pushed into the cloudabi branch on GitHub. The
raise() tests still seem to pass.

Reviewers: kib

Reviewed By: kib

Subscribers: imp

Differential Revision: https://reviews.freebsd.org/D3076
2015-07-14 12:16:14 +00:00
Ed Schouten
4f1905177a Implement normal and abnormal process termination.
CloudABI does not provide an explicit kill() system call, for the reason
that there is no access to the global process namespace. Instead, it
offers a raise() system call that can at least be used to terminate the
process abnormally.

CloudABI does not support installing signal handlers. CloudABI's raise()
system call should behave as if the default policy is set up. Call into
kern_sigaction(SIG_DFL) before calling sys_kill() to force this.

Obtained from:	https://github.com/NuxiNL/freebsd
2015-07-11 19:41:31 +00:00
Ed Schouten
a4001f4cb9 Use FDDUP_NORMAL instead of hardcoding value 0.
Proposed by:	mjg
2015-07-11 18:53:30 +00:00
Ed Schouten
329d1bca7f Add missing function parameter.
A function parameter got added in r285356, meaning that the call to
kern_dup() needs to be patched up.
2015-07-11 18:39:16 +00:00
Mateusz Guzik
b34be824a0 linprocfs: vref the vnode passed to vn_fullpath 2015-07-11 16:44:28 +00:00
Mateusz Guzik
8a08cec166 Create a dedicated function for ensuring that cdir and rdir are populated.
Previously several places were doing it on their own, partially
incorrectly (e.g. without the filedesc locked) or even actively harmful
by populating jdir or assigning rootvnode without vrefing it.

Reviewed by:	kib
2015-07-11 16:22:48 +00:00
Mateusz Guzik
f0725a8e1e Move chdir/chroot-related fdp manipulation to kern_descrip.c
Prefix exported functions with pwd_.

Deduplicate some code by adding a helper for setting fd_cdir.

Reviewed by:	kib
2015-07-11 16:19:11 +00:00
Adrian Chadd
871ef8b0d8 Regenerate syscalls. 2015-07-11 15:22:11 +00:00
Mateusz Guzik
5fe97c20dc fd: split kern_dup flags argument into actual flags and a mode
Tidy up the code inside to switch on the mode.
2015-07-10 11:01:30 +00:00
Ed Schouten
2491302a04 Add implementations for some of the CloudABI file descriptor system calls.
All of the CloudABI system calls that operate on file descriptors of an
arbitrary type are prefixed with fd_. This change adds wrappers for
most of these system calls around their FreeBSD equivalents.

The dup2() system call present on CloudABI deviates from POSIX, in the
sense that it can only be used to replace an existing file descriptor. It
cannot be used to create new ones. The reason for this is that creating
a descriptor at a caller-chosen number is inherently thread-unsafe.
Furthermore, there is no need on CloudABI to use fixed file descriptor
numbers. File descriptors 0, 1 and 2 have no special meaning.

This change exposes kern_dup() through <sys/syscallsubr.h> and puts
the FDDUP_* flags in <sys/filedesc.h>. It then adds a new flag,
FDDUP_MUSTREPLACE to force that file descriptors are replaced -- not
allocated.

Differential Revision:	https://reviews.freebsd.org/D3035
Reviewed by:	mjg
2015-07-09 16:07:01 +00:00
Ed Schouten
f355e810cf Generate CloudABI system call table with proper $FreeBSD$ tags. 2015-07-09 07:21:33 +00:00
Ed Schouten
6d338f9a81 Import the CloudABI datatypes and create a system call table.
CloudABI is a pure capability-based runtime environment for UNIX. It
works similar to Capsicum, except that processes already run in
capabilities mode on startup. All functionality that conflicts with this
model has been omitted, making it a compact binary interface that can be
supported by other operating systems without too much effort.

CloudABI is 'secure by default'; the idea is that it should be safe to
run arbitrary third-party binaries without requiring any explicit
hardware virtualization (Bhyve) or namespace virtualization (Jails). The
rights of an application are purely determined by the set of file
descriptors that you grant it on startup.

The datatypes and constants used by CloudABI's C library (cloudlibc) are
defined in separate files called syscalldefs_mi.h (pointer size
independent) and syscalldefs_md.h (pointer size dependent). We import
these files in sys/contrib/cloudabi and wrap around them in
cloudabi*_syscalldefs.h.

We then add stubs for all of the system calls in sys/compat/cloudabi or
sys/compat/cloudabi64, depending on whether the system call depends on
the pointer size. We only have nine system calls that depend on the
pointer size. If we ever want to support 32-bit binaries, we can simply
add sys/compat/cloudabi32 and implement these nine system calls again.

The next step is to send in code reviews for the individual system call
implementations, but also add a sysentvec, to allow CloudABI executables
to be started through execve().

More information about CloudABI:
- GitHub: https://github.com/NuxiNL/cloudlibc
- Talk at BSDCan: https://www.youtube.com/watch?v=SVdF84x1EdA

Differential Revision:	https://reviews.freebsd.org/D2848
Reviewed by:	emaste, brooks
Obtained from:	https://github.com/NuxiNL/freebsd
2015-07-09 07:20:15 +00:00
Mateusz Guzik
f131759f54 fd: make 'rights' a mandatory argument to fget* functions 2015-07-05 19:05:16 +00:00
Konstantin Belousov
2a4734651c svr4 emulator has custom sendsig() implementation, it does not use
sv_sigtbl.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-06-29 10:33:04 +00:00
Dmitry Chagin
3c91646b46 Add EPOLLRDHUP support.
Tested by:	abi at abinet dot ru
2015-06-20 05:40:35 +00:00
Mateusz Guzik
4da8456f0a Replace struct filedesc argument in getvnode with struct thread
This is a step towards removal of spurious arguments.
2015-06-16 13:09:18 +00:00
Mateusz Guzik
9ef8328d52 fd: make rights a mandatory argument to fget_unlocked 2015-06-16 09:52:36 +00:00
Mateusz Guzik
6871c7c3f1 linux: make sure to grab all cow structs when creating a thread
This is a fixup for r284214.

Reported and tested by: Ivan Klymenko <fidaj ukr.net>
2015-06-10 15:34:43 +00:00
Mateusz Guzik
f6f6d24062 Implement lockless resource limits.
Use the same scheme implemented to manage credentials.

Code needing to look at process's credentials (as opposed to thread's) is
provided with *_proc variants of relevant functions.

Places which possibly had to take the proc lock anyway still use the proc
pointer to access limits.
2015-06-10 10:48:12 +00:00
Jung-uk Kim
1a01bdf906 Properly initialize flags for accept4(2) not to return spurious EINVAL.
Note this fixes a Linuxulator regression introduced in r283490.

PR:		200662
2015-06-08 20:03:15 +00:00
Dmitry Chagin
32ba368ba9 Finish r283544. In exec case properly detach threads from user space
before suicide.
2015-06-06 06:12:14 +00:00
Eric van Gyzen
63e4c6cdf9 Provide vnode in memory map info for files on tmpfs
When providing memory map information to userland, populate the vnode pointer
for tmpfs files.  Set the memory mapping to appear as a vnode type, to match
FreeBSD 9 behavior.

This fixes the use of tmpfs files with the dtrace pid provider,
procstat -v, procfs, linprocfs, pmc (pmcstat), and ptrace (PT_VM_ENTRY).

Submitted by:   Eric Badger <eric@badgerio.us> (initial revision)
Obtained from:  Dell Inc.
PR:             198431
MFC after:      2 weeks
Reviewed by:    jhb
Approved by:    kib (mentor)
2015-06-02 18:37:04 +00:00
Dmitry Chagin
d707582f83 When I merged the lemul branch I missed kib@'s r282708 commit.
This is not the final fix as I need properly cleanup thread resources
before other threads suicide.

Tested by:	Ruslan Makhmatkhanov
2015-05-25 20:44:46 +00:00
Dmitry Chagin
5c2748d5e7 Linux nanosleep() and clock_nanosleep() system calls always
write the remaining time into the structure pointed to by rmtp
unless rmtp is NULL. The value of *rmtp can then be used to call
nanosleep() again and complete the specified pause if the previous
call was interrupted.

Note. clock_nanosleep() with an absolute time value does not write
the remaining time.
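
The way a caller can use *rmtp to complete an interrupted pause can be
sketched as follows. This is a minimal userspace illustration of the
behaviour described above, not code from this change; the function name
sleep_fully() is hypothetical.

```c
#include <errno.h>
#include <time.h>

/*
 * Hypothetical helper: sleep for the full requested interval, restarting
 * nanosleep() with the remaining time whenever a signal interrupts it.
 * Returns 0 once the whole interval has elapsed, -1 on any other error.
 */
int
sleep_fully(struct timespec req)
{
	struct timespec rem;

	while (nanosleep(&req, &rem) == -1) {
		if (errno != EINTR)
			return (-1);
		/* Interrupted: rem holds the time still left to sleep. */
		req = rem;
	}
	return (0);
}
```

A loop like this relies on the remaining time being written on every
interrupted relative sleep, which is the behaviour this change provides.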

While here fix whitespaces and typo in SDT_PROBE.
2015-05-24 18:14:38 +00:00
Dmitry Chagin
bbf392d5ef Convert SCM_TIMESTAMP in recvmsg(). 2015-05-24 18:13:21 +00:00
Dmitry Chagin
5989b75bdb The latest cp tool is trying to use the btrfs clone operation that is
implemented via the ioctl interface. First of all, return ENOTSUP for this
operation so that cp falls back to the usual method in that case. Secondly, do
not print out the message about the unimplemented operation.
2015-05-24 18:12:04 +00:00
Dmitry Chagin
4f65e9cff4 Fix an mbuf(9) leak in sendmsg() under failure condition and
remove unneeded check for failed M_WAITOK allocation.

Found by: Brainy Code Scanner
Reported by: Maxime Villard
2015-05-24 18:10:07 +00:00
Dmitry Chagin
9802eb9ebc Implement Linux specific syncfs() system call. 2015-05-24 18:08:01 +00:00
Dmitry Chagin
d9cbe8f0ef Properly check tv_nsec value. The tv_nsec field can also be one
of the special values UTIME_NOW or UTIME_OMIT.
2015-05-24 18:06:46 +00:00
Dmitry Chagin
4cf10e2934 Since FreeBSD supports SOCK_CLOEXEC & SOCK_NONBLOCK options
remove its emulation via fcntl call from Linuxulator.
2015-05-24 18:06:12 +00:00
Dmitry Chagin
e1ff74c0f7 Implement recvmmsg() and sendmmsg() system calls. 2015-05-24 18:04:04 +00:00
Dmitry Chagin
b7aaa9fdb0 Reduce duplication between MD Linux code by moving msg related
struct definitions out into the compat/linux/linux_socket.h
2015-05-24 18:03:14 +00:00
Dmitry Chagin
6e4c8004dc Implement epoll_pwait() system call. 2015-05-24 18:00:14 +00:00
Dmitry Chagin
b7c4ebdb56 Convert signal number to native for VT_SETMODE ioctl and remove
strange and invalid ISSIGVALID macro.
The code has not been tested right way but it was originally broken.
2015-05-24 17:59:17 +00:00
Dmitry Chagin
19d8b461f4 Add utimensat() system call.
The patch was developed by Jilles Tjoelker and Andrew Wilcox and
adapted for the lemul branch by me.
2015-05-24 17:57:07 +00:00
Dmitry Chagin
dcc0e6c493 Simplify linprocfs_doprocenviron(). Remove extra proc visibility checks
and initialize pn_vis by well known procfs_candebug().
2015-05-24 17:53:48 +00:00
Dmitry Chagin
5885e5ab29 Convert Linux signal number to the FreeBSD. 2015-05-24 17:49:09 +00:00
Dmitry Chagin
94c0ee30b4 Convert Linux sigsets before showing.
Linux kernel displays sigset always as 16x4 bit mask.
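
As an illustration of the 16x4 bit format, a 64-bit signal set can be
printed as 16 hexadecimal digits. This is a userspace sketch, not the
linprocfs code; the helper names are hypothetical, and it assumes the
usual Linux convention that signal n occupies bit n - 1.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: add signal number signo (1..64) to a 64-bit
 * sigset. Out-of-range signal numbers leave the set unchanged.
 */
uint64_t
sigset_add(uint64_t set, int signo)
{
	if (signo < 1 || signo > 64)
		return (set);
	return (set | (UINT64_C(1) << (signo - 1)));
}

/*
 * Hypothetical helper: format the set as 16 zero-padded hex digits.
 * Returns the snprintf() result.
 */
int
sigset_format(uint64_t set, char *buf, size_t buflen)
{
	return (snprintf(buf, buflen, "%016" PRIx64, set));
}
```

For example, a set holding signals 2 and 15 has bits 1 and 14 set, which
is 0x4002 and is formatted as 0000000000004002.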
2015-05-24 17:48:34 +00:00
Dmitry Chagin
4ab7403bbd Rework signal code to allow using it by other modules, like linprocfs:
1. Linux sigset is always 64 bit on all platforms. In order to move Linux
sigset code to the linux_common module define it as 64 bit int. Move
Linux sigset manipulation routines to the MI path.

2. Move Linux signal number definitions to the MI path. In general, they
are the same on all platforms except for a few signals.

3. Map Linux RT signals to the FreeBSD RT signals and hide signal conversion
tables to avoid conversion errors.

4. Emulate Linux SIGPWR signal via FreeBSD SIGRTMIN signal which is outside
of the range of signal numbers allowed on Linux.

PR:		197216
2015-05-24 17:47:20 +00:00
Dmitry Chagin
a6fd8bb2bb Add support for /proc/<pid>/auxv. 2015-05-24 17:46:04 +00:00
Dmitry Chagin
ffefd5707d Add vdso and stack names to the /proc/self/maps. 2015-05-24 17:44:42 +00:00
Dmitry Chagin
a7ac457613 According to Linux man sigaltstack(3) shall return EINVAL if the ss
argument is not a null pointer, and the ss_flags member pointed to by ss
contains flags other than SS_DISABLE. However, in fact, Linux also
allows SS_ONSTACK flag which is simply ignored.

For buggy apps (at least mono) ignore flags other than SS_DISABLE,
as Linux does.

While here move MI part of sigaltstack code to the appropriate place.

Reported by:	abi at abinet dot ru
2015-05-24 17:44:08 +00:00
Dmitry Chagin
76672e1113 Add EPOLLERR flag handling to epoll.
Tested by:	abi at abinet dot ru
2015-05-24 17:42:45 +00:00
Dmitry Chagin
e2ff4b9864 As fo_fill_kinfo() does not check fo_fill_kinfo for NULL,
add a fo_fill_kinfo op to eventfdops.

Reported by:	trinity
2015-05-24 17:40:14 +00:00
Dmitry Chagin
b6aeb7d5dd Add preliminary fallocate system call implementation
to emulate posix_fallocate() function.

Differential Revision:	https://reviews.freebsd.org/D1523
Reviewed by:	emaste
2015-05-24 17:33:21 +00:00
Dmitry Chagin
16ac71bc4f Delete the duplicate of linux_to_native_clockid() function.
Differential Revision:	https://reviews.freebsd.org/D1521
Reviewed by:	trasz
2015-05-24 17:30:31 +00:00
Dmitry Chagin
680982281b Do not use struct l_timespec without conversion. While here move
args->timeout handling before acquiring the futex key at FUTEX_WAIT path.

Differential Revision:	https://reviews.freebsd.org/D1520
Reviewed by:	trasz
2015-05-24 17:29:18 +00:00
Dmitry Chagin
7e947ccc81 Add prototypes for static futex functions.
Differential Revision:	https://reviews.freebsd.org/D1519
Reviewed by:	trasz
2015-05-24 17:27:59 +00:00
Dmitry Chagin
2166e4e0a5 As our tmpfs is no longer considered
"highly experimental", remove the /dev/shm magic committed
in r218497 and convert tmpfs type to an expected magic number.

Differential Revision:	https://reviews.freebsd.org/D1497
Reviewed by:	emaste, trasz
2015-05-24 17:26:58 +00:00
Dmitry Chagin
5dd1d097f8 Print out unsupported futex operation message only once for the process.
Differential Revision:	https://reviews.freebsd.org/D1498
2015-05-24 17:25:57 +00:00
Dmitry Chagin
2711aba97e Add some clock mappings used in glibc 2.20.
Differential Revision:	https://reviews.freebsd.org/D1465
Reviewed by:	trasz
2015-05-24 17:23:08 +00:00
Dmitry Chagin
7d96520b25 Improve ktr(9) records in thread management code.
Differential Revision:	https://reviews.freebsd.org/D1464
Reviewed by:	trasz
2015-05-24 17:09:07 +00:00
Dmitry Chagin
68cf0367e9 Use local struct proc * variable instead of dereferencing td->td_proc.
Differential Revision:	https://reviews.freebsd.org/D1463
Reviewed by:	emaste
2015-05-24 17:08:25 +00:00
Dmitry Chagin
97cfa5c899 Avoid unnecessary em zeroing in non-exec path
as it is already zeroed by malloc with M_ZERO flag
and move zeroing to the proper place in exec path.

Differential Revision:	https://reviews.freebsd.org/D1462
Reviewed by:	trasz
2015-05-24 17:07:10 +00:00
Dmitry Chagin
e0327ddba0 Remove the unnecessary cast.
Differential Revision:	https://reviews.freebsd.org/D1461
Reviewed by:	emaste
2015-05-24 17:05:59 +00:00
Dmitry Chagin
a6b40812ec Implement ppoll() system call.
Differential Revision:	https://reviews.freebsd.org/D1105
Reviewed by:	trasz
2015-05-24 16:59:25 +00:00
Dmitry Chagin
3d7b4b3720 td_sigmask of a newly created thread is copied from td.
Remove excess initialization of td_sigmask.

Differential Revision:	https://reviews.freebsd.org/D1128
Reviewed by:	emaste
2015-05-24 16:56:32 +00:00
Dmitry Chagin
2c4f134b25 Update Linux compat revision to 32.
Differential Revision:	https://reviews.freebsd.org/D1122
Reviewed by:	emaste
2015-05-24 16:55:32 +00:00
Dmitry Chagin
520e9c187d Fix linux_common module build with KTR option.
Differential Revision:	https://reviews.freebsd.org/D1096
Reviewed by:	trasz
2015-05-24 16:52:45 +00:00
Dmitry Chagin
a31d76867d Implement eventfd system call.
Differential Revision:	https://reviews.freebsd.org/D1094
In collaboration with:	Jilles Tjoelker
2015-05-24 16:49:14 +00:00
Dmitry Chagin
3e89b64168 Put the correct value for the abi_nfdbits parameter of kern_select() for
all supported Linuxulators.

Differential Revision:	https://reviews.freebsd.org/D1093
Reviewed by:	trasz
2015-05-24 16:47:13 +00:00
Dmitry Chagin
e16fe1c730 Implement epoll family system calls. This is a tiny wrapper
around kqueue() to implement epoll subset of functionality.
The kqueue user data are 32bit on i386 which is not enough for
epoll user data, so we keep user data in the proc emuldata.

Initial patch developed by rdivacky@ in 2007, then extended
by Yuri Victorovich @ r255672 and finished by me
in collaboration with mjg@ and jilles@.

Differential Revision:	https://reviews.freebsd.org/D1092
2015-05-24 16:41:39 +00:00
Dmitry Chagin
d2b6dbc06f Implement F_DUPFD_CLOEXEC fcntl flag.
Differential Revision:	https://reviews.freebsd.org/D1089
Reviewed by:	trasz
2015-05-24 16:34:57 +00:00
Dmitry Chagin
bfa4d74baf Add several fcntl flags.
Differential Revision:	https://reviews.freebsd.org/D1088
Reviewed by:	trasz
2015-05-24 16:32:52 +00:00
Dmitry Chagin
4d0f380d87 To avoid code duplication move open/fcntl definitions to the MI
header file.

Differential Revision:	https://reviews.freebsd.org/D1087
Reviewed by:	trasz
2015-05-24 16:31:44 +00:00
Dmitry Chagin
26c68e1fe5 Use the BSD_TO_LINUX_SIGNAL() wherever there is no need
to check the ABI as it is known.

Differential Revision:	https://reviews.freebsd.org/D1086
2015-05-24 16:30:23 +00:00
Dmitry Chagin
2245df381a Convert Linux wait options to the FreeBSD.
Check wait options as Linux does.
Linux always sets WEXITED option not a WUNTRACED|WNOHANG
which is a strange bug.

Differential Revision:	https://reviews.freebsd.org/D1085
Reviewed by:	trasz
2015-05-24 16:28:58 +00:00
Dmitry Chagin
7a7a6efc25 Set WIFCONTINUED to the wait status if needed.
Differential Revision:	https://reviews.freebsd.org/D1083
Reviewed by:	trasz
2015-05-24 16:27:38 +00:00
Dmitry Chagin
9599b0ec3a Rewrite linux_recvfrom. To avoid double conversion of sockaddr use
kern_recvit() directly.
And check fromlen parameter before sockaddr copyin and conversion.

Differential Revision:	https://reviews.freebsd.org/D1082
2015-05-24 16:26:55 +00:00
Dmitry Chagin
4048f59cd0 Add AT_RANDOM and AT_EXECFN auxiliary vector entries which are used by
glibc. At least since glibc version 2.16 using AT_RANDOM is mandatory.

Differential Revision:	https://reviews.freebsd.org/D1080
2015-05-24 16:24:24 +00:00
Dmitry Chagin
baa232bbfd Change linux faccessat syscall definition to match actual linux one.
The AT_EACCESS and AT_SYMLINK_NOFOLLOW flags are actually implemented
within the glibc wrapper function for faccessat().  If either of these
flags are specified, then the wrapper function employs fstatat() to
determine access permissions.

Differential Revision:	https://reviews.freebsd.org/D1078
Reviewed by:	trasz
2015-05-24 16:18:03 +00:00
Dmitry Chagin
e0d3ea8c65 Where possible we will use M_LINUX malloc(9) type.
Move M_FUTEX defines to the linux_common.ko.

Differential Revision:	https://reviews.freebsd.org/D1077
Reviewed by:	emaste
2015-05-24 16:14:41 +00:00
Dmitry Chagin
0edc82b564 Move FEATURE macros for v4l and v4l2 to the common module.
Differential Revision:	https://reviews.freebsd.org/D1075
Reviewed by:	emaste
2015-05-24 16:00:01 +00:00
Dmitry Chagin
bc27367760 Refund the proc emuldata struct for future use. For now move flags from
thread emuldata to proc emuldata as it was originally intended.

As we can have both 64 & 32 bit Linuxulator running any eventhandler
can be called twice for us. To prevent this move eventhandlers code
from linux_emul.c to the linux_common.ko module.

Differential Revision:	https://reviews.freebsd.org/D1073
2015-05-24 15:54:58 +00:00
Dmitry Chagin
67d3974849 Introduce a new module linux_common.ko which is intended for the
following primary purposes:

1. Remove the dependency of linsysfs and linprocfs modules from linux.ko,
which will be architecture specific on amd64.

2. Incorporate into linux_common.ko general code for platforms on which
we'll support two Linuxulator modules (for both instruction sets - 32 & 64 bit).

3. Move malloc(9) declaration to linux_common.ko, to enable getting memory
usage statistics properly.

Currently linux_common.ko incorporates code from linux_mib.c and linux_util.c
and linprocfs, linsysfs and linux kernel modules depend on linux_common.ko.

Temporarily remove dtrace garbage from linux_mib.c and linux_util.c

Differential Revision:	https://reviews.freebsd.org/D1072
In collaboration with:	Vassilis Laganakos.

Reviewed by:	trasz
2015-05-24 15:51:18 +00:00
Dmitry Chagin
606bcc1741 Add newfstatat system call for 64-bit Linuxulator.
Differential Revision:	https://reviews.freebsd.org/D1071
Reviewed by:	trasz
2015-05-24 15:48:34 +00:00
Dmitry Chagin
4ca75bed31 Fix compilation with -DDEBUG option.
Differential Revision:	https://reviews.freebsd.org/D1070
Reviewed by:	trasz
2015-05-24 15:47:15 +00:00
Dmitry Chagin
36204c3016 Add 64 bit support to the vdso.
Differential Revision:	https://reviews.freebsd.org/D1069
Reviewed by:	trasz
2015-05-24 15:45:36 +00:00
Dmitry Chagin
31eb438886 x86_64 Linux does not use multiplexing on ipc system calls.
Move struct ipc_perm definition to the MD path as it differs for 64 and
32 bit platform.

Differential Revision:	https://reviews.freebsd.org/D1068
Reviewed by:	trasz
2015-05-24 15:44:41 +00:00
Dmitry Chagin
7f8f1d7f7a Disable i386 call for x86-64 Linux.
Differential Revision:	https://reviews.freebsd.org/D1067
Reviewed by:	trasz
2015-05-24 15:43:53 +00:00
Dmitry Chagin
0ed687fa2e Print out proper procmap entry for 64 bit binaries.
Differential Revision:	https://reviews.freebsd.org/D1066
Reviewed by:	trasz
2015-05-24 15:42:36 +00:00
Dmitry Chagin
a12b9b3d96 64-bit platforms, like x86_64, do not use multiplexing on
socketcall system calls.

Differential Revision:	https://reviews.freebsd.org/D1065
Reviewed by:	trasz
2015-05-24 15:41:27 +00:00
Dmitry Chagin
297f61cc01 Get ready to commit x86_64 Linux emulation.
All fields of type l_int in struct statfs are defined
as l_long on i386 and amd64.

Differential Revision:	https://reviews.freebsd.org/D1064
Reviewed by:	trasz
2015-05-24 15:39:08 +00:00
Dmitry Chagin
0020bdf13a Put linux_platform into the vdso to avoid copying it onto the stack at
every exec.

Differential Revision:	https://reviews.freebsd.org/D1062
Reviewed by:	trasz
2015-05-24 15:30:52 +00:00
Dmitry Chagin
bdc379344a Implement vdso - virtual dynamic shared object. Through vdso Linux
exposes functions from kernel with proper DWARF CFI information so that
it becomes easier to unwind through them.
Using the vdso is mandatory for thread cancellation && cleanup
on a modern glibc.

Differential Revision:	https://reviews.freebsd.org/D1060
2015-05-24 15:28:17 +00:00
Dmitry Chagin
ae50b4d7b5 Implement pselect6() system call.
Differential Revision:	https://reviews.freebsd.org/D1051
Reviewed by:	trasz
2015-05-24 15:21:25 +00:00
Dmitry Chagin
c3978c7bb1 Implement prlimit64() system call.
Differential Revision:	https://reviews.freebsd.org/D1050
Reviewed by:	emaste, trasz
2015-05-24 15:18:19 +00:00
Dmitry Chagin
254a937ee5 Implement dup3() system call.
Differential Revision:	https://reviews.freebsd.org/D1049
Reviewed by:	emaste
2015-05-24 15:14:51 +00:00
Dmitry Chagin
44e93b234f Sched_rr_get_interval returns EINVAL when an invalid pid
is specified. This silences the LTP tests.

Differential Revision:	https://reviews.freebsd.org/D1048
Reviewed by:	trasz
2015-05-24 15:13:56 +00:00
Dmitry Chagin
7ac9766db4 Implement rt_sigqueueinfo() system call.
Differential Revision:	https://reviews.freebsd.org/D1047
Reviewed by:	trasz
2015-05-24 15:11:32 +00:00
Dmitry Chagin
e5fe4ccf59 Implement waitid() system call.
Differential Revision:	https://reviews.freebsd.org/D1046
2015-05-24 15:06:39 +00:00
Dmitry Chagin
001398c4c5 To reduce code duplication introduce linux_copyout_rusage() method.
Use it in linux_wait4() system call and move linux_wait4() to the MI path.
While here add a prototype for the static bsd_to_linux_rusage().

Differential Revision:	https://reviews.freebsd.org/D2138
Reviewed by:	trasz
2015-05-24 15:03:09 +00:00
Dmitry Chagin
a7ae3c557f Add a function for converting wait options.
Differential Revision:	https://reviews.freebsd.org/D1045
Reviewed by:	trasz
2015-05-24 15:00:27 +00:00
Dmitry Chagin
fe4ed1e768 Add a siginfo_t conversion function.
Differential Revision:	https://reviews.freebsd.org/D1044
Reviewed by:	emaste, trasz
2015-05-24 14:58:30 +00:00
Dmitry Chagin
86bda7a02d Remove a now unused define.
Differential Revision:	https://reviews.freebsd.org/D1043
Reviewed by:	trasz
2015-05-24 14:57:39 +00:00
Dmitry Chagin
a6326909bb Introduce LINUX_VERSION_STR, LINUX_VERSION_CODE macro for use instead
of hardcoded pr_osrelease, pr_osrel values. This will be used later in
the VDSO.

Differential Revision:	https://reviews.freebsd.org/D1042
Reviewed by:	trasz
2015-05-24 14:56:21 +00:00
Dmitry Chagin
5e609834bd The pthread_join() caller does a futex_wait on child_clear_tid. As
the results of multiple simultaneous calls to pthread_join() specifying the
same target thread are undefined, wake up only one thread.

Differential Revision:	https://reviews.freebsd.org/D1040
2015-05-24 14:54:12 +00:00
Dmitry Chagin
81338031c4 Switch linuxulator to use the native 1:1 threads.
The reasons:
1. Get rid of the stubs/quirks with process dethreading,
   process reparent when the process group leader exits and
   related problems in wait(), waitpid(), etc.
2. Reuse our kernel code instead of writing excessive thread
   management routines in Linuxulator.

Implementation details:

1. The thread is created via kern_thr_new() in the clone() call with
   the CLONE_THREAD parameter. Thus, everything else is a process.
2. The test that the process has threads is done via the P_HADTHREADS
   bit in p_flag of struct proc.
3. Per thread emulator state data structure is now located in the
   struct thread and freed in the thread_dtor() hook.
   Holding p_mtx is mandatory when referencing emuldata
   from other threads.
4. PID mangling has changed. Now Linux pid is the native tid
   and Linux tgid is the native pid, with the exception of the first
   thread in the process where tid and pid are one and the same.

Ugliness:

   When the Linux thread is the initial thread in the thread
   group, its thread id is equal to the process id. Glibc depends on this
   magic (assert in pthread_getattr_np.c). So for system calls that
   take a thread id as a parameter we should use the special method
   to reference struct thread.

Differential Revision:	https://reviews.freebsd.org/D1039
2015-05-24 14:53:16 +00:00
Dmitry Chagin
91d1786f65 In preparation for switching linuxulator to use the native 1:1
threads add a hook for cleaning thread resources before the thread dies.

Differential Revision:	https://reviews.freebsd.org/D1038
2015-05-24 14:51:29 +00:00
Dmitry Chagin
2003907d45 Implement a Linux version of sched_getparam() && sched_setparam().
Temporarily use the first thread in proc.

Differential Revision:	https://reviews.freebsd.org/D1036
Reviewed by:	trasz
2015-05-24 14:45:57 +00:00
Dmitry Chagin
1aa90eca33 In preparation for switching linuxulator to use the native 1:1
threads refactor kern_sched_rr_get_interval() and sys_sched_rr_get_interval().
Add a kern_sched_rr_get_interval() counterpart which takes a targettd
parameter to allow the caller (new Linuxulator) to specify the target thread directly.

Linuxulator temporarily uses first thread in proc.

Move linux_sched_rr_get_interval() to the MI part.

Differential Revision:	https://reviews.freebsd.org/D1032
Reviewed by:	trasz
2015-05-24 14:39:26 +00:00
Dmitry Chagin
161acbb670 In preparation for switching linuxulator to use the native 1:1
threads introduce linux_exit() stub instead of sys_exit() call
(which terminates process).
In the new linuxulator exit() system call terminates the calling
thread (not a whole process).

Differential Revision:	https://reviews.freebsd.org/D1027
Reviewed by:	trasz
2015-05-24 14:33:19 +00:00
Jung-uk Kim
fd90e2ed54 CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten
years for head.  However, it is continuously misused as the mpsafe argument
for callout_init(9).  Deprecate the flag and clean up callout_init() calls
to make them more consistent.

Differential Revision:	https://reviews.freebsd.org/D2613
Reviewed by:	jhb
MFC after:	2 weeks
2015-05-22 17:05:21 +00:00
Konstantin Belousov
7b445033ff On exec, single-threading must be enforced before arguments space is
allocated from exec_map.  If many threads try to perform execve(2) in
parallel, the exec map is exhausted and some threads sleep
uninterruptible waiting for the map space.  Then, the thread which won
the race for the space allocation, cannot single-thread the process,
causing deadlock.

Reported and tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-05-10 09:00:40 +00:00
Peter Wemm
76cd25496f Fix an error in r281551, part of the getfsstat() / kern_getfsstat()
rework.  The number of entries was supposed to be returned to the user,
not used as a scratch variable.

This broke RELENG_4 jails starting up on current systems.
2015-05-05 05:14:12 +00:00
Edward Tomasz Napierala
310e931198 Simplify linux_getcwd(), removing code that was no longer used.
Differential Revision:	https://reviews.freebsd.org/D2326
Reviewed by:	dchagin@, kib@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-04-23 08:41:50 +00:00
Edward Tomasz Napierala
6289b482ec Modify kern___getcwd() to take max pathlen limit as an additional
argument.  This will be used for the Linux emulation layer - for Linux,
PATH_MAX is 4096 and not 1024.

Differential Revision:	https://reviews.freebsd.org/D2335
Reviewed by:	kib@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-04-21 13:55:24 +00:00
Edward Tomasz Napierala
565716e60e Add back fdrop() missed in r281726.
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-04-19 07:35:18 +00:00
Edward Tomasz Napierala
92f7441328 Optimize the O_NOCTTY handling hack in linux_common_open().
Differential Revision:	https://reviews.freebsd.org/D2323
Reviewed by:	kib@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-04-19 07:12:16 +00:00
Edward Tomasz Napierala
94d014f079 Remove unused code from linux_mount(), and make it possible to mount
any kind of filesystem instead of the hardcoded three.

MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-04-18 09:49:09 +00:00
Edward Tomasz Napierala
1c73bcab8e Rewrite linprocfs_domtab() as a wrapper around kern_getfsstat(). This
adds missing jail and MAC checks.

Differential Revision:	https://reviews.freebsd.org/D2193
Reviewed by:	kib@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-04-15 09:13:11 +00:00
Mateusz Guzik
90f54cbfeb fd: remove filedesc argument from fdclose
Just accept a thread instead. This makes it consistent with fdalloc.

No functional changes.
2015-04-11 15:40:28 +00:00
John Baldwin
dbee5c671a Move the 32-bit compatible procfs types from freebsd32.h to <sys/procfs.h>
and export them to userland.
- Define __HAVE_REG32 on platforms that define a reg32 structure and check
  for this in <sys/procfs.h> to control when to export prstatus32, etc.
- Add prstatus32_t and prpsinfo32_t typedefs for the 32-bit structures.
  libbfd looks for these types, and having them fixes 'gcore' in gdb of a
  32-bit process on a 64-bit platform.
- Use the structure definitions from <sys/procfs.h> in gcore's elf32 core
  dump code instead of duplicating the definitions.

Differential Revision:	https://reviews.freebsd.org/D2142
Reviewed by:	kib, nathanw (powerpc bits)
MFC after:	1 week
2015-04-08 16:30:45 +00:00
Edward Tomasz Napierala
67caead165 Remove unused code.
Differential Revision:	https://reviews.freebsd.org/D2195
Reviewed by:	kib@, imp@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-04-02 10:19:24 +00:00
Mateusz Guzik
daf63fd2f9 cred: add proc_set_cred helper
The goal here is to provide one place altering process credentials.

This eases debugging and opens up possibilities to do additional work when such
an action is performed.
2015-03-16 00:10:03 +00:00
Jilles Tjoelker
2b35e6a9f2 Run make sysent. 2015-01-23 21:08:24 +00:00
Jilles Tjoelker
2205e0d1bd Add futimens and utimensat system calls.
The core kernel part is patch file utimes.2008.4.diff from
pluknet@FreeBSD.org. I updated the code for API changes, added the manual
page and added compatibility code for old kernels. There is also audit and
Capsicum support.

A new UTIME_* constant might allow setting birthtimes in future.

Differential Revision:	https://reviews.freebsd.org/D1426
Submitted by:	pluknet (partially)
Reviewed by:	delphij, pluknet, rwatson
Relnotes:	yes
2015-01-23 21:07:08 +00:00
Konstantin Belousov
677258f7e7 Add procctl(2) PROC_TRACE_CTL command to enable or disable debugger
attachment to the process.  Note that the command is not intended to
be a security measure, rather it is an obfuscation feature,
implemented for parity with other operating systems.

Discussed with:	jilles, rwatson
Man page fixes by:	rwatson
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-01-18 15:13:11 +00:00
Konstantin Belousov
b53fc49cd4 fcntl F_O{GET,SET}LK take pointer as the arg, handle them properly for
compat32.

Reported and tested by:	Alex Tutubalin <lexa@lexa.ru>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-01-15 10:43:58 +00:00
Dmitry Chagin
1beb1a8e13 Regen for r276654 (__getcwd()). 2015-01-04 10:40:23 +00:00
Dmitry Chagin
9f7a06f27e Indeed, instead of hiding the kern___getcwd() bug by bogus cast
in r276564, change path type to char * (pathnames are always char *).
And remove bogus casts of malloc().
kern___getcwd() internally doesn't actually use or support u_char *
paths, except to copy them to a normal char * path.

These changes are not visible to libc as libc/gen/getcwd.c misdeclares
__getcwd() as taking a plain char * path.

While here remove _SYS_SYSPROTO_H_ for __getcwd() syscall as
we always have sysproto.h.

Pointed out by:	bde

MFC after:	1 week
2015-01-04 10:34:02 +00:00
Dmitry Chagin
9fa04b52ec Cast *path to silence clang -Wpointer-sign warning.
MFC after:	1 week
2015-01-02 19:29:32 +00:00
Dmitry Chagin
de90b09a79 Remove Giant from linux_getcwd() since VFS is MPSAFE now.
Discussed with:	kib
MFC after:	1 week
2015-01-02 18:36:08 +00:00
Dmitry Chagin
857ad5a31b Fix Clang -Wpointer-sign warnings.
MFC after:	1 week
2015-01-01 20:53:38 +00:00
Dmitry Chagin
5072ad67ae Fix Clang warning: passing 'unsigned int *' to parameter of type 'int *' converts between pointers to integer types with different sign.
MFC after:	1 week
2015-01-01 19:57:24 +00:00
Gleb Kurtsou
dde58752db Adjust printf format specifiers for dev_t and ino_t in kernel.
ino_t and dev_t are about to become uint64_t.

Reviewed by:	kib, mckusick
2014-12-17 07:27:19 +00:00
Konstantin Belousov
237623b028 Add a facility for non-init process to declare itself the reaper of
the orphaned descendants.  Base of the API is modelled after the same
feature from the DragonFlyBSD.

Requested by:	bapt
Reviewed by:	jilles (previous version)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2014-12-15 12:01:42 +00:00
Konstantin Belousov
5c7bebf961 The process spin lock currently has the following distinct uses:
- Threads lifetime cycle, in particular, counting of the threads in
  the process, and interlocking with process mutex and thread lock.
  The main reason of this is that turnstile locks are after thread
  locks, so you e.g. cannot unlock blockable mutex (think process
  mutex) while owning thread lock.

- Virtual and profiling itimers, since the timers activation is done
  from the clock interrupt context.  Replace the p_slock by p_itimmtx
  and PROC_ITIMLOCK().

- Profiling code (profil(2)), for similar reason.  Replace the p_slock
  by p_profmtx and PROC_PROFLOCK().

- Resource usage accounting.  Need for the spinlock there is subtle,
  my understanding is that spinlock blocks context switching for the
  current thread, which prevents td_runtime and similar fields from
  changing (updates are done at the mi_switch()).  Replace the p_slock
  by p_statmtx and PROC_STATLOCK().

The split is done mostly for code clarity, and should not affect
scalability.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-11-26 14:10:00 +00:00
John Baldwin
180e57e5c7 Improve support for XSAVE with debuggers.
- Dump an NT_X86_XSTATE note if XSAVE is in use. This note is designed
  to match what Linux does in that 1) it dumps the entire XSAVE area
  including the fxsave state, and 2) it stashes a copy of the current
  xsave mask in the unused padding between the fxsave state and the
  xstate header at the same location used by Linux.
- Teach readelf() to recognize NT_X86_XSTATE notes.
- Change PT_GET/SETXSTATE to take the entire XSAVE state instead of
  only the extra portion. This avoids having to always make two
  ptrace() calls to get or set the full XSAVE state.
- Add a PT_GET_XSTATE_INFO which returns the length of the current
  XSTATE save area (so the size of the buffer needed for PT_GETXSTATE)
  and the current XSAVE mask (%xcr0).

Differential Revision:	https://reviews.freebsd.org/D1193
Reviewed by:	kib
MFC after:	2 weeks
2014-11-21 20:53:17 +00:00
Konstantin Belousov
6e646651d3 Remove the no-at variants of the kern_xx() syscall helpers. E.g., we
have both kern_open() and kern_openat(); change the callers to use
kern_openat().

This removes one (sometimes two) levels of indirection and
consolidates arguments checks.

Reviewed by:	mckusick
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-11-13 18:01:51 +00:00
Dmitry Chagin
c28d9d0f9f Regen for r274462. 2014-11-13 05:28:06 +00:00
Dmitry Chagin
186d9c3473 Add the ppoll() system call.
Export kern_poll() needed by an upcoming Linuxulator change.

Differential Revision:	https://reviews.freebsd.org/D1133
Reviewed by:	kib, wblock
MFC after:	1 month
2014-11-13 05:26:14 +00:00
Gleb Smirnoff
efe28398f5 Fix build. 2014-11-11 22:08:18 +00:00
Gleb Smirnoff
0e87b36eaa Remove SF_KQUEUE code. This code was developed at Netflix, but was not
ever used.  It didn't go into stable/10, nor was it documented.
It might be useful, but we collectively decided to remove it rather
than leave it abandoned and unmaintained.  It is removed in one single
commit, so restoring it should be easy, if anyone wants to reopen
this idea.

Sponsored by:	Netflix
2014-11-11 20:32:46 +00:00
Warner Losh
2736ae9f8c These don't belong in the modules directory. 2014-11-06 16:52:51 +00:00
Konstantin Belousov
0a2c94b86e Replace some calls to fuword() by fueword() with proper error checking.
Sponsored by:	The FreeBSD Foundation
Tested by:	pho
MFC after:	3 weeks
2014-10-28 15:28:20 +00:00
Mateusz Guzik
e015b1ab0a Avoid dynamic syscall overhead for statically compiled modules.
The kernel tracks syscall users so that modules can safely unregister them.

But if the module is not unloadable or was compiled into the kernel, there is
no need to do this.

Achieve this by adding SY_THR_STATIC_KLD macro which expands to SY_THR_STATIC
during kernel build and 0 otherwise.

Reviewed by:	kib (previous version)
MFC after:	2 weeks
2014-10-26 19:42:44 +00:00
Hans Petter Selasky
f0188618f2 Fix multiple incorrect SYSCTL arguments in the kernel:
- Wrong integer type was specified.

- Wrong or missing "access" specifier. The "access" specifier
sometimes included the SYSCTL type, which it should not, except for
procedural SYSCTL nodes.

- Logical OR where binary OR was expected.

- Properly assert the "access" argument passed to all SYSCTL macros,
using the CTASSERT macro. This applies to both static- and dynamically
created SYSCTLs.

- Properly assert the data type for both static and dynamic
SYSCTLs. In the case of static SYSCTLs we only assert that the data
pointed to by the SYSCTL data pointer has the correct size, since
there is no easy way to assert types in the C language outside a
C-function.

- Rewrote some code which doesn't pass a constant "access" specifier
when creating dynamic SYSCTL nodes, which is now a requirement.

- Updated "EXAMPLES" section in SYSCTL manual page.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2014-10-21 07:31:21 +00:00
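
The "logical OR where binary OR was expected" item in the commit above is easy to reproduce in isolation. The sketch below uses hypothetical DEMO_FLAG_* values, not the real CTLFLAG_* constants, to show why the two operators give different results when combining flag bits.

```c
/* Hypothetical flag bits for illustration; not the real CTLFLAG_* values. */
#define DEMO_FLAG_RD	0x1u
#define DEMO_FLAG_WR	0x2u
#define DEMO_FLAG_TUN	0x4u

/* Intended form: bitwise OR keeps every flag bit. */
unsigned int
demo_flags_bitwise(void)
{
	return (DEMO_FLAG_RD | DEMO_FLAG_WR | DEMO_FLAG_TUN);
}

/* Buggy form: logical OR collapses any non-zero operand to 1. */
unsigned int
demo_flags_logical(void)
{
	return (DEMO_FLAG_RD || DEMO_FLAG_WR || DEMO_FLAG_TUN);
}
```

With these values the bitwise form yields 0x7, while the logical form yields 1, which silently drops every flag except the one that happens to occupy bit 0.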
Adrian Chadd
e77f9fed15 Update the ULE scheduler + thread and kinfo structs to use int for cpuid
rather than u_char.

To try and play nice with the ABI, the u_char CPU ID values are clamped
at 254.  The new fields now contain the full CPU ID, or -1 for no cpu.

Differential Revision:	D955
Reviewed by:	jhb, kib
Sponsored by:	Norse Corp, Inc.
2014-10-18 19:36:11 +00:00
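
The clamping described in the commit above can be sketched as follows. The structure, field names, and constants here are hypothetical stand-ins; in particular, the value 255 for "no CPU" in the legacy field is an assumption, since the commit message only states the clamp at 254 and the -1 value in the new field.

```c
/* Hypothetical constants; only 254 and -1 are stated in the commit message. */
#define DEMO_NOCPU	(-1)	/* new field: no CPU assigned */
#define DEMO_NOCPU_OLD	255	/* assumed legacy "no CPU" marker */
#define DEMO_MAXCPU_OLD	254	/* largest ID the legacy field reports */

struct demo_kinfo {
	unsigned char	old_cpu;	/* legacy u_char field, kept for ABI */
	int		cpu;		/* new field: full CPU ID or -1 */
};

/* Fill both fields from a full CPU ID, clamping the legacy one. */
void
demo_fill_cpu(struct demo_kinfo *ki, int cpuid)
{
	ki->cpu = cpuid;
	if (cpuid == DEMO_NOCPU)
		ki->old_cpu = DEMO_NOCPU_OLD;
	else if (cpuid > DEMO_MAXCPU_OLD)
		ki->old_cpu = DEMO_MAXCPU_OLD;
	else
		ki->old_cpu = (unsigned char)cpuid;
}
```

Old binaries that read only the u_char field keep working, but see every CPU above 254 reported as 254.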
Marcel Moolenaar
2e7634503e Regenerate after r272823:
Move the SCTP syscalls to netinet with the rest of the SCTP code.

Submitted by:	Steve Kiernan <stevek@juniper.net>
Reviewed by:	tuexen, rrs
Obtained from:	Juniper Networks, Inc.
2014-10-09 15:19:35 +00:00
Marcel Moolenaar
80b47aefa1 Move the SCTP syscalls to netinet with the rest of the SCTP code. The
syscalls themselves are tightly coupled with the network stack and
therefore should not be in the generic socket code.

The following four syscalls have been marked as NOSTD so they can be
dynamically registered in sctp_syscalls_init() function:
  sys_sctp_peeloff
  sys_sctp_generic_sendmsg
  sys_sctp_generic_sendmsg_iov
  sys_sctp_generic_recvmsg

The syscalls are also set up to be dynamically registered when COMPAT32
option is configured.

As a side effect of moving the SCTP syscalls, getsock_cap needs to be
made available outside of the uipc_syscalls.c source file.  A proper
prototype has been added to the sys/socketvar.h header file.

API tests from the SCTP reference implementation have been run to ensure
compatibility. (http://code.google.com/p/sctp-refimpl/source/checkout)

Submitted by:	Steve Kiernan <stevek@juniper.net>
Reviewed by:	tuexen, rrs
Obtained from:	Juniper Networks, Inc.
2014-10-09 15:16:52 +00:00
Konstantin Belousov
f69261f2f9 Fix fcntl(2) compat32 after r270691. The copyin and copyout of the
struct flock are done in the sys_fcntl(), which means that compat32 used
direct access to userland pointers.

Move code from sys_fcntl() to new wrapper, kern_fcntl_freebsd(), which
performs necessary userland memory accesses, and use it from both
native and compat32 fcntl syscalls.

Reported by:	jhibbits
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2014-09-25 21:07:19 +00:00
Alexander Motin
6a9bcacfcf Remake Linux' SOUND_MIXER_INFO IOCTL as a wrapper around new FreeBSD's one.
Submitted by:	Dmitry Luhtionov <dmitryluhtionov@gmail.com>
MFC after:	3 days
2014-09-24 08:18:11 +00:00
Sean Bruno
d143d69857 Bump minimum linux compat version to support Centos6 ports updates for linux.
Update linux compat minimum revision to match linux-c6 now in ports.  This
is a candidate for 10.1 R as it matches the current state of supported
linux compat packages in the ports tree.

PR:		187786
Reviewed by:	xmj
MFC after:	2 days
Relnotes:	yes
2014-09-22 17:26:07 +00:00
Gleb Smirnoff
1411ec550f Fix build on 32-bit machines.
Pointy hat to:	glebius
2014-09-18 20:29:17 +00:00
Gleb Smirnoff
1e99b3f4e3 - Use if_get_counter() to fetch ifnet statistics.
- Report IFCOUNTER_OQDROPS to linprocfs. Wasn't there before.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-18 16:44:28 +00:00
Bjoern A. Zeeb
0a041f3b47 Implement most of timer_{create,settime,gettime,getoverrun,delete}
for amd64/linux32.  Fix the entirely bogus (untested) version from
r161310 for i386/linux using the same shared code in compat/linux.

It is unclear to me if we could support more clock mappings but
the current set allows me to successfully run commercial
32bit linux software under linuxolator on amd64.

Reviewed by:		jhb
Differential Revision:	D784
MFC after:		3 days
Sponsored by:		DARPA, AFRL
2014-09-18 08:36:45 +00:00
Mateusz Guzik
6662ce5aab Add missing proctree locking to fill_kinfo_proc consumers.
This fixes r270444.

Pointy hat:	mjg
Reported by:	many
MFC after:	1 week
2014-08-30 03:10:55 +00:00
Mateusz Guzik
8b04bbef31 Return real parent pid in kinfo (used by e.g. ps)
Add a separate field which exports tracer pid and add a new keyword
("tracer") for ps to display it.

This is a follow up to r270444.

Reviewed by:	kib
MFC after:	1 week
Relnotes:	yes
2014-08-28 08:41:11 +00:00
Konstantin Belousov
5aec07c73d Regen. 2014-08-27 01:02:19 +00:00
Konstantin Belousov
8fbeebf590 Fix handling of the third argument for fcntl(2). The native syscall
uses long for arg, which needs translation.

Discussed with and tested by:	mjg
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-08-27 01:02:02 +00:00
Gleb Smirnoff
15c28f87b8 All mbuf external free functions never fail, so let them be void.
Sponsored by:	Nginx, Inc.
2014-07-11 13:58:48 +00:00
Marcel Moolenaar
e7d939bda2 Remove ia64.
This includes:
o   All directories named *ia64*
o   All files named *ia64*
o   All ia64-specific code guarded by __ia64__
o   All ia64-specific makefile logic
o   Mention of ia64 in comments and documentation

This excludes:
o   Everything under contrib/
o   Everything under crypto/
o   sys/xen/interface
o   sys/sys/elf_common.h

Discussed at: BSDcan
2014-07-07 00:27:09 +00:00
Hans Petter Selasky
af3b2549c4 Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
Glen Barber
37a107a407 Revert r267961, r267973:
These changes prevent sysctl(8) from returning proper output,
such as:

 1) no output from sysctl(8)
 2) erroneously returning ENOMEM with tools like truss(1)
    or uname(1)
 truss: can not get etype: Cannot allocate memory
2014-06-27 22:05:21 +00:00
Hans Petter Selasky
3da1cf1e88 Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating children of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, since some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading kludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, since malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after:	2 weeks
Sponsored by:	Mellanox Technologies
2014-06-27 16:33:43 +00:00
Alexander Motin
94fe9f959c - Add support for SG_GET_SG_TABLESIZE IOCTL to report that we don't support
scatter/gather lists.
- Return error for still unsupported SG 3.x API read/write calls.

MFC after:	1 month
2014-06-04 12:05:47 +00:00
Alexander Motin
fcaf473cfc Overhaul CAM SG driver IOCTL interfaces.
Make it really work for native FreeBSD programs.  Before this it was broken
for years due to different number of pointer dereferences in Linux and
FreeBSD IOCTL paths, permanently returning errors to FreeBSD programs.
This change breaks the driver FreeBSD IOCTL ABI, making it more strict,
but since it was not working anyway, who would bother.

Add shims for 32-bit programs on 64-bit host, translating the argument
of the SG_IO IOCTL for both FreeBSD and Linux ABIs.

With this change I was able to run 32-bit Linux sg3_utils tools and simple
32 and 64-bit FreeBSD test tools on both 32 and 64-bit FreeBSD systems.

MFC after:	1 month
2014-06-02 19:53:53 +00:00
Dmitry Chagin
fb6bf8bba9 Glibc switched from FUTEX_WAIT to the FUTEX_WAIT_BITSET op with the
FUTEX_CLOCK_REALTIME flag, replacing the FUTEX_WAIT logic which needed
to do gettimeofday() calls before the futex syscall to convert the
absolute timeout to a relative timeout.
Before this, the FUTEX_WAIT_BITSET op used CLOCK_MONOTONIC.

When the FUTEX_CLOCK_REALTIME is specified the timeout is an absolute
time, not a relative time. Rework futex_wait to handle this.
On the side fix the futex leak in error case and remove useless
parentheses.

Properly calculate the timeout for the CLOCK_MONOTONIC case.

MFC after:	3 days
2014-05-31 14:58:53 +00:00
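
The absolute-versus-relative distinction in the futex commit above comes down to a timespec subtraction. The following is a minimal sketch of that conversion, not the Linuxulator's actual code; the function name demo_abs_to_rel() is hypothetical, and the current time is passed in by the caller so the arithmetic can be checked in isolation.

```c
#include <time.h>

/*
 * Hypothetical helper: convert an absolute deadline into a timeout
 * relative to "now".  Returns 0 and fills *rel on success, or -1 if the
 * deadline already lies in the past.
 */
int
demo_abs_to_rel(const struct timespec *deadline, const struct timespec *now,
    struct timespec *rel)
{
	rel->tv_sec = deadline->tv_sec - now->tv_sec;
	rel->tv_nsec = deadline->tv_nsec - now->tv_nsec;
	if (rel->tv_nsec < 0) {
		/* Borrow one second to keep tv_nsec in [0, 1e9). */
		rel->tv_sec--;
		rel->tv_nsec += 1000000000L;
	}
	if (rel->tv_sec < 0)
		return (-1);
	return (0);
}
```

A caller would typically obtain "now" from clock_gettime() using the same clock the deadline was expressed in (CLOCK_REALTIME or CLOCK_MONOTONIC); mixing clocks gives a meaningless result.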
Dmitry Chagin
32fd44657c In r218101 I did not properly change the futex syscall definition.
Some Linux futex ops atomically verify that the futex address uaddr
(uval) contains the value val. Comparing signed uval and unsigned val
may lead to an unexpected result, mostly to a deadlock.

So copyin uaddr to an unsigned int to compare the parameters correctly.

While here change ktr records to print parameters in more readable format.

Tested by:	eadler@

MFC after:	3 days
2014-05-28 05:57:35 +00:00
Marcel Moolenaar
0fa211be96 In freebsd32_sendmsg(), replace the call to sockargs() followed by a
call to freebsd32_convert_msg_in() with freebsd32_copyin_control() to
readin and convert in a single step. This makes it simpler to put all
the control messages in a single mbuf or mbuf cluster as per the
limitations imposed upon us by ip6_setpktopts().

The logic is as follows:
1.  Go over the array of control messages to determine overall size
    and include extra padding for proper alignment as we go.
2.  Get a mbuf or mbuf cluster as needed or fail if the overall
    (adjusted) size is larger than a cluster.
3.  Go over the array of control messages again, but now copy them
    into kernel space and into aligned offsets.
4.  Update the length of the control message to take padding between
    the header and the data into account (but not for padding added
    between one control message and the next).

Obtained from:	Juniper Networks, Inc.
MFC after:	1 week
2014-04-05 18:56:01 +00:00
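
The two-pass logic listed in the freebsd32_sendmsg() commit above can be sketched in userland as follows. This is a simplified illustration under stated assumptions: the demo_* names and the round-up-to-sizeof(long) alignment rule are hypothetical, a flat byte buffer stands in for the mbuf or mbuf cluster, and the per-message header handling of step 4 is omitted.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical alignment rule: round up to a multiple of sizeof(long). */
#define DEMO_ALIGN(n)	(((n) + sizeof(long) - 1) & ~(sizeof(long) - 1))

struct demo_msg {
	const void	*data;
	size_t		 len;
};

/* Pass 1: overall size, including padding after each message. */
size_t
demo_total_size(const struct demo_msg *msgs, size_t count)
{
	size_t i, total = 0;

	for (i = 0; i < count; i++)
		total += DEMO_ALIGN(msgs[i].len);
	return (total);
}

/*
 * Pass 2: copy every message to an aligned offset in buf.  Fails with -1
 * if the padded total does not fit, mirroring the "larger than a cluster"
 * check; on success returns 0 and stores the bytes used in *used.
 */
int
demo_pack(const struct demo_msg *msgs, size_t count, unsigned char *buf,
    size_t buflen, size_t *used)
{
	size_t i, padded, off = 0;

	if (demo_total_size(msgs, count) > buflen)
		return (-1);
	for (i = 0; i < count; i++) {
		padded = DEMO_ALIGN(msgs[i].len);
		memcpy(buf + off, msgs[i].data, msgs[i].len);
		/* Zero the padding so no stale bytes are passed along. */
		memset(buf + off + msgs[i].len, 0, padded - msgs[i].len);
		off += padded;
	}
	*used = off;
	return (0);
}
```

Sizing first and copying second means the destination can be allocated once at its final size, which is what allows all control messages to land in a single buffer.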
Warner Losh
8a27a339b6 Remove instances of variables that were set, but never used. gcc 4.9
warns about these by default.
2014-03-30 23:43:36 +00:00
Bryan Drewery
44f1c91610 Rename global cnt to vm_cnt to avoid shadowing.
To reduce the diff struct pcu.cnt field was not renamed, so
PCPU_OP(cnt.field) is still used. pc_cnt and pcpu are also used in
kvm(3) and vmstat(8). The goal was to not affect externally used KPI.

Bump __FreeBSD_version in case some out-of-tree module/code relies on the
global cnt variable.

Exp-run revealed no ports using it directly.

No objection from:	arch@
Sponsored by:	EMC / Isilon Storage Division
2014-03-22 10:26:09 +00:00
Konstantin Belousov
88b124cede Make the array pointed to by AT_PAGESIZES auxv properly aligned.
Also, remove the expression which calculated the location of the
strings for a new image and had grown over time to become
incomprehensible.  Instead, calculate the offsets by steps, which
also makes fixing the alignments much cleaner.

Reported and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-03-19 12:35:04 +00:00
Attilio Rao
4f11a684ff Regen per r263318.
Sponsored by:	EMC / Isilon storage division
2014-03-18 21:34:11 +00:00
Attilio Rao
ce42e79310 Remove dead code from umtx support:
- Retire long time unused (basically always unused) sys__umtx_lock()
  and sys__umtx_unlock() syscalls
- struct umtx and their supporting definitions
- UMUTEX_ERROR_CHECK flag
- Retire UMTX_OP_LOCK/UMTX_OP_UNLOCK from _umtx_op() syscall

__FreeBSD_version is not bumped yet because it is expected that further
breakages to the umtx interface will follow up in the next days.
However there will be a final bump when necessary.

Sponsored by:	EMC / Isilon storage division
Reviewed by:	jhb
2014-03-18 21:32:03 +00:00
Ed Maste
0fcefb433d Update NetBSD Foundation copyrights to 2-clause BSD
The NetBSD Foundation states "Third parties are encouraged to change the
license on any files which have a 4-clause license contributed to the
NetBSD Foundation to a 2-clause license."

This change removes clauses 3 and 4 from copyright / license blocks that
list The NetBSD Foundation as the only copyright holder.

Sponsored by:	The FreeBSD Foundation
2014-03-18 01:40:25 +00:00
Robert Watson
4a14441044 Update kernel inclusions of capability.h to use capsicum.h instead; some
further refinement is required as some device drivers intended to be
portable over FreeBSD versions rely on __FreeBSD_version to decide whether
to include capability.h.

MFC after:	3 weeks
2014-03-16 10:55:57 +00:00
John-Mark Gurney
6f2b769cac change td_retval into a union w/ off_t, with defines to mask the
change...  This eliminates a cast, and also forces td_retval
(often 2 32-bit registers) to be aligned so that off_t's can be
stored there on arches with strict alignment requirements like
armeb (AVILA)...  On i386, this doesn't change alignment, and on
amd64 it doesn't either, as register_t is already 64bits...

This will also prevent future breakage due to people adding additional
fields to the struct...

This gets AVILA booting a bit farther...

Reviewed by:	bde
2014-03-16 00:53:40 +00:00
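
The td_retval change above relies on a general C property: a union is aligned at least as strictly as its most strictly aligned member. The sketch below illustrates the idea with hypothetical demo_* types and a 32-bit stand-in for register_t; it is not the kernel's actual struct thread layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Stand-in for a 32-bit register_t, as on the 32-bit arches mentioned. */
typedef int32_t demo_register_t;

struct demo_thread {
	char	td_pad;		/* arbitrary preceding field */
	union {
		demo_register_t	tdu_retval[2];
		off_t		tdu_off;
	} td_uretoff;		/* takes the alignment of off_t */
};

/* The define masks the change: existing td_retval users compile as-is. */
#define td_retval	td_uretoff.tdu_retval

/* Store an off_t result without casting through the register array. */
void
demo_set_offset(struct demo_thread *td, off_t value)
{
	td->td_uretoff.tdu_off = value;
}
```

Because the alignment now comes from the union itself, adding fields in front of it cannot misalign the off_t view, which is the "future breakage" the commit message refers to.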
Gleb Smirnoff
b245f96c44 Since 32-bit if_baudrate isn't enough to describe a baud rate of a 10 Gbit
interface, in the r241616 a crutch was provided. It didn't work well, and
finally we decided that it is time to break ABI and simply make if_baudrate
a 64-bit value. Meanwhile, the entire struct if_data was reviewed.

o Remove the if_baudrate_pf crutch.

o Make all fields of struct if_data fixed machine independent size. The
  notion of data (packet counters, etc) are by no means MD. And it is a
  bug that on amd64 we've got a 64-bit counters, while on i386 32-bit,
  which at modern speeds overflow within a second.

  This also removes quite a lot of COMPAT_FREEBSD32 code.

o Give 16 bit for the ifi_datalen field. This field was provided to
  make future changes to if_data less ABI breaking. Unfortunately the
  8 bit size of it had effectively limited sizeof if_data to 256 bytes.

o Give 32 bits to ifi_mtu and ifi_metric.
o Give 64 bits to the rest of fields, since they are counters.

__FreeBSD_version bumped.

Discussed with:	emax
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-03-13 03:42:24 +00:00
Eitan Adler
9ace80105a linprocfs: add support for /sys/kernel/random/uuid
PR:		kern/186187
Submitted by:	Fernando <fernando.apesteguia@gmail.com>
MFC After:	2 weeks
2014-02-27 00:43:10 +00:00
Konstantin Belousov
49d39308ba The posix_madvise(3) and posix_fadvise(2) should return error on
failure, same as posix_fallocate(2).

Noted by:	Bob Bishop <rb@gid.co.uk>
Discussed with:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-01-30 18:04:39 +00:00
Konstantin Belousov
2852de0489 The posix_fallocate(2) syscall should return error number on error,
without modifying errno.

Reported and tested by:	Gennady Proskurin <gpr@mail.ru>
Reviewed by:	mdf
PR:	standards/186028
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-01-23 17:24:26 +00:00
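
The calling convention described in the posix_fallocate(2) commit above differs from most syscall wrappers: the error number is the return value, so callers should not test for -1 or consult errno. A minimal sketch, with a hypothetical helper name:

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/*
 * Hypothetical helper: try to preallocate len bytes at the start of fd.
 * Returns 0 on success or an error number such as EBADF or ENOSPC,
 * taken directly from the return value of posix_fallocate().
 */
int
demo_preallocate(int fd, off_t len)
{
	int error;

	error = posix_fallocate(fd, 0, len);
	/* Not "error == -1": the error code itself is returned. */
	return (error);
}
```

A caller can pass the returned value straight to strerror() for reporting.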