Commit Graph

2293 Commits

Author SHA1 Message Date
Gleb Smirnoff
eedc7fd9e8 Provide includes that are needed in these files, and before were read
in implicitly via if.h -> if_var.h pollution.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 18:18:50 +00:00
Gleb Smirnoff
76039bc84f The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 17:58:36 +00:00
Konstantin Belousov
3a2092bad0 Add padding to match the compat32 struct stat32 definition to the real
struct stat on 32bit architectures.

Debugged and tested by:	bsam
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Approved by:	re (marius)
2013-10-04 22:05:23 +00:00
Mark Johnston
92c6196caa Fix some typos that were causing probe argument types to show up as unknown.
Reviewed by:	rwatson (mac provider)
Approved by:	re (glebius)
MFC after:	1 week
2013-10-01 15:40:27 +00:00
Mark Johnston
8d305ba0dc Regenerate syscall argument strings after r255777.
Approved by:	re (gjb)
MFC after:	1 week
2013-09-21 23:06:36 +00:00
John Baldwin
a566e8e3c5 Regen.
Approved by:	re (delphij)
2013-09-19 18:56:00 +00:00
John Baldwin
55648840de Extend the support for exempting processes from being killed when swap is
exhausted.
- Add a new protect(1) command that can be used to set or revoke protection
  from arbitrary processes.  Similar to ktrace it can apply a change to all
  existing descendants of a process as well as future descendants.
- Add a new procctl(2) system call that provides a generic interface for
  control operations on processes (as opposed to the debugger-specific
  operations provided by ptrace(2)).  procctl(2) uses a combination of
  idtype_t and an id to identify the set of processes on which to operate
  similar to wait6().
- Add a PROC_SPROTECT control operation to manage the protection status
  of a set of processes.  MADV_PROTECT still works for backwards
  compatability.
- Add a p_flag2 to struct proc (and a corresponding ki_flag2 to kinfo_proc)
  the first bit of which is used to track if P_PROTECT should be inherited
  by new child processes.

Reviewed by:	kib, jilles (earlier version)
Approved by:	re (delphij)
MFC after:	1 month
2013-09-19 18:53:42 +00:00
Roman Divacky
b12698e1a1 Revert r255672, it has some serious flaws, leaking file references etc.
Approved by:	re (delphij)
2013-09-18 18:48:33 +00:00
Roman Divacky
253c75c0de Implement epoll support in Linuxulator. This is a tiny wrapper around kqueue
to implement epoll subset of functionality. The kqueue user data are 32bit
on i386 which is not enough for epoll user data so this patch overrides
kqueue fileops to maintain enough space in struct file.

Initial patch developed by me in 2007 and then extended and finished
by Yuri Victorovich.

Approved by:    re (delphij)
Sponsored by:   Google Summer of Code
Submitted by:   Yuri Victorovich <yuri at rawbw dot com>
Tested by:      Yuri Victorovich <yuri at rawbw dot com>
2013-09-18 17:56:04 +00:00
Jilles Tjoelker
9fdb497cd0 Regenerate for freebsd32_cap_enter().
Approved by:	re (hrs)
2013-09-17 20:49:05 +00:00
Jilles Tjoelker
529411c369 Disallow cap_enter() in freebsd32 compatibility mode.
The freebsd32 compatibility mode (for running 32-bit binaries on 64-bit
kernels) does not currently allow any system calls in capability mode, but
still permits cap_enter(). As a result, 32-bit binaries on 64-bit kernels
that use capability mode do not work (they crash after being disallowed to
call sys_exit()). Affected binaries include dhclient and uniq. The latter's
crashes cause obscure build failures.

This commit makes freebsd32 cap_enter() fail with [ENOSYS], as if capability
mode was not compiled in. Applications deal with this by doing their work
without capability mode.

This commit does not fix the uncommon situation where a 64-bit process
enters capability mode and then executes a 32-bit binary using fexecve().

This commit should be reverted when allowing the necessary freebsd32 system
calls in capability mode.

Reviewed by:	pjd
Approved by:	re (hrs)
2013-09-17 20:48:19 +00:00
John Baldwin
edb572a38c Add a mmap flag (MAP_32BIT) on 64-bit platforms to request that a mapping use
an address in the first 2GB of the process's address space.  This flag should
have the same semantics as the same flag on Linux.

To facilitate this, add a new parameter to vm_map_find() that specifies an
optional maximum virtual address.  While here, fix several callers of
vm_map_find() to use a VMFS_* constant for the findspace argument instead of
TRUE and FALSE.

Reviewed by:	alc
Approved by:	re (kib)
2013-09-09 18:11:59 +00:00
Pawel Jakub Dawidek
00a7f703b3 Regenerate after r255219.
Sponsored by:	The FreeBSD Foundation
2013-09-05 00:11:59 +00:00
Pawel Jakub Dawidek
7008be5bd7 Change the cap_rights_t type from uint64_t to a structure that we can extend
in the future in a backward compatible (API and ABI) way.

The cap_rights_t represents capability rights. We used to use one bit to
represent one right, but we are running out of spare bits. Currently the new
structure provides place for 114 rights (so 50 more than the previous
cap_rights_t), but it is possible to grow the structure to hold at least 285
rights, although we can make it even larger if 285 rights won't be enough.

The structure definition looks like this:

	struct cap_rights {
		uint64_t	cr_rights[CAP_RIGHTS_VERSION + 2];
	};

The initial CAP_RIGHTS_VERSION is 0.

The top two bits in the first element of the cr_rights[] array contain total
number of elements in the array - 2. This means if those two bits are equal to
0, we have 2 array elements.

The top two bits in all remaining array elements should be 0.
The next five bits in all array elements contain array index. Only one bit is
used and bit position in this five-bits range defines array index. This means
there can be at most five array elements in the future.

To define new right the CAPRIGHT() macro must be used. The macro takes two
arguments - an array index and a bit to set, eg.

	#define	CAP_PDKILL	CAPRIGHT(1, 0x0000000000000800ULL)

We still support aliases that combine few rights, but the rights have to belong
to the same array element, eg:

	#define	CAP_LOOKUP	CAPRIGHT(0, 0x0000000000000400ULL)
	#define	CAP_FCHMOD	CAPRIGHT(0, 0x0000000000002000ULL)

	#define	CAP_FCHMODAT	(CAP_FCHMOD | CAP_LOOKUP)

There is new API to manage the new cap_rights_t structure:

	cap_rights_t *cap_rights_init(cap_rights_t *rights, ...);
	void cap_rights_set(cap_rights_t *rights, ...);
	void cap_rights_clear(cap_rights_t *rights, ...);
	bool cap_rights_is_set(const cap_rights_t *rights, ...);

	bool cap_rights_is_valid(const cap_rights_t *rights);
	void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src);
	void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src);
	bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);

Capability rights to the cap_rights_init(), cap_rights_set(),
cap_rights_clear() and cap_rights_is_set() functions are provided by
separating them with commas, eg:

	cap_rights_t rights;

	cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);

There is no need to terminate the list of rights, as those functions are
actually macros that take care of the termination, eg:

	#define	cap_rights_set(rights, ...)				\
		__cap_rights_set((rights), __VA_ARGS__, 0ULL)
	void __cap_rights_set(cap_rights_t *rights, ...);

Thanks to using one bit as an array index we can assert in those functions that
there are no two rights belonging to different array elements provided
together. For example this is illegal and will be detected, because CAP_LOOKUP
belongs to element 0 and CAP_PDKILL to element 1:

	cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);

Providing several rights that belongs to the same array's element this way is
correct, but is not advised. It should only be used for aliases definition.

This commit also breaks compatibility with some existing Capsicum system calls,
but I see no other way to do that. This should be fine as Capsicum is still
experimental and this change is not going to 9.x.

Sponsored by:	The FreeBSD Foundation
2013-09-05 00:09:56 +00:00
Will Andrews
5e9ccc8797 Add the ability to display the default FIB number for a process to the
ps(1) utility, e.g. "ps -O fib".

bin/ps/keyword.c:
	Add the "fib" keyword and default its column name to "FIB".

bin/ps/ps.1:
	Add "fib" as a supported keyword.

sys/compat/freebsd32/freebsd32.h:
sys/kern/kern_proc.c:
sys/sys/user.h:
	Add the default fib number for a process (p->p_fibnum)
	to the user land accessible process data of struct kinfo_proc.

Submitted by:	Oliver Fromme <olli@fromme.com>, gibbs
2013-08-26 23:48:21 +00:00
Andre Oppermann
bb25e5ab00 Give (*ext_free) an int return value allowing for very sophisticated
external mbuf buffer management capabilities in the future.

For now only EXT_FREE_OK is defined with current legacy behavior.

Sponsored by:	The FreeBSD Foundation
2013-08-25 10:57:09 +00:00
Pawel Jakub Dawidek
f69e5b30c9 Regenerate after r254491. 2013-08-18 13:38:39 +00:00
Pawel Jakub Dawidek
32536142f6 The cap_rights_limit(2) system calls needs a wrapper for 32bit binaries
running under 64bit kernels as the 'rights' argument has to be split
into two registers or the half of the rights will disappear.

Reported by:	jilles
Sponsored by:	The FreeBSD Foundation
2013-08-18 13:37:54 +00:00
Pawel Jakub Dawidek
d6f6b87647 Move the PAIR32TO64() macro and the RETVAL_HI/RETVAL_LO defines to a
header file for use by other .c files.

Sponsored by:	The FreeBSD Foundation
2013-08-18 13:34:11 +00:00
Pawel Jakub Dawidek
e11dc435ba Regenerate after r254481. 2013-08-18 10:31:30 +00:00
Pawel Jakub Dawidek
0dac22d8ea Implement 32bit versions of the cap_ioctls_limit(2) and cap_ioctls_get(2)
system calls as unsigned longs have different size on i386 and amd64.

Reported by:	jilles
Sponsored by:	The FreeBSD Foundation
2013-08-18 10:30:41 +00:00
Mark Johnston
1570438586 Remove a couple of unused macros.
MFC after:	3 days
2013-08-17 21:53:37 +00:00
Pawel Jakub Dawidek
9a57f6e88c Regenerate after r254447.
Sponsored by:	The FreeBSD Foundation
2013-08-17 14:18:41 +00:00
Pawel Jakub Dawidek
b49f2e4b48 Make pdfork(2), pdkill(2) and pdgetpid(2) syscalls available for 32bit
binaries running under 64bit kernel.

Sponsored by:	The FreeBSD Foundation
2013-08-17 14:17:13 +00:00
Gleb Smirnoff
ca04d21d5f Make sendfile() a method in the struct fileops. Currently only
vnode backed file descriptors have this method implemented.

Reviewed by:	kib
Sponsored by:	Nginx, Inc.
Sponsored by:	Netflix
2013-08-15 07:54:31 +00:00
Jeff Roberson
5df87b21d3 Replace kernel virtual address space allocation with vmem. This provides
transparent layering and better fragmentation.

 - Normalize functions that allocate memory to use kmem_*
 - Those that allocate address space are named kva_*
 - Those that operate on maps are named kmap_*
 - Implement recursive allocation handling for kmem_arena in vmem.

Reviewed by:	alc
Tested by:	pho
Sponsored by:	EMC / Isilon Storage Division
2013-08-07 06:21:20 +00:00
Konstantin Belousov
68044954d2 Regenerate. 2013-07-21 19:44:53 +00:00
Konstantin Belousov
643ee87175 Implement compat32 wrappers for the ktimer_* syscalls.
Reported, reviewed and tested by:	Petr Salinger <Petr.Salinger@seznam.cz>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-07-21 19:43:52 +00:00
Konstantin Belousov
058262c69a Wrap kmq_notify(2) for compat32 to properly consume struct sigevent32
argument.

Reviewed and tested by:	Petr Salinger <Petr.Salinger@seznam.cz>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-07-21 19:40:30 +00:00
Konstantin Belousov
ab751a525b The freebsd32_lio_listio() compat syscall takes the struct sigevent32.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-07-21 19:36:53 +00:00
Konstantin Belousov
9731998997 Move the convert_sigevent32() utility function into freebsd32_misc.c
for consumption outside the vfs_aio.c.

For SIGEV_THREAD_ID and SIGEV_SIGNAL notification delivery methods,
also copy in the sigev_value, since librt event pumping loop compares
note generation number with the value passed through sigev_value.

Tested by:	Petr Salinger <Petr.Salinger@seznam.cz>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-07-21 19:33:48 +00:00
Konstantin Belousov
df4d0ed1cd Cosmetic change, use the same union name on the left and right sides
of the conversion.

Tested by:	Petr Salinger <Petr.Salinger@seznam.cz>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-07-21 19:17:46 +00:00
Konstantin Belousov
c5d0363709 Regenerate 2013-07-20 13:40:03 +00:00
Konstantin Belousov
d31e4b3a58 id_t is 64bit, provide the compat32 wrapper for clock_getcpuclockid2(2).
Reported and tested by:	Petr Salinger <Petr.Salinger@seznam.cz>
PR:	threads/180652
Sponsored by:	The FreeBSD Foundation
2013-07-20 13:39:41 +00:00
Hans Petter Selasky
a40a377cc7 Add some missing LIBUSB IOCTL conversion codes. 2013-07-14 10:13:01 +00:00
Alexander Leidinger
b85e1f7d05 - Move videodev headers from compat/linux to contrib/v4l (cp from vendor and
apply diff to compat/linux versions).
- The cp implies an update of videodev2.h to the linux kernel 2.6.34.14 one.

The update makes video in skype v4 work on FreeBSD.

Tested by:	Artyom Mirgorodskiy <artyom.mirgorodsky@gmail.com>
		(update of header only)
2013-07-06 19:59:06 +00:00
Gleb Smirnoff
8d1aa3c6b4 aio_mlock() added:
- Regen for r251526.
  - Bump __FreeBSD_version.
2013-06-08 13:30:13 +00:00
Gleb Smirnoff
6160e12c10 Add new system call - aio_mlock(). The name speaks for itself. It allows
to perform the mlock(2) operation, which can consume a lot of time, under
control of aio(4).

Reviewed by:	kib, jilles
Sponsored by:	Nginx, Inc.
2013-06-08 13:27:57 +00:00
Alan Cox
66c392df53 Relax the vm object locking. Use a read lock.
Sponsored by:	EMC / Isilon Storage Division
2013-06-05 17:00:10 +00:00
David E. O'Brien
1f4e9654bd Add a "kern.features" MIB for 32bit support under a 64bit kernel. 2013-05-31 21:43:17 +00:00
Konstantin Belousov
f85769eb75 Regenerate. 2013-05-21 11:41:08 +00:00
Konstantin Belousov
48947eccee Fix the wait6(2) on 32bit architectures and for the compat32, by using
the right type for the argument in syscalls.master.  Also fix the
posix_fallocate(2) and posix_fadvise(2) compat32 syscalls on the
architectures which require padding of the 64bit argument.

Noted and reviewed by:	jhb
Pointy hat to:	kib
MFC after:	1 week
2013-05-21 11:40:16 +00:00
Jilles Tjoelker
b201f4a0dc Regenerate files for pipe2(). 2013-05-01 22:45:04 +00:00
Jilles Tjoelker
dc570d5e56 Add pipe2() system call.
The pipe2() function is similar to pipe() but allows setting FD_CLOEXEC and
O_NONBLOCK (on both sides) as part of the function.

If p points to two writable ints, pipe2(p, 0) is equivalent to pipe(p).

If the pointer is not valid, behaviour differs: pipe2() writes into the
array from the kernel like socketpair() does, while pipe() writes into the
array from an architecture-specific assembler wrapper.

Reviewed by:	kan, kib
2013-05-01 22:42:42 +00:00
Jilles Tjoelker
1bf6b724f1 Regenerate files for accept4(). 2013-05-01 20:12:58 +00:00
Jilles Tjoelker
da7d2afb6d Add accept4() system call.
The accept4() function, compared to accept(), allows setting the new file
descriptor atomically close-on-exec and explicitly controlling the
non-blocking status on the new socket. (Note that the latter point means
that accept() is not equivalent to any form of accept4().)

The linuxulator's accept4 implementation leaves a race window where the new
file descriptor is not close-on-exec because it calls sys_accept(). This
implementation leaves no such race window (by using falloc() flags). The
linuxulator could be fixed and simplified by using the new code.

Like accept(), accept4() is async-signal-safe, a cancellation point and
permitted in capability mode.
2013-05-01 20:10:21 +00:00
Matthew D Fleming
b3e6bbc676 Regen.
MFC after:	1 week
2013-04-02 05:30:52 +00:00
Matthew D Fleming
e324bf91e8 Fix return type of extattr_set_* and fix rmextattr(8) utility.
extattr_set_{fd,file,link} is logically a write(2)-like operation and
should return ssize_t, just like extattr_get_*.  Also, the user-space
utility was using an int for the return value of extattr_get_* and
extattr_list_*, both of which return an ssize_t.

MFC after:	1 week
2013-04-02 05:30:41 +00:00
Jilles Tjoelker
d289dc7b73 Rename do_pipe() to kern_pipe2() and declare it properly. 2013-03-31 17:42:54 +00:00
Pawel Jakub Dawidek
5d46382415 Regenerate after r248599.
Sponsored by:	The FreeBSD Foundation
2013-03-21 23:02:19 +00:00
Pawel Jakub Dawidek
e948704e4b Implement chflagsat(2) system call, similar to fchmodat(2), but operates on
file flags.

Reviewed by:	kib, jilles
Sponsored by:	The FreeBSD Foundation
2013-03-21 22:59:01 +00:00
Pawel Jakub Dawidek
14cd1ffdf8 Regenerate after r248597.
Sponsored by:	The FreeBSD Foundation
2013-03-21 22:47:03 +00:00
Pawel Jakub Dawidek
b4b2596b97 - Make 'flags' argument to chflags(2), fchflags(2) and lchflags(2) of type
u_long. Before this change it was of type int for syscalls, but prototypes
  in sys/stat.h and documentation for chflags(2) and fchflags(2) (but not
  for lchflags(2)) stated that it was u_long. Now some related functions
  use u_long type for flags (strtofflags(3), fflagstostr(3)).
- Make path argument of type 'const char *' for consistency.

Discussed on:	arch
Sponsored by:	The FreeBSD Foundation
2013-03-21 22:44:33 +00:00
Gleb Smirnoff
dc4ad05ecd Use m_get/m_gethdr instead of compat macros.
Sponsored by:	Nginx, Inc.
2013-03-15 12:55:30 +00:00
Attilio Rao
89f6b8632c Switch the vm_object mutex to be a rwlock. This will enable in the
future further optimizations where the vm_object lock will be held
in read mode most of the time the page cache resident pool of pages
are accessed for reading purposes.

The change is mostly mechanical but few notes are reported:
* The KPI changes as follow:
  - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK()
  - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK()
  - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK()
  - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED()
    (in order to avoid visibility of implementation details)
  - The read-mode operations are added:
    VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(),
    VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED()
* The vm/vm_pager.h namespace pollution avoidance (forcing requiring
  sys/mutex.h in consumers directly to cater its inlining functions
  using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h
  consumers now must include also sys/rwlock.h.
* zfs requires a quite convoluted fix to include FreeBSD rwlocks into
  the compat layer because the name clash between FreeBSD and solaris
  versions must be avoided.
  At this purpose zfs redefines the vm_object locking functions
  directly, isolating the FreeBSD components in specific compat stubs.

The KPI results heavilly broken by this commit.  Thirdy part ports must
be updated accordingly (I can think off-hand of VirtualBox, for example).

Sponsored by:	EMC / Isilon storage division
Reviewed by:	jeff
Reviewed by:	pjd (ZFS specific review)
Discussed with:	alc
Tested by:	pho
2013-03-09 02:32:23 +00:00
Eitan Adler
1eb9ea583b Remove check for NULL prior to free(9) and m_freem(9).
Approved by:	cperciva (mentor)
2013-03-04 02:21:34 +00:00
Pawel Jakub Dawidek
378a73d1bd Regen after r247667. 2013-03-02 21:12:54 +00:00
Pawel Jakub Dawidek
7493f24ee6 - Implement two new system calls:
int bindat(int fd, int s, const struct sockaddr *addr, socklen_t addrlen);
	int connectat(int fd, int s, const struct sockaddr *name, socklen_t namelen);

  which allow to bind and connect respectively to a UNIX domain socket with a
  path relative to the directory associated with the given file descriptor 'fd'.

- Add manual pages for the new syscalls.

- Make the new syscalls available for processes in capability mode sandbox.

- Add capability rights CAP_BINDAT and CAP_CONNECTAT that has to be present on
  the directory descriptor for the syscalls to work.

- Update audit(4) to support those two new syscalls and to handle path
  in sockaddr_un structure relative to the given directory descriptor.

- Update procstat(1) to recognize the new capability rights.

- Document the new capability rights in cap_rights_limit(2).

Sponsored by:	The FreeBSD Foundation
Discussed with:	rwatson, jilles, kib, des
2013-03-02 21:11:30 +00:00
Pawel Jakub Dawidek
1dc31587bf Regen after r247602. 2013-03-02 00:55:09 +00:00
Pawel Jakub Dawidek
2609222ab4 Merge Capsicum overhaul:
- Capability is no longer separate descriptor type. Now every descriptor
  has set of its own capability rights.

- The cap_new(2) system call is left, but it is no longer documented and
  should not be used in new code.

- The new syscall cap_rights_limit(2) should be used instead of
  cap_new(2), which limits capability rights of the given descriptor
  without creating a new one.

- The cap_getrights(2) syscall is renamed to cap_rights_get(2).

- If CAP_IOCTL capability right is present we can further reduce allowed
  ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed
  ioctls can be retrived with cap_ioctls_get(2) syscall.

- If CAP_FCNTL capability right is present we can further reduce fcntls
  that can be used with the new cap_fcntls_limit(2) syscall and retrive
  them with cap_fcntls_get(2).

- To support ioctl and fcntl white-listing the filedesc structure was
  heavly modified.

- The audit subsystem, kdump and procstat tools were updated to
  recognize new syscalls.

- Capability rights were revised and eventhough I tried hard to provide
  backward API and ABI compatibility there are some incompatible changes
  that are described in detail below:

	CAP_CREATE old behaviour:
	- Allow for openat(2)+O_CREAT.
	- Allow for linkat(2).
	- Allow for symlinkat(2).
	CAP_CREATE new behaviour:
	- Allow for openat(2)+O_CREAT.

	Added CAP_LINKAT:
	- Allow for linkat(2). ABI: Reuses CAP_RMDIR bit.
	- Allow to be target for renameat(2).

	Added CAP_SYMLINKAT:
	- Allow for symlinkat(2).

	Removed CAP_DELETE. Old behaviour:
	- Allow for unlinkat(2) when removing non-directory object.
	- Allow to be source for renameat(2).

	Removed CAP_RMDIR. Old behaviour:
	- Allow for unlinkat(2) when removing directory.

	Added CAP_RENAMEAT:
	- Required for source directory for the renameat(2) syscall.

	Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR):
	- Allow for unlinkat(2) on any object.
	- Required if target of renameat(2) exists and will be removed by this
	  call.

	Removed CAP_MAPEXEC.

	CAP_MMAP old behaviour:
	- Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and
	  PROT_WRITE.
	CAP_MMAP new behaviour:
	- Allow for mmap(2)+PROT_NONE.

	Added CAP_MMAP_R:
	- Allow for mmap(PROT_READ).
	Added CAP_MMAP_W:
	- Allow for mmap(PROT_WRITE).
	Added CAP_MMAP_X:
	- Allow for mmap(PROT_EXEC).
	Added CAP_MMAP_RW:
	- Allow for mmap(PROT_READ | PROT_WRITE).
	Added CAP_MMAP_RX:
	- Allow for mmap(PROT_READ | PROT_EXEC).
	Added CAP_MMAP_WX:
	- Allow for mmap(PROT_WRITE | PROT_EXEC).
	Added CAP_MMAP_RWX:
	- Allow for mmap(PROT_READ | PROT_WRITE | PROT_EXEC).

	Renamed CAP_MKDIR to CAP_MKDIRAT.
	Renamed CAP_MKFIFO to CAP_MKFIFOAT.
	Renamed CAP_MKNODE to CAP_MKNODEAT.

	CAP_READ old behaviour:
	- Allow pread(2).
	- Disallow read(2), readv(2) (if there is no CAP_SEEK).
	CAP_READ new behaviour:
	- Allow read(2), readv(2).
	- Disallow pread(2) (CAP_SEEK was also required).

	CAP_WRITE old behaviour:
	- Allow pwrite(2).
	- Disallow write(2), writev(2) (if there is no CAP_SEEK).
	CAP_WRITE new behaviour:
	- Allow write(2), writev(2).
	- Disallow pwrite(2) (CAP_SEEK was also required).

	Added convinient defines:

	#define	CAP_PREAD		(CAP_SEEK | CAP_READ)
	#define	CAP_PWRITE		(CAP_SEEK | CAP_WRITE)
	#define	CAP_MMAP_R		(CAP_MMAP | CAP_SEEK | CAP_READ)
	#define	CAP_MMAP_W		(CAP_MMAP | CAP_SEEK | CAP_WRITE)
	#define	CAP_MMAP_X		(CAP_MMAP | CAP_SEEK | 0x0000000000000008ULL)
	#define	CAP_MMAP_RW		(CAP_MMAP_R | CAP_MMAP_W)
	#define	CAP_MMAP_RX		(CAP_MMAP_R | CAP_MMAP_X)
	#define	CAP_MMAP_WX		(CAP_MMAP_W | CAP_MMAP_X)
	#define	CAP_MMAP_RWX		(CAP_MMAP_R | CAP_MMAP_W | CAP_MMAP_X)
	#define	CAP_RECV		CAP_READ
	#define	CAP_SEND		CAP_WRITE

	#define	CAP_SOCK_CLIENT \
		(CAP_CONNECT | CAP_GETPEERNAME | CAP_GETSOCKNAME | CAP_GETSOCKOPT | \
		 CAP_PEELOFF | CAP_RECV | CAP_SEND | CAP_SETSOCKOPT | CAP_SHUTDOWN)
	#define	CAP_SOCK_SERVER \
		(CAP_ACCEPT | CAP_BIND | CAP_GETPEERNAME | CAP_GETSOCKNAME | \
		 CAP_GETSOCKOPT | CAP_LISTEN | CAP_PEELOFF | CAP_RECV | CAP_SEND | \
		 CAP_SETSOCKOPT | CAP_SHUTDOWN)

	Added defines for backward API compatibility:

	#define	CAP_MAPEXEC		CAP_MMAP_X
	#define	CAP_DELETE		CAP_UNLINKAT
	#define	CAP_MKDIR		CAP_MKDIRAT
	#define	CAP_RMDIR		CAP_UNLINKAT
	#define	CAP_MKFIFO		CAP_MKFIFOAT
	#define	CAP_MKNOD		CAP_MKNODAT
	#define	CAP_SOCK_ALL		(CAP_SOCK_CLIENT | CAP_SOCK_SERVER)

Sponsored by:	The FreeBSD Foundation
Reviewed by:	Christoph Mallon <christoph.mallon@gmx.de>
Many aspects discussed with:	rwatson, benl, jonathan
ABI compatibility discussed with:	kib
2013-03-02 00:53:12 +00:00
Xin LI
69136e792a Fix wrong assignment.
Submitted by:	Sascha Wildner <saw online de>
Obtained from:	DragonFly rev 9568dd07a22a136e380e6c19a8ea188eb92976d5
MFC after:	2 weeks
2013-03-01 23:21:18 +00:00
John Baldwin
d825ce0a5d Reduce duplication between i386/linux/linux.h and amd64/linux32/linux.h
by moving bits that are MI out into headers in compat/linux.

Reviewed by:	Chagin Dmitry  dmitry | gmail
MFC after:	2 weeks
2013-01-29 18:41:30 +00:00
Dmitry Chagin
4d04cf1d9e Arithmetic on pointers takes into account the size of the type. Properly cast the pointer to avoid incorrect pointer scaling.
MFC after:	1 Week
2013-01-25 14:40:54 +00:00
John Baldwin
fb709557a3 Don't assume that all Linux TCP-level socket options are identical to
FreeBSD TCP-level socket options (only the first two are).  Instead,
using a mapping function and fail unsupported options as we do for other
socket option levels.

MFC after:	2 weeks
2013-01-23 21:44:48 +00:00
Gleb Smirnoff
eb1b1807af Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags within sys.

Exceptions:

- sys/contrib not touched
- sys/mbuf.h edited manually
2012-12-05 08:04:20 +00:00
Colin Percival
43f13bea35 MFS security patches which seem to have accidentally not reached HEAD:
Fix insufficient message length validation for EAP-TLS messages.

Fix Linux compatibility layer input validation error.

Security:	FreeBSD-SA-12:07.hostapd
Security:	FreeBSD-SA-12:08.linux
Security:	CVE-2012-4445, CVE-2012-4576
With hat:	so@
2012-11-23 01:48:31 +00:00
Konstantin Belousov
a2a8559624 Style fixes for r242958.
Reported and reviewed by:	bde
MFC after:	28 days
2012-11-16 06:22:14 +00:00
Konstantin Belousov
552e993580 Regen 2012-11-13 12:53:41 +00:00
Konstantin Belousov
f13b5a0f01 Add the wait6(2) system call. It takes POSIX waitid()-like process
designator to select a process which is waited for. The system call
optionally returns siginfo_t which would be otherwise provided to
SIGCHLD handler, as well as extended structure accounting for child
and cumulative grandchild resource usage.

Allow to get the current rusage information for non-exited processes
as well, similar to Solaris.

The explicit WEXITED flag is required to wait for exited processes,
allowing for more fine-grained control of the events the waiter is
interested in.

Fix the handling of siginfo for WNOWAIT option for all wait*(2)
family, by not removing the queued signal state.

PR:	standards/170346
Submitted by:	"Jukka A. Ukkonen" <jau@iki.fi>
MFC after:	1 month
2012-11-13 12:52:31 +00:00
Konstantin Belousov
140dedb81c The r241025 fixed the case when a binary, executed from nullfs mount,
was still possible to open for write from the lower filesystem.  There
is a symmetric situation where the binary could already has file
descriptors opened for write, but it can be executed from the nullfs
overlay.

Handle the issue by passing one v_writecount reference to the lower
vnode if nullfs vnode has non-zero v_writecount.  Note that only one
write reference can be donated, since nullfs only keeps one use
reference on the lower vnode.  Always use the lower vnode v_writecount
for the checks.

Introduce the VOP_GET_WRITECOUNT to read v_writecount, which is
currently always bypassed to the lower vnode, and VOP_ADD_WRITECOUNT
to manipulate the v_writecount value, which manages a single bypass
reference to the lower vnode.  Caling the VOPs instead of directly
accessing v_writecount provide the fix described in the previous
paragraph.

Tested by:	pho
MFC after:	3 weeks
2012-11-02 13:56:36 +00:00
Konstantin Belousov
5050aa86cf Remove the support for using non-mpsafe filesystem modules.
In particular, do not lock Giant conditionally when calling into the
filesystem module, remove the VFS_LOCK_GIANT() and related
macros. Stop handling buffers belonging to non-mpsafe filesystems.

The VFS_VERSION is bumped to indicate the interface change which does
not result in the interface signatures changes.

Conducted and reviewed by:	attilio
Tested by:	pho
2012-10-22 17:50:54 +00:00
Kevin Lo
9823d52705 Revert previous commit...
Pointyhat to:	kevlo (myself)
2012-10-10 08:36:38 +00:00
Kevin Lo
a10cee30c9 Prefer NULL over 0 for pointers 2012-10-09 08:27:40 +00:00
Konstantin Belousov
877d24ac8a Fix the mis-handling of the VV_TEXT on the nullfs vnodes.
If you have a binary on a filesystem which is also mounted over by
nullfs, you could execute the binary from the lower filesystem, or
from the nullfs mount. When executed from lower filesystem, the lower
vnode gets VV_TEXT flag set, and the file cannot be modified while the
binary is active. But, if executed as the nullfs alias, only the
nullfs vnode gets VV_TEXT set, and you still can open the lower vnode
for write.

Add a set of VOPs for the VV_TEXT query, set and clear operations,
which are correctly bypassed to lower vnode.

Tested by:	pho (previous version)
MFC after:	2 weeks
2012-09-28 11:25:02 +00:00
Kevin Lo
457a9cfbc1 Remove redundant check 2012-09-12 10:12:03 +00:00
John Baldwin
a89828a2b0 Remove some more NetBSD compat shims and other unused bits from these
drivers:
- Remove scsi_low_pisa.*, they were unused.
- Remove <compat/netbsd/physio_proc.h> and calls to the stubs in that
  header.  They were empty nops.
- Retire sl_xname and use device_get_nameunit() and device_printf() with
  the underlying device_t instead.
- Remove unused {ct,ncv,nsp,stg}print() functions.
- Remove empty SOFT_INTR_REQUIRED() macro and the unused sl_irq member.
2012-09-10 18:49:49 +00:00
David Xu
e31eb35c3f regen. 2012-08-17 02:47:16 +00:00
David Xu
d65f1abca7 Implement syscall clock_getcpuclockid2, so we can get a clock id
for process, thread or others we want to support.
Use the syscall to implement POSIX API clock_getcpuclock and
pthread_getcpuclockid.

PR:	168417
2012-08-17 02:26:31 +00:00
Konstantin Belousov
f9ca52f21d Regenerate. 2012-08-15 15:18:20 +00:00
Konstantin Belousov
1a5fe89842 Provide 32bit compat for truncate(2) and ftruncate(2).
MFC after:	1 week
2012-08-15 15:17:56 +00:00
Konstantin Belousov
a8d4becf02 Regenerate. 2012-08-14 12:09:36 +00:00
Konstantin Belousov
f90fabce5a Implement the old mmap syscall for compat32, when COMPAT_43 option is
enabled. The syscall is used by FreeBSD 1.1.5.1 dynamic linker.

MFC after:	1 week
2012-08-14 12:09:09 +00:00
Konstantin Belousov
481af8b933 Cosmetics: define FREEBSD32_MINUSER and AOUT32_MINUSER for struct
sysentvec .sv_minuser. Also improve style.

Submitted by:	Oliver Pinter <oliver.pinter@gmail.com>
MFC after:	1 week
2012-07-22 13:41:45 +00:00
Konstantin Belousov
c5c1199c83 Extend the KPI to lock and unlock f_offset member of struct file. It
now fully encapsulates all accesses to f_offset, and extends f_offset
locking to other consumers that need it, in particular, to lseek() and
variants of getdirentries().

Ensure that on 32bit architectures f_offset, which is 64bit quantity,
always read and written under the mtxpool protection. This fixes
apparently easy to trigger race when parallel lseek()s or lseek() and
read/write could destroy file offset.

The already broken ABI emulations, including iBCS and SysV, are not
converted (yet).

Tested by:	pho
No objections from:	jhb
MFC after:    3 weeks
2012-07-02 21:01:03 +00:00
Kevin Lo
544c5e5b53 Make sure that each va_start has one and only one matching va_end,
especially in error cases.
2012-05-29 01:48:06 +00:00
Konstantin Belousov
371778a333 Fix ki_cow for compat32 binaries.
MFC after:	3 days
2012-05-27 05:24:53 +00:00
Ed Schouten
4412ad4887 Regenerate system call tables. 2012-05-25 21:52:57 +00:00
Ed Schouten
520b6a84f6 Remove use of non-ISO-C integer types from system call tables.
These files already use ISO-C-style integer types, so make them less
inconsistent by preferring the standard types.
2012-05-25 21:50:48 +00:00
Gleb Kurtsou
76dcec5d09 Add kern_fhstat(), adjust sys_fhstat() to use it.
Extend kern_getdirentries() to accept uio segflag and optionally return
buffer residue.

Sponsored by:	Google Summer of Code 2011
2012-05-24 08:00:26 +00:00
Alexander Leidinger
19e252baeb - >500 static DTrace probes for the linuxulator
- DTrace scripts to check for errors, performance, ...
  they serve mostly as examples of what you can do with the static probe;s
  with moderate load the scripts may be overwhelmed, excessive lock-tracing
  may influence program behavior (see the last design decission)

Design decissions:
 - use "linuxulator" as the provider for the native bitsize; add the
   bitsize for the non-native emulation (e.g. "linuxuator32" on amd64)
 - Add probes only for locks which are acquired in one function and released
   in another function. Locks which are aquired and released in the same
   function should be easy to pair in the code, inter-function
   locking is more easy to verify in DTrace.
 - Probes for locks should be fired after locking and before releasing to
   prevent races (to provide data/function stability in DTrace, see the
   man-page of "dtrace -v ..." and the corresponding DTrace docs).
2012-05-05 19:42:38 +00:00
Jung-uk Kim
d69a426fce - Implement pipe2 syscall for Linuxulator. This syscall appeared in 2.6.27
but GNU libc used it without checking its kernel version, e. g., Fedora 10.
- Move pipe(2) implementation for Linuxulator from MD files to MI file,
sys/compat/linux/linux_file.c.  There is no MD code for this syscall at all.
- Correct an argument type for pipe() from l_ulong * to l_int *.  Probably
this was the source of MI/MD confusion.

Reviewed by:	emulation
2012-04-16 21:22:02 +00:00
Tijl Coosemans
a5e3a5a56f Remove some unnecessary includes. 2012-03-18 19:15:11 +00:00
Tijl Coosemans
6e310b206f Eliminate ia32_reg.h by moving its contents to x86 and ia64 reg.h.
Reviewed by:	kib
2012-03-18 19:12:11 +00:00
Tijl Coosemans
01cd19680d Copy i386 reg.h to x86 and merge with amd64 reg.h. Replace i386/amd64/pc98
reg.h with stubs.

The tREGISTER macros are only made visible on i386. These macros are
deprecated and should not be available on amd64.

The i386 and amd64 versions of struct reg have been renamed to struct
__reg32 and struct __reg64. During compilation either __reg32 or __reg64
is defined as reg depending on the machine architecture. On amd64 the i386
struct is also available as struct reg32 which is used in COMPAT_FREEBSD32
code.

Most of compat/ia32/ia32_reg.h is now IA64 only.

Reviewed by:	kib (previous version)
2012-03-18 19:06:38 +00:00
Tijl Coosemans
786645078b Move userland bits of i386 npx.h and amd64 fpu.h to x86 fpu.h.
Remove FPU types from compat/ia32/ia32_reg.h that are no longer needed.
Create machine/npx.h on amd64 to allow compiling i386 code that uses
this header.

The original npx.h and fpu.h define struct envxmm differently. Both
definitions have been included in the new x86 header as struct __envxmm32
and struct __envxmm64. During compilation either __envxmm32 or __envxmm64
is defined as envxmm depending on machine architecture. On amd64 the i386
struct is also available as struct envxmm32.

Reviewed by:	kib
2012-03-16 20:24:30 +00:00
Rebecca Cran
03225fac13 Fix race condition in KfRaiseIrql().
After getting the current irql, if the kthread gets preempted and
subsequently runs on a different CPU, the saved irql could be wrong.

Also, correct the panic string.

PR:		kern/165630
Submitted by:	Vladislav Movchan <vladislav.movchan at gmail.com>
2012-03-04 17:08:43 +00:00
Juli Mallett
a6d20bbaa2 On MIPS, _ALIGN always aligns to 8 bytes, even for 32-bit binaries. This might
not be ideal, but is the ABI we've shipped so far.  Fix macros which reflect
the results of _ALIGN on 32-bit MIPS to use the right alignment.

This fixes sendmsg under COMPAT_FREEBSD32 on n64 MIPS kernels.
2012-03-03 21:39:12 +00:00
Juli Mallett
9624d94701 o) Add COMPAT_FREEBSD32 support for MIPS kernels using the n64 ABI with userlands
using the o32 ABI.  This mostly follows nwhitehorn's lead in implementing
   COMPAT_FREEBSD32 on powerpc64.
o) Add a new type to the freebsd32 compat layer, time32_t, which is time_t in the
   32-bit ABI being used.  Since the MIPS port is relatively-new, even the 32-bit
   ABIs use a 64-bit time_t.
o) Because time{spec,val}32 has the same size and layout as time{spec,val} on MIPS
   with 32-bit compatibility, then, disable some code which assumes otherwise
   wrongly when built for MIPS.  A more general macro to check in this case would
   seem like a good idea eventually.  If someone adds support for using n32
   userland with n64 kernels on MIPS, then they will have to add a variety of
   flags related to each piece of the ABI that can vary.  That's probably the
   right time to generalize further.
o) Add MIPS to the list of architectures which use PAD64_REQUIRED in the
   freebsd32 compat code.  Probably this should be generalized at some point.

Reviewed by:	gonzo
2012-03-03 08:19:18 +00:00
Martin Matuska
41c0675e6e Add procfs to jail-mountable filesystems.
Reviewed by:	jamie
MFC after:	1 week
2012-02-29 00:30:18 +00:00
Konstantin Belousov
526d0bd547 Fix found places where uio_resid is truncated to int.
Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the
sysctl to zero allows to perform the SSIZE_MAX-sized i/o requests from
the usermode.

Discussed with:	bde, das (previous versions)
MFC after:	1 month
2012-02-21 01:05:12 +00:00
Konstantin Belousov
3494f31ad2 Fix misuse of the kernel map in miscellaneous image activators.
Vnode-backed mappings cannot be put into the kernel map, since it is a
system map.

Use exec_map for transient mappings, and remove the mappings with
kmem_free_wakeup() to notify the waiters on available map space.

Do not map the whole executable into KVA at all to copy it out into
usermode.  Directly use vn_rdwr() for the case of not page aligned
binary.

There is one place left where the potentially unbounded amount of data
is mapped into exec_map, namely, in the COFF image activator
enumeration of the needed shared libraries.

Reviewed by:   alc
MFC after:     2 weeks
2012-02-17 23:47:16 +00:00
Ed Schouten
7870adb640 Remove direct access to si_name.
Code should just use the devtoname() function to obtain the name of a
character device. Also add const keywords to pieces of code that need it
to build properly.

MFC after:	2 weeks
2012-02-10 12:35:57 +00:00
David Xu
d56e058a79 Add 32-bit compat code for AIO kevent flags introduced in revision 230857. 2012-02-05 04:49:31 +00:00
Konstantin Belousov
8c6f8f3d5b Add support for the extended FPU states on amd64, both for native
64bit and 32bit ABIs.  As a side-effect, it enables AVX on capable
CPUs.

In particular:

- Query the CPU support for XSAVE, list of the supported extensions
  and the required size of FPU save area. The hw.use_xsave tunable is
  provided for disabling XSAVE, and hw.xsave_mask may be used to
  select the enabled extensions.

- Remove the FPU save area from PCB and dynamically allocate the
  (run-time sized) user save area on the top of the kernel stack,
  right above the PCB. Reorganize the thread0 PCB initialization to
  postpone it after BSP is queried for save area size.

- The dumppcb, stoppcbs and susppcbs now do not carry the FPU state as
  well. FPU state is only useful for suspend, where it is saved in
  dynamically allocated suspfpusave area.

- Use XSAVE and XRSTOR to save/restore FPU state, if supported and
  enabled.

- Define new mcontext_t flag _MC_HASFPXSTATE, indicating that
  mcontext_t has a valid pointer to out-of-struct extended FPU
  state. Signal handlers are supplied with stack-allocated fpu
  state. The sigreturn(2) and setcontext(2) syscall honour the flag,
  allowing the signal handlers to inspect and manipilate extended
  state in the interrupted context.

- The getcontext(2) never returns extended state, since there is no
  place in the fixed-sized mcontext_t to place variable-sized save
  area. And, since mcontext_t is embedded into ucontext_t, makes it
  impossible to fix in a reasonable way.  Instead of extending
  getcontext(2) syscall, provide a sysarch(2) facility to query
  extended FPU state.

- Add ptrace(2) support for getting and setting extended state; while
  there, implement missed PT_I386_{GET,SET}XMMREGS for 32bit binaries.

- Change fpu_kern KPI to not expose struct fpu_kern_ctx layout to
  consumers, making it opaque. Internally, struct fpu_kern_ctx now
  contains a space for the extended state. Convert in-kernel consumers
  of fpu_kern KPI both on i386 and amd64.

First version of the support for AVX was submitted by Tim Bird
<tim.bird am sony com> on behalf of Sony. This version was written
from scratch.

Tested by:	pho (previous version), Yamagi Burmeister <lists yamagi org>
MFC after:	1 month
2012-01-21 17:45:27 +00:00
Kirk McKusick
cc672d3599 Make sure all intermediate variables holding mount flags (mnt_flag)
and that all internal kernel calls passing mount flags are declared
as uint64_t so that flags in the top 32-bits are not lost.

MFC after: 2 weeks
2012-01-17 01:08:01 +00:00
Mikolaj Golub
fe7f89b71a Abrogate nchr argument in proc_getargv() and proc_getenvv(): we always want
to read strings completely to know the actual size.

As a side effect it fixes the issue with kern.proc.args and kern.proc.env
sysctls, which didn't return the size of available data when calling
sysctl(3) with the NULL argument for oldp.

Note, in get_ps_strings(), which does actual work for proc_getargv() and
proc_getenvv(), we still have a safety limit on the size of data read in
case of a corrupted procces stack.

Suggested by:	kib
MFC after:	3 days
2012-01-15 18:47:24 +00:00
Ulrich Spörlein
9a14aa017b Convert files to UTF-8 2012-01-15 13:23:18 +00:00
Dimitry Andric
69ee3e2f52 In sys/compat/linux/linux_ioctl.c, work around a warning when a pointer
is compared to an integer, by casting the pointer to l_uintptr_t.  No
functional difference on both i386 and amd64.

Reviewed by:	ed, jhb
MFC after:	1 week
2012-01-03 18:49:39 +00:00
Dimitry Andric
84143cee4f In sys/compat/ndis/subr_ntoskrnl.c, change the RtlFillMemory function
definition from K&R to ANSI, to avoid a clang warning about the uint8_t
parameter being promoted to int, which is not compatible with the type
declared in the earlier prototype.

MFC after:	1 week
2011-12-30 17:18:09 +00:00
John Baldwin
dd01579cde Implement linux_fadvise64() and linux_fadvise64_64() using
kern_posix_fadvise().

Reviewed by:	silence on emulation@
MFC after:	2 weeks
2011-12-29 15:34:59 +00:00
Mikolaj Golub
022eba3410 Protect process environment variables with p_candebug().
Discussed with:	jilles, kib, rwatson
MFC after:	2 weeks
2011-12-04 21:43:13 +00:00
Mikolaj Golub
7a837e17ba Retire linprocfs_doargv(). Instead use new functions, proc_getargv()
and proc_getenvv(), which were implemented using linprocfs_doargv() as
a reference.

Suggested by:	kib
Reviewed by:	kib
Approved by:	des (linprocfs maintainer)
MFC after:	2 weeks
2011-11-22 20:45:11 +00:00
Lawrence Stewart
cf13a58510 - Add the ffclock_getcounter(), ffclock_getestimate() and ffclock_setestimate()
system calls to provide feed-forward clock management capabilities to
  userspace processes. ffclock_getcounter() returns the current value of the
  kernel's feed-forward clock counter. ffclock_getestimate() returns the current
  feed-forward clock parameter estimates and ffclock_setestimate() updates the
  feed-forward clock parameter estimates.

- Document the syscalls in the ffclock.2 man page.

- Regenerate the script-derived syscall related files.

Committed on behalf of Julien Ridoux and Darryl Veitch from the University of
Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward
Clock Synchronization Algorithms" project.

For more information, see http://www.synclab.org/radclock/

Submitted by:	Julien Ridoux (jridoux at unimelb edu au)
2011-11-21 01:26:10 +00:00
Ed Schouten
767a32641c Make the Linux *at() calls a bit more complete.
Properly support:

- AT_EACCESS for faccessat(),
- AT_SYMLINK_FOLLOW for linkat().
2011-11-19 07:19:37 +00:00
Ed Schouten
51cfb9474f Regenerate system call tables. 2011-11-19 06:36:11 +00:00
Ed Schouten
d3a993d46b Improve *access*() parameter name consistency.
The current code mixes the use of `flags' and `mode'. This is a bit
confusing, since the faccessat() function as a `flag' parameter to store
the AT_ flag.

Make this less confusing by using the same name as used in the POSIX
specification -- `amode'.
2011-11-19 06:35:15 +00:00
John Baldwin
7edec6214e - Split out a kern_posix_fadvise() from the posix_fadvise() system call so
it can be used by in-kernel consumers.
- Make kern_posix_fallocate() public.
- Use kern_posix_fadvise() and kern_posix_fallocate() to implement the
  freebsd32 wrappers for the two system calls.
2011-11-14 18:00:15 +00:00
Sergey Kandaurov
81f7f2c4db struct timespec32: change types of tv_sec and tv_nsec fields to signed
to match native struct timespec ABI on __LP32__.

This change is a prerequisite for upcoming futimens()/utimensat() in whose
implementations it is assumed that timespec32 can take a negative value.

MFC after:	1 week
2011-11-11 07:17:00 +00:00
Ryan Stone
493b584dbd Correct the types of the arguments to return probes of the syscall
provider.  Previously we were erroneously supplying the argument types of
the corresponding entry probe.

Reviewed by:	rpaulo
MFC after:	1 week
2011-11-11 03:49:42 +00:00
Ed Schouten
6472ac3d8a Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
2011-11-07 15:43:11 +00:00
Ed Schouten
d745c852be Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs.
This means that their use is restricted to a single C file.
2011-11-07 06:44:47 +00:00
John Baldwin
cd06ae5c1b Regen. 2011-11-04 04:06:31 +00:00
John Baldwin
936c09ac0f Add the posix_fadvise(2) system call. It is somewhat similar to
madvise(2) except that it operates on a file descriptor instead of a
memory region.  It is currently only supported on regular files.

Just as with madvise(2), the advice given to posix_fadvise(2) can be
divided into two types.  The first type provide hints about data access
patterns and are used in the file read and write routines to modify the
I/O flags passed down to VOP_READ() and VOP_WRITE().  These modes are
thus filesystem independent.  Note that to ease implementation (and
since this API is only advisory anyway), only a single non-normal
range is allowed per file descriptor.

The second type of hints are used to hint to the OS that data will or
will not be used.  These hints are implemented via a new VOP_ADVISE().
A default implementation is provided which does nothing for the WILLNEED
request and attempts to move any clean pages to the cache page queue for
the DONTNEED request.  This latter case required two other changes.
First, a new V_CLEANONLY flag was added to vinvalbuf().  This requests
vinvalbuf() to only flush clean buffers for the vnode from the buffer
cache and to not remove any backing pages from the vnode.  This is
used to ensure clean pages are not wired into the buffer cache before
attempting to move them to the cache page queue.  The second change adds
a new vm_object_page_cache() method.  This method is somewhat similar to
vm_object_page_remove() except that instead of freeing each page in the
specified range, it attempts to move clean pages to the cache queue if
possible.

To preserve the ABI of struct file, the f_cdevpriv pointer is now reused
in a union to point to the currently active advice region if one is
present for regular files.

Reviewed by:	jilles, kib, arch@
Approved by:	re (kib)
MFC after:	1 month
2011-11-04 04:02:50 +00:00
Konstantin Belousov
126b36a21e Control the execution permission of the readable segments for
i386 binaries on the amd64 and ia64 with the sysctl, instead of
unconditionally enabling it.

Reviewed by:	marcel
2011-10-15 12:35:18 +00:00
John Baldwin
660d55fb88 Regen. 2011-10-14 11:47:14 +00:00
John Baldwin
f940342426 Use PAIR32TO64() for the offset and length parameters to
freebsd32_posix_fallocate() to properly handle big-endian platforms.

Reviewed by:	mdf
MFC after:	1 week
2011-10-14 11:46:46 +00:00
Marcel Moolenaar
90fd594a1d Use PTRIN(). 2011-10-13 22:33:03 +00:00
Marcel Moolenaar
f8244106ab Wrap mprotect(2) so that we can add execute permissions when read
permissions are requested. This is needed on amd64 and ia64 for
JDK 1.4.x
2011-10-13 18:25:10 +00:00
Marcel Moolenaar
eb8afcd36a Wrap mprotect(2) 2011-10-13 18:21:11 +00:00
Marcel Moolenaar
488a16050c In freebsd32_mmap() and when compiling for amd64 or ia64, also
ask for execute permissions when read permissions are wanted.
This is needed for JDK 1.4.x on i386.
2011-10-13 18:18:42 +00:00
Christian Brueffer
796fa5e465 Add curly braces missed in r226247.
Pointy hat to:	brueffer
Submitted by:	many
MFC after:	1 week
2011-10-11 13:40:37 +00:00
Christian Brueffer
05ad7ad667 Properly free linux_gidset in case of an error.
CID:		4136
Found with:	Coverity Prevent(tm)
MFC after:	1 week
2011-10-11 10:32:23 +00:00
Jung-uk Kim
bf3a36cc7b Use the caculated length instead of maximum length. 2011-10-06 21:55:05 +00:00
Jung-uk Kim
b6f96462ba Remove a now-defunct variable. 2011-10-06 21:40:08 +00:00
Jung-uk Kim
3106d6704b Use uint32_t instead of u_int32_t. Fix style(9) nits. 2011-10-06 21:17:46 +00:00
Jung-uk Kim
c02637c717 Make sure to ignore the leading NULL byte from Linux abstract namespace. 2011-10-06 21:09:28 +00:00
Jung-uk Kim
f05531a392 Restore the original socket address length if it was not really AF_INET6. 2011-10-06 20:48:23 +00:00
Jung-uk Kim
43399111a7 Retern more appropriate errno when Linux path name is too long. 2011-10-06 20:28:08 +00:00
Jung-uk Kim
0007f669ca Inline do_sa_get() function and remove an unused return value. 2011-10-06 20:20:30 +00:00
Jung-uk Kim
c15cdbf2f3 Unroll inlined strnlen(9) and make it easier to read. No functional change. 2011-10-06 19:59:14 +00:00
Colin Percival
5da3eb94fc Fix a bug in UNIX socket handling in the linux emulator which was
exposed by the security fix in FreeBSD-SA-11:05.unix.

Approved by:	so (cperciva)
Approved by:	re (kib)
Security:	Related to FreeBSD-SA-11:05.unix, but not actually
		a security fix.
2011-10-04 19:07:38 +00:00
Kip Macy
9eca9361f9 Auto-generated code from sys_ prefixing makesyscalls.sh change
Approved by:	re(bz)
2011-09-16 14:04:14 +00:00
Kip Macy
8451d0dd78 In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by:	rwatson
Approved by:	re (bz)
2011-09-16 13:58:51 +00:00
Jonathan Anderson
cfb5f76865 Add experimental support for process descriptors
A "process descriptor" file descriptor is used to manage processes
without using the PID namespace. This is required for Capsicum's
Capability Mode, where the PID namespace is unavailable.

New system calls pdfork(2) and pdkill(2) offer the functional equivalents
of fork(2) and kill(2). pdgetpid(2) allows querying the PID of the remote
process for debugging purposes. The currently-unimplemented pdwait(2) will,
in the future, allow querying rusage/exit status. In the interim, poll(2)
may be used to check (and wait for) process termination.

When a process is referenced by a process descriptor, it does not issue
SIGCHLD to the parent, making it suitable for use in libraries---a common
scenario when using library compartmentalisation from within large
applications (such as web browsers). Some observers may note a similarity
to Mach task ports; process descriptors provide a subset of this behaviour,
but in a UNIX style.

This feature is enabled by "options PROCDESC", but as with several other
Capsicum kernel features, is not enabled by default in GENERIC 9.0.

Reviewed by: jhb, kib
Approved by: re (kib), mentor (rwatson)
Sponsored by: Google Inc
2011-08-18 22:51:30 +00:00
Robert Watson
a9d2f8d84f Second-to-last commit implementing Capsicum capabilities in the FreeBSD
kernel for FreeBSD 9.0:

Add a new capability mask argument to fget(9) and friends, allowing system
call code to declare what capabilities are required when an integer file
descriptor is converted into an in-kernel struct file *.  With options
CAPABILITIES compiled into the kernel, this enforces capability
protection; without, this change is effectively a no-op.

Some cases require special handling, such as mmap(2), which must preserve
information about the maximum rights at the time of mapping in the memory
map so that they can later be enforced in mprotect(2) -- this is done by
narrowing the rights in the existing max_protection field used for similar
purposes with file permissions.

In namei(9), we assert that the code is not reached from within capability
mode, as we're not yet ready to enforce namespace capabilities there.
This will follow in a later commit.

Update two capability names: CAP_EVENT and CAP_KEVENT become
CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they
represent.

Approved by:	re (bz)
Submitted by:	jonathan
Sponsored by:	Google Inc
2011-08-11 12:30:23 +00:00
Konstantin Belousov
dda4f96087 Implement the linprocfs swaps file, providing information about the
configured swap devices in the Linux-compatible format.

Based on the submission by:	Robert Millan <rmh debian org>
PR:	kern/159281
Reviewed by:	bde
Approved by:	re (kensmith)
MFC after:	2 weeks
2011-08-01 19:12:15 +00:00
Bjoern A. Zeeb
925af54487 Rename ki_ocomm to ki_tdname and OCOMMLEN to TDNAMLEN.
Provide backward compatibility defines under BURN_BRIDGES.

Suggested by:	jhb
Reviewed by:	emaste
Sponsored by:	Sandvine Incorporated
Approved by:	re (kib)
2011-07-18 20:06:15 +00:00
Dmitry Morozovsky
6edeb2c725 Correct small typo in a do{}while(0) define
Approved by:	kib
MFC after:	2 weeks
2011-07-17 17:12:17 +00:00
Bjoern A. Zeeb
74d7a2539e Remove the 'either' from the comment as it'll be less obvious that we
removed semmap in a bit of time from now.   Re-wrap.

Suggested by:	jhb
2011-07-17 05:33:22 +00:00
Jonathan Anderson
63ba8b5ede Auto-generated system call code with cap_new(), cap_getrights().
Approved by: mentor (rwatson), re (Capsicum blanket)
Sponsored by: Google Inc
2011-07-15 18:33:12 +00:00
Jonathan Anderson
cfb9df5541 Add cap_new() and cap_getrights() system calls.
Implement two previously-reserved Capsicum system calls:
- cap_new() creates a capability to wrap an existing file descriptor
- cap_getrights() queries the rights mask of a capability.

Approved by: mentor (rwatson), re (Capsicum blanket)
Sponsored by: Google Inc
2011-07-15 18:26:19 +00:00
Bjoern A. Zeeb
1080a2c85d Remove semaphore map entry count "semmap" field and its tuning
option that is highly recommended to be adjusted in too much
documentation while doing nothing in FreeBSD since r2729 (rev 1.1).

ipcs(1) needs to be recompiled as it is accessing _KERNEL private
variables.

Reviewed by:	jhb (before comment change on linux code)
Sponsored by:	Sandvine Incorporated
2011-07-14 14:18:14 +00:00
Sergey Kandaurov
e0607dec4d Return empty cmdline/environ string for processes with kernel address
space. This is consistent with the behavior in linux.

PR:		kern/157871
Reported by:	Petr Salinger <Petr Salinger att seznam cz>
Verified on:	GNU/kFreeBSD debian 8.2-1-amd64 (by reporter)
Reviewed by:	kib (some time ago)
MFC after:	2 weeks
2011-06-17 07:30:56 +00:00
Konstantin Belousov
160e5953c2 Regen. 2011-06-16 22:06:35 +00:00
Konstantin Belousov
7181217886 Implement compat32 for old lseek, for the a.out binaries on amd64. 2011-06-16 22:05:56 +00:00
Alexander Leidinger
f4cb7c85e6 Commit the missing linux_videdev2_compat.h (lost somewhere between
commit tree patch generation -> successful compile tree build test -> commmit).

Pointy hat to:	netchild
2011-05-04 13:09:20 +00:00
Alexander Leidinger
60c6d23685 Add FEATURE macros for v4l and v4l2 to the linuxulator.
Suggested by:	ae
2011-05-04 09:52:34 +00:00
Alexander Leidinger
15bf9014c9 This is v4l2 support for the linuxulator. This allows to access FreeBSD
native devices which support the v4l2 API from processes running within
the linuxulator, e.g. skype or flash can access the multimedia/pwcbsd
or multimedia/webcamd supplied drivers.

Submitted by:	nox
MFC after:	1 month
2011-05-04 09:05:39 +00:00
Alexander Leidinger
4c94038794 Fix typo in comment, improve comment. 2011-05-04 08:42:31 +00:00
Alexander Leidinger
d0f5ca6d40 Add explanation about the use-permission and FreeBSDify it. 2011-05-04 08:41:55 +00:00
Alexander Leidinger
41ebeb8e6f Copy the v4l2 header unchanged from the vendor branch. 2011-05-04 08:31:58 +00:00
Matthew D Fleming
7323776b01 Regen. 2011-04-18 16:32:47 +00:00
Matthew D Fleming
d91f88f7f3 Add the posix_fallocate(2) syscall. The default implementation in
vop_stdallocate() is filesystem agnostic and will run as slow as a
read/write loop in userspace; however, it serves to correctly
implement the functionality for filesystems that do not implement a
VOP_ALLOCATE.

Note that __FreeBSD_version was already bumped today to 900036 for any
ports which would like to use this function.

Also reserve space in the syscall table for posix_fadvise(2).

Reviewed by:	-arch (previous version)
2011-04-18 16:32:22 +00:00
Edward Tomasz Napierala
c21e81c87e Remove stray semicolon. 2011-04-10 10:15:49 +00:00
Jung-uk Kim
3453537fa5 Use atomic load & store for TSC frequency. It may be overkill for amd64 but
safer for i386 because it can be easily over 4 GHz now.  More worse, it can
be easily changed by user with 'machdep.tsc_freq' tunable (directly) or
cpufreq(4) (indirectly).  Note it is intentionally not used in performance
critical paths to avoid performance regression (but we should, in theory).
Alternatively, we may add "virtual TSC" with lower frequency if maximum
frequency overflows 32 bits (and ignore possible incoherency as we do now).
2011-04-07 23:28:28 +00:00
Edward Tomasz Napierala
1ba5ad4210 Add accounting for most of the memory-related resources.
Sponsored by:	The FreeBSD Foundation
Reviewed by:	kib (earlier version)
2011-04-05 20:23:59 +00:00
Konstantin Belousov
78272eedf0 Implement compat32 shims for PCIOCGETCONF.
There is a generic problem with the shims for ioctls that receive
pointers to the usermode data areas in the data argument. We either have
to modify the handler to accept UIO_USERSPACE/UIO_SYSSPACE indicator, or
allocate and fill a usermode memory for data buffer in the host format.
The change goes the second route, in particular because we do not need
to modify the handler.

Submitted by:	John Wehle <john feith com>
MFC after:	2 weeks
2011-04-02 16:02:25 +00:00
Konstantin Belousov
4ee107ddc9 Provide the structures and ioctl number definition for handling
PCIOCGETCONF compat32.

Submitted by:	John Wehle <john feith com>
MFC after:	2 weeks
2011-04-02 15:47:23 +00:00
Konstantin Belousov
c3b2c0520b Regen 2011-04-01 11:16:53 +00:00
Konstantin Belousov
7332c129e0 Add support for executing the FreeBSD 1/i386 a.out binaries on amd64.
In particular:
- implement compat shims for old stat(2) variants and ogetdirentries(2);
- implement delivery of signals with ancient stack frame layout and
  corresponding sigreturn(2);
- implement old getpagesize(2);
- provide a user-mode trampoline and LDT call gate for lcall $7,$0;
- port a.out image activator and connect it to the build as a module
  on amd64.

The changes are hidden under COMPAT_43.

MFC after:   1 month
2011-04-01 11:16:29 +00:00
Andriy Gapon
a930718af1 Revert r220032:linux compat: add SO_PASSCRED option with basic handling
I have not properly thought through the commit.  After r220031 (linux
compat: improve and fix sendmsg/recvmsg compatibility) the basic
handling for SO_PASSCRED is not sufficient as it breaks recvmsg
functionality for SCM_CREDS messages because now we would need to handle
sockcred data in addition to cmsgcred.  And that is not implemented yet.

Pointyhat to:	avg
2011-03-31 08:14:51 +00:00
Edward Tomasz Napierala
66db16fc49 Regenerate. 2011-03-30 17:59:54 +00:00
Edward Tomasz Napierala
ec125fbbc5 Add rctl. It's used by racct to take user-configurable actions based
on the set of rules it maintains and the current resource usage.  It also
privides userland API to manage that ruleset.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	kib (earlier version)
2011-03-30 17:48:15 +00:00
Konstantin Belousov
eea5c61835 Regen. 2011-03-30 14:46:55 +00:00
Konstantin Belousov
8666550972 Provide compat32 shims for kldstat(2).
Requested and tested by:	jpaetzel
MFC after:	1 week
2011-03-30 14:46:12 +00:00
Andriy Gapon
01a9e1a11b linux compat: add SO_PASSCRED option with basic handling
This seems to have been a part of a bigger patch by dchagin that either
haven't been committed or committed partially.

Submitted by:	dchagin, nox
MFC after:	2 weeks
2011-03-26 11:25:36 +00:00
Andriy Gapon
605da56bc3 linux compat: improve and fix sendmsg/recvmsg compatibility
- implement baseic stubs for capget, capset, prctl PR_GET_KEEPCAPS
  and prctl PR_SET_KEEPCAPS.
- add SCM_CREDS support to sendmsg and recvmsg
- modify sendmsg to ignore control messages if not using UNIX
  domain sockets

This should allow linux pulse audio daemon and client work on FreeBSD
and interoperate with native counter-parts modulo the differences in
pulseaudio versions.

PR:		kern/149168
Submitted by:	John Wehle <john@feith.com>
Reviewed by:	netchild
MFC after:	2 weeks
2011-03-26 11:05:53 +00:00
Konstantin Belousov
bfac1583db Implement compat32 MEMRANGE_GET and MEMRANGE_SET. This is needed to
run 32bit Xorg server with VESA driver.

Submitted by:	John Wehle <john feith com>
MFC after:	1 week
2011-03-25 11:52:31 +00:00
Konstantin Belousov
bc13c742fa Fully emulate MDIOCLIST for compat32.
MFC after:	1 week
2011-03-25 11:43:49 +00:00
Konstantin Belousov
f991044ba4 Remove unneccessary panics, that can be easily triggered by user.
The copyin() function handles NULL as well as any other pointer.

MFC after:	3 days
2011-03-25 11:05:28 +00:00
Konstantin Belousov
1e67ebb1a2 Fix file leakage in the freebsd32_ioctl routines.
Code inspection shows freebsd32_ioctl calls fget for a fd and calls
a subroutine to handle each specific ioctl.  It is expected that the
subroutine will call fdrop when done.  However many of the subroutines
will exit out early if copyin encounters an error resulting in fdrop
never being called.

Submitted by:	John Wehle <john feith com>
MFC after:	3 days
2011-03-25 10:57:57 +00:00
John Baldwin
8e6fa660f2 Fix some locking nits with the p_state field of struct proc:
- Hold the proc lock while changing the state from PRS_NEW to PRS_NORMAL
  in fork to honor the locking requirements.  While here, expand the scope
  of the PROC_LOCK() on the new process (p2) to avoid some LORs.  Previously
  the code was locking the new child process (p2) after it had locked the
  parent process (p1).  However, when locking two processes, the safe order
  is to lock the child first, then the parent.
- Fix various places that were checking p_state against PRS_NEW without
  having the process locked to use PROC_LOCK().  Every place was already
  locking the process, just after the PRS_NEW check.
- Remove or reduce the use of PROC_SLOCK() for places that were checking
  p_state against PRS_NEW.  The PROC_LOCK() alone is sufficient for reading
  the current state.
- Reorder fill_kinfo_proc() slightly so it only acquires PROC_SLOCK() once.

MFC after:	1 week
2011-03-24 18:40:11 +00:00
Alexander Leidinger
0d7b5e545c Staticize functions which are not used somewhere else, move the
corresponding prototypes from the header to the code file.
2011-03-15 13:40:47 +00:00
Andriy Gapon
d549ef5638 add DTrace systrace support for linux32 and freebsd32 on amd64 syscalls
Regenerate system call and systrace support files.

PR:		kern/152822
Submitted by:	Artem Belevich <fbsdlist@src.cx>
Reviewed by:	jhb (earlier version)
MFC after:	3 weeks
2011-03-12 08:58:19 +00:00
Andriy Gapon
56ede1074e add DTrace systrace support for linux32 and freebsd32 on amd64 syscalls
This commits makes necessary changes in syscall/sysent generation
infrastructure.

PR:		kern/152822
Submitted by:	Artem Belevich <fbsdlist@src.cx>
Reviewed by:	jhb (ealier version)
MFC after:	3 weeks
2011-03-12 08:51:43 +00:00
Dmitry Chagin
31f7ad1545 Style(9) fixes. No functional changes.
MFC after:	2 Week
2011-03-12 07:47:05 +00:00
John Baldwin
c28a98e948 Remove now-obsolete comment.
Submitted by:	netchild
MFC after:	1 week
2011-03-10 19:50:12 +00:00
Jung-uk Kim
e09faf083f Remove custom interrupt dispatcher. This is a pointless micro-optimization
and it may cause problems if SS and SP are modified by real-mode code.

MFC after:	1 month
2011-03-09 16:16:38 +00:00
Dmitry Chagin
a2cd91cf28 Indeed, remove bogus since r219405 check of the Linux ABI.
Pointed out:	jhb

MFC after:	2 Week
2011-03-09 05:59:33 +00:00
Dmitry Chagin
e5d81ef1b5 Extend struct sysvec with new method sv_schedtail, which is used for an
explicit process at fork trampoline path instead of eventhadler(schedtail)
invocation for each child process.

Remove eventhandler(schedtail) code and change linux ABI to use newly added
sysvec method.

While here replace explicit comparing of module sysentvec structure with the
newly created process sysentvec to detect the linux ABI.

Discussed with:	kib

MFC after:	2 Week
2011-03-08 19:01:45 +00:00
Edward Tomasz Napierala
7123f4cd6f Export login class information via kinfo and make it possible to view
it using "ps -o class".
2011-03-05 14:41:49 +00:00
Edward Tomasz Napierala
e776709347 Regenerate. 2011-03-05 12:46:24 +00:00
Edward Tomasz Napierala
2bfc50bc4f Add two new system calls, setloginclass(2) and getloginclass(2). This makes
it possible for the kernel to track login class the process is assigned to,
which is required for RCTL.  This change also make setusercontext(3) call
setloginclass(2) and makes it possible to retrieve current login class using
id(1).

Reviewed by:	kib (as part of a larger patch)
2011-03-05 12:40:35 +00:00
Dmitry Chagin
3a4bc25691 Print out shared flag for debug purpose.
MFC after:	1 Week
2011-03-03 18:29:55 +00:00
Dmitry Chagin
815cb72a0c Switch PROCESS_SHARE to AUTO_SHARE (as umtx do). Even for SHARED,
if page mapped MAP_ANON linux uses private algorithm too.

Disscussed with:	jhb

MFC after:	3 Days
2011-03-03 18:19:10 +00:00
Robert Watson
ddfe0c2ba4 Regenerate system call files following addition of cap_enter(2),
cap_getmode(2), and capabilities.conf.

Reviewed by:	anderson
Discussed with:	benl, kris, pjd
Obtained from:	Capsicum Project
Sponsored by:	Google, Inc.
MFC after:	3 months
2011-03-01 13:30:23 +00:00
Robert Watson
96fcc75fdf Add initial support for Capsicum's Capability Mode to the FreeBSD kernel,
compiled conditionally on options CAPABILITIES:

Add a new credential flag, CRED_FLAG_CAPMODE, which indicates that a
subject (typically a process) is in capability mode.

Add two new system calls, cap_enter(2) and cap_getmode(2), which allow
setting and querying (but never clearing) the flag.

Export the capability mode flag via process information sysctls.

Sponsored by:	Google, Inc.
Reviewed by:	anderson
Discussed with:	benl, kris, pjd
Obtained from:	Capsicum Project
MFC after:	3 months
2011-03-01 13:23:37 +00:00
Rebecca Cran
23d5a8b50f Use the cprd_mem field when setting the start and length for a memory
resource - the layout of cprd_port is identical but using cprd_mem
makes the code easier to understand.

PR:		kern/118493
Submitted by:	Weongyo Jeong <weongyo.jeong at gmail.com>
MFC after:	3 days
2011-02-23 21:45:28 +00:00
John Baldwin
21f8f506fb Use umtx_key objects to uniquely identify futexes. Private futexes in
different processes that happen to use the same user address in the
separate processes will now be treated as distinct futexes rather than the
same futex.  We can now honor shared futexes properly by mapping them to a
PROCESS_SHARED umtx_key.  Private futexes use THREAD_SHARED umtx_key
objects.

In conjunction with:	dchagin
Reviewed by:	kib
MFC after:	1 week
2011-02-23 13:23:28 +00:00
Rebecca Cran
6bccea7c2b Fix typos - remove duplicate "the".
PR:	bin/154928
Submitted by:	Eitan Adler <lists at eitanadler.com>
MFC after: 	3 days
2011-02-21 09:01:34 +00:00
Dmitry Chagin
f9e66923e5 Do not clobber %rdx.
Before calling vfork() syscall the linux user-space stores the current PID
in the %rdx and restore it when the parent process will leave the kernel.
2011-02-20 07:58:30 +00:00
Dmitry Chagin
09d6cb0a23 For realtime signals fill the sigval value. 2011-02-15 21:46:36 +00:00
Dmitry Chagin
f3481dd9ab Make a linux_rt_sigtimedwait() system call is actually working.
1) Translate the native signal number in the appropriate Linux signal.
2) Remove bogus code, which can lead to a panic as it calls
   kern_sigtimedwait with same ksiginfo.
3) Return the corresponding signal number.
2011-02-15 21:42:48 +00:00
Dmitry Chagin
8c50c56206 Style(9) fix. Wrap long lines in linux_rt_sigtimedwait(). 2011-02-15 21:24:50 +00:00
Dmitry Chagin
d207e753da Put the macro declaration in the relevant include file for future use. 2011-02-15 21:22:09 +00:00
Dmitry Chagin
e2ef00a426 Style(9) fix. Do not initialize variables in the declarations. 2011-02-14 17:24:58 +00:00
Dmitry Chagin
49fa1a745e Sort include files in the alphabetical order. 2011-02-13 20:07:48 +00:00
Dmitry Chagin
4ca49f41ec Remove comment about 'ftlk' LOR. 2011-02-13 18:46:34 +00:00
Dmitry Chagin
890c582fe5 Stop printing the LOR, as this is expected behavior. 2011-02-13 18:41:40 +00:00
Dmitry Chagin
7c3b05b99c The bitset field of freshly created futex should be initialized explicity.
Otherwise, REQUEUE operations fails.
2011-02-13 17:56:22 +00:00
Dmitry Chagin
d14cc07d07 Rename used_requeue and use it as bitwise field to store more flags.
Reimplement used_requeue logic with LINUX_XDEPR_REQUEUEOP flag.
2011-02-12 20:58:59 +00:00
Dmitry Chagin
cfa57401b0 Slightly rewrite linux_fork:
1) Remove bogus error checking.
2) A new process exit from kernel through fork_trampoline(),
   so remove bogus check.
2011-02-12 20:16:25 +00:00
Dmitry Chagin
9588e04dde Remove bogus include <machine/frame.h> 2011-02-12 19:14:57 +00:00
Dmitry Chagin
222198ab0b Move linux_clone(), linux_fork(), linux_vfork() to a MI path. 2011-02-12 18:17:12 +00:00
Alexander Leidinger
529844c77c Linux' shm_open() fails because it wants to find some funky shmfs
to construct the full pathname. It starts to search at the default
mountpoint which is /dev/shm. If this fails it runs through fstab
and searches for shmfs and tmpfs. Whatever it finds will be
statfs()'ed to be checked for Linux' fs magic for shmfs (0x01021994).

Ideally our tmpfs should deliver this fs magic to Linux processes, but
as our tmpfs is considered to be an experimental feature we can not
assume that there is always a tmpfs available.

To make shared memory work in the Linuxulator, force the fs type of
/dev/shm (which can be a symlink) to match what Linux expects. The user
is responsible (info has to be added to the linux base ports and the docs)
to setup a suitable link for /dev/shm.

Noticed by:	Andre Albsmeier <Andre.Albsmeier@siemens.com>
Submitted by:	Andre Albsmeier <Andre.Albsmeier@siemens.com>
MFC after:	1 month
2011-02-09 20:23:22 +00:00
Dmitry Chagin
78ec1867a2 Yet another unimplemented futex operation, print out about.
Submitted by:	arundel
MFC after:	1 month.
2011-01-31 06:06:23 +00:00
Dmitry Chagin
5163762354 Implement a futex BITSET op.
Submitted by:	arundel
MFC after:	1 month.
2011-01-31 05:59:05 +00:00
Bjoern A. Zeeb
63827f255c Update interface stats counters to match the current format in linux and
try to export as much information as we can match.

Requested on:	Debian GNU/kFreeBSD list (debian-bsd lists.debian.org) 2010-12
Tested by:	Mats Erik Andersson (mats.andersson gisladisker.se)
MFC after:	10 days
2011-01-31 00:09:52 +00:00
Dmitry Chagin
596ba1bd95 Style(9) fixes.
MFC after:	1 Month.
2011-01-28 19:04:15 +00:00
Dmitry Chagin
adc7ece00a Implement a variation of the linux_common_wait() which should
be used by linuxolator itself.

Move linux_wait4() to MD path as it requires native struct
rusage translation to struct l_rusage on linux32/amd64.

MFC after:	1 Month.
2011-01-28 18:47:07 +00:00
Dmitry Chagin
d908c2d2a2 Style(9) fix.
MFC after:	1 month.
2011-01-28 05:42:14 +00:00
Dmitry Chagin
a5c1afadeb Add macro to test the sv_flags of any process. Change some places to test
the flags instead of explicit comparing with address of known sysentvec
structures.

MFC after:	1 month
2011-01-26 20:03:58 +00:00
Dmitry Chagin
9a6a64d3c4 Style(9) fix.
Approved by:	kib(mentor)
MFC after:	1 month
2011-01-23 09:50:39 +00:00
Konstantin Belousov
e225428f27 In linuxolator getdents_common(), it seems there is no reason to loop
if no records where returned by VOP_READDIR(). Readdir implementations
allowed to return 0 records when first record is larger then supplied
buffer. In this case trying to execute VOP_READDIR() again causes the
syscall looping forewer.

The goto was there from the day 1, which goes back to 1995 year.

Reported and tested by:	Beat G?tzi <beat chruetertee ch>
MFC after:   2 weeks
2011-01-19 12:19:25 +00:00
Matthew D Fleming
f4f04709ac Fix a few more SYSCTL_PROC() that were missing a CTLFLAG type specifier. 2011-01-19 00:57:58 +00:00
Konstantin Belousov
6297a3d843 Create shared (readonly) page. Each ABI may specify the use of page by
setting SV_SHP flag and providing pointer to the vm object and mapping
address. Provide simple allocator to carve space in the page, tailored
to put the code with alignment restrictions.

Enable shared page use for amd64, both native and 32bit FreeBSD
binaries.  Page is private mapped at the top of the user address
space, moving a start of the stack one page down. Move signal
trampoline code from the top of the stack to the shared page.

Reviewed by:	 alc
2011-01-08 16:13:44 +00:00
Sean Farley
506e9a3a87 Fix the LINUX_SOUND_MIXER_INFO ioctl to return success after the
information is set to FreeBSD.  It had been falling through to the end
of linux_ioctl_sound() and returning ENOIOCTL.  Noticed when running the
Linux ALSA amixer tool.

Add a LINUX_SOUND_MIXER_READ_CAPS ioctl which is used by the Skype
v2.1.0.81 binary.

Reviewed by:	gavin
MFC after:	2 weeks
2010-12-30 02:18:04 +00:00
Tijl Coosemans
81bd5041a2 Merge amd64 and i386 bus.h and move the resulting header to x86. Replace
the original amd64 and i386 headers with stubs.

Rename (AMD64|I386)_BUS_SPACE_* to X86_BUS_SPACE_* everywhere.

Reviewed by:	imp (previous version), jhb
Approved by:	kib (mentor)
2010-12-20 16:39:43 +00:00
Konstantin Belousov
da86244d0d Restore the ABI of struct kinfo_proc32 after r213536.
MFC after:	3 days
2010-12-19 21:18:33 +00:00
Bernhard Schmidt
5f5ca78b03 Implement NdisGetRoutineAddress and MmGetSystemRoutineAddress used in
newer Ralink drivers.

Submitted by:	Paul B Mahol <onemda at gmail.com>
2010-12-06 20:54:53 +00:00
Bernhard Schmidt
8b57b7eca6 Add a dummy for IoOpenDeviceRegistryKey().
With that change the Atheros 9xxx driver is actually usable and does not
panic anymore.

Submitted by:	Paul B Mahol <onemda at gmail.com>
MFC after:	2 weeks
2010-11-29 10:21:45 +00:00
Bernhard Schmidt
a94ca271e7 Some drivers rely on the existence of certain keys. The Atheros 9xxx
driver for example requests the NetCfgInstanceId but doesn't check the
returned status code and will happily access random memory instead.

Submitted by:	Paul B Mahol <onemda at gmail.com>
MFC after:	2 weeks
2010-11-29 10:10:56 +00:00
Bernhard Schmidt
7a9417182e Add prototype for InitializeSListHead(). 2010-11-23 22:17:06 +00:00
Bernhard Schmidt
191385fb0e Add a few functions used in newer drivers. Fix RtlCompareMemory() while
here.

Submitted by:	Paul B Mahol <onemda@gmail.com>
2010-11-23 21:49:32 +00:00
Sergey Kandaurov
f03749ca2d Update MNT_ROOTFS comments after changes in the root mount logic.
Reported by:	arundel
Suggested by:	marcel (phrasing)
Approved by:	kib (mentor)
2010-11-23 13:49:15 +00:00
Konstantin Belousov
f253bb0190 Add include guards.
MFC after:	3 days
2010-11-23 12:47:15 +00:00
Bernhard Schmidt
823fc080d7 Resurrect amd64 support.
- Many drivers on amd64 are picking system uptime, interrupt time and ticks
  via global data structure instead of calling functions for performance
  reasons. For now just patch such address so driver will not trigger page
  fault when trying to access such data. In future, additional callout may
  be added to update data in periodic intervals.
- On amd64 we need to allocate "shadow space" on stack before calling any
  function.

Submitted by:	Paul B Mahol <onemda at gmail.com>
2010-11-22 20:46:38 +00:00
Bernhard Schmidt
ea245594a6 Prefer pmap_extract() over pmap_kextract() as done in MmIsAddressValid().
According to the comment for MmIsAddressValid() there are issues on PAE
kernels using pmap_kextract().

Submitted by:	Paul B Mahol <onemda at gmail.com>
2010-11-22 20:39:29 +00:00
Dimitry Andric
95353459ae Fix linux kernel module breakage introduced in r215675, by including
<sys/sysent.h>.

Noticed by:	many
Pointy hat to:	netchild
2010-11-22 20:23:18 +00:00
Attilio Rao
7f08176ee8 Add the ability for GDB to printout the thread name along with other
thread specific informations.

In order to do that, and in order to avoid KBI breakage with existing
infrastructure the following semantic is implemented:
- For live programs, a new member to the PT_LWPINFO is added (pl_tdname)
- For cores, a new ELF note is added (NT_THRMISC) that can be used for
  storing thread specific, miscellaneous, informations. Right now it is
  just popluated with a thread name.

GDB, then, retrieves the correct informations from the corefile via the
BFD interface, as it groks the ELF notes and create appropriate
pseudo-sections.

Sponsored by:	Sandvine Incorporated
Tested by:	gianni
Discussed with:	dim, kan, kib
MFC after:	2 weeks
2010-11-22 14:42:13 +00:00
Alexander Leidinger
526384ecf2 Do not take the process lock. The assignment to u_short inside the
properly aligned structure is atomic on all supported architectures, and
the thread that should see side-effect of assignment is the same thread
that does assignment.

Use a more appropriate conditional to detect the linux ABI.

Suggested by:	kib
X-MFC:		together with r215664
2010-11-22 12:42:32 +00:00
Alexander Leidinger
5706ce8b58 Remove trailing dot from the unimplemented futex messages to make
them consistent with the syscall and ipc messages.

Submitted by:	arundel
MFC after:	3 days
2010-11-22 09:25:32 +00:00
Alexander Leidinger
bb63fdde6d By using the 32-bit Linux version of Sun's Java Development Kit 1.6
on FreeBSD (amd64), invocations of "javac" (or "java") eventually
end with the output of "Killed" and exit code 137.

This is caused by:
1. After calling exec() in multithreaded linux program threads are not
   destroyed and continue running. They get killed after program being
   executed finishes.

2. linux_exit_group doesn't return correct exit code when called not
   from group leader. Which happens regularly using sun jvm.

The submitters fix this in a similar way to how NetBSD handles this.

I took the PRs away from dchagin, who seems to be out of touch of
this since a while (no response from him).

The patches committed here are from [2], with some little modifications
from me to the style.

PR:		141439 [1], 144194 [2]
Submitted by:	Stefan Schmidt <stefan.schmidt@stadtbuch.de>, gk
Reviewed by:	rdivacky (in april 2010)
MFC after:	5 days
2010-11-22 09:06:59 +00:00
Bernhard Schmidt
511fa5ac98 Fix a panic on i386 for drivers using MmAllocateContiguousMemory()
and MmAllocateContiguousMemorySpecifyCache().

Those two functions take 64-bit variable(s) for their arguments. On i386
that takes additional 32-bit variable per argument. This is required so
that windrv_wrap() can correctly wrap function that miniport driver calls
with stdcall convention. Similar explanation is provided in subr_ndis.c for
other functions.

Submitted by:	 Paul B Mahol <onemda at gmail.com>
2010-11-17 09:32:39 +00:00
Bernhard Schmidt
4152177570 Use kmem_alloc_contig() to honour the cache_type variable.
Pointed out by:	alc
2010-11-17 09:28:17 +00:00
Dag-Erling Smørgrav
36b0a37317 Remove no-op assignment.
Submitted by:	clang via arundel@
MFC after:	2 weeks
2010-11-15 23:14:14 +00:00
Alexander Leidinger
809290db9e Some style(9) fixes.
Submitted by:	arundel
MFC after:	1 week
2010-11-15 13:07:10 +00:00
Alexander Leidinger
be44a97cd9 - print out the PID and program name of the program trying to use an
unsupported futex operation
- for those futex operations which are known to be not supported,
  print out which futex operation it is
- shortcut the error return of the unsupported FUTEX_CLOCK_REALTIME in
  some cases:
    FUTEX_CLOCK_REALTIME can be used to tell linux to use
    CLOCK_REALTIME instead of CLOCK_MONOTONIC. FUTEX_CLOCK_REALTIME
    however must only be set, if either FUTEX_WAIT_BITSET or
    FUTEX_WAIT_REQUEUE_PI are set too. If that's not the case
    we can die with ENOSYS right at the beginning.

Submitted by:	arundel
Reviewed by:	rdivacky (earlier iteration of the patch)
MFC after:	1 week
2010-11-15 13:03:35 +00:00
Bernhard Schmidt
1f0820e9c3 According to specs for MmAllocateContiguousMemorySpecifyCache() physically
contiguous memory with requested restrictions must be allocated.

Submitted by:	Paul B Mahol <onemda at gmail.com>
2010-11-11 18:43:31 +00:00
Dag-Erling Smørgrav
6ff168e663 Break long line. 2010-11-08 15:14:14 +00:00
Dag-Erling Smørgrav
199b5a28d6 Fix CPU ID in /proc/cpuinfo.
PR:		kern/56451
Submitted by:	arundel@
MFC after:	3 weeks
2010-11-08 12:04:41 +00:00
Bernhard Schmidt
4fa2655157 Remove 4.x, 5.x and 6.x compatibility bits.
Submitted by:	Paul B Mahol <onemda at gmail.com>
2010-11-04 18:43:57 +00:00
Konstantin Belousov
113801819a Remove stale comment.
Submitted by:	arundel
MFC after:	3 days
2010-10-14 19:30:44 +00:00
Konstantin Belousov
78ae4338a2 Add macro DECLARE_MODULE_TIED to denote a module as requiring the
kernel of exactly the same __FreeBSD_version as the headers module was
compiled against.

Mark our in-tree ABI emulators with DECLARE_MODULE_TIED. The modules
use kernel interfaces that the Release Engineering Team feel are not
stable enough to guarantee they will not change during the life cycle
of a STABLE branch. In particular, the layout of struct sysentvec is
declared to be not part of the STABLE KBI.

Discussed with:	bz, rwatson
Approved by:	re (bz, kensmith)
MFC after:	2 weeks
2010-10-12 09:18:17 +00:00
Jung-uk Kim
2a9479393a Simplify timeout check in futex_wait() using itimerfix() and return error
if the given timeout is invalid.  Consistently use int type for timeout and
correct a format string in futex_sleep().
2010-10-06 18:51:22 +00:00
Alexander Leidinger
5e82f12aca Fix a comparision of an uninitialised pointer.
Submitted by:	arundel
Found by:	clang analysis (automatic service by uqs@)
Reviewed by:	rdivacky
2010-10-06 07:34:41 +00:00
Andrew Thompson
c9558efd73 Use the printf-like capability from kproc_create().
Submitted by:	Paul B Mahol
2010-10-05 20:56:08 +00:00
Jung-uk Kim
e116381d02 Prefer pmap_unmapbios() over pmap_unmapdev(). The binary does not change
after this because pmap_unmapbios() is a macro for pmap_unmapdev() on amd64.
2010-10-05 18:38:23 +00:00
Konstantin Belousov
6c82a991c3 In linprocfs_doargv():
- handle compat32 processes;
- remove the checks for copied in addresses to belong into valid
  usermode range, proc_rwmem() does this;
- simplify loop reading single string, limit the total amount of strings
  collected by ARG_MAX bytes;
- correctly add '\0' at the end of each copied string;
- fix style.

In linprocfs_doprocenviron():
- unlock the process before calling copyin code [1]. The process is held
  by pseudofs.

In linprocfs_doproccmdline:
- use linprocfs_doargv() to handle !curproc case for which p_args is not cached.

Reported by:		plulnet [1]
Tested by:		pluknet
Approved by:		des (linprocfs maintainer, previous
				version of the patch)
MFC after:		3 weeks
2010-09-28 11:32:17 +00:00
Dag-Erling Smørgrav
6e7a4f6c36 Implement proc/$$/environment.
Submitted by:	Fernando Apesteguía <fernando.apesteguia@gmail.com>
MFC after:	3 weeks
2010-09-16 07:56:34 +00:00
Matthew D Fleming
4d369413e1 Replace sbuf_overflowed() with sbuf_error(), which returns any error
code associated with overflow or with the drain function.  While this
function is not expected to be used often, it produces more information
in the form of an errno that sbuf_overflowed() did.
2010-09-10 16:42:16 +00:00
Jung-uk Kim
97e6525d6a Add x86bios_set_intr() to set interrupt vectors for real mode and simplify
x86bios_get_intr() a little.
2010-08-25 21:03:50 +00:00
Jung-uk Kim
bc339276fb Check opcode for short jump as well. Some option ROMs do short jumps
(e.g., some NVIDIA video cards) and we were not able to do POST while
resuming because we only honored long jump.

MFC after:	3 days
2010-08-25 20:52:40 +00:00
Konstantin Belousov
ee235befcb Supply some useful information to the started image using ELF aux vectors.
In particular, provide pagesize and pagesizes array, the canary value
for SSP use, number of host CPUs and osreldate.

Tested by:	marius (sparc64)
MFC after:	1 month
2010-08-17 08:55:45 +00:00
Jung-uk Kim
077c4b480e Place spinlock_enter() and spinlock_exit() just around X86EMU calls. 2010-08-10 15:22:48 +00:00
Jung-uk Kim
449918b191 Tidy up locking and memory allocation for the real mode emulator wrapper.
Now we use a regular mutex instead of a spin mutex.  When we enter and exit
the emulator, spinlock_enter() and spinlock_exit() are additionally used.
Move some page table related stuff from x86bios_init() and x86bios_uninit()
to x86bios_map_mem() and x86bios_unmap_mem().
2010-08-10 06:25:08 +00:00
Jung-uk Kim
f2c73cefa0 Tidy up printf() calls for debugging. 2010-08-09 22:06:08 +00:00
Jung-uk Kim
b316507576 Initialize a variable just before its use. 2010-08-09 18:10:32 +00:00
Jung-uk Kim
b41f3f4cde Reduce diffs between VM86 and X86EMU wrappers for x86bios_alloc() and
x86bios_free().  Add strict sanity checks for VM86 wrapper and add strict
page table locking for X86EMU wrapper.
2010-08-09 17:54:26 +00:00
Konstantin Belousov
1757d9699d Prefer struct sysentvec sv_psstrings to hardcoding FREEBSD32_PS_STRINGS
in the compat32 code. Use sv_usrstack instead of FREEBSD32_USRSTACK as well.

MFC after:	1 week
2010-08-07 11:57:13 +00:00
Konstantin Belousov
4605ef76e7 Add compat32 definition for (old) struct ostat.
MFC after:	1 week
2010-08-07 11:53:38 +00:00
Jung-uk Kim
a6d613a57f Do not block any I/O port on amd64. 2010-08-07 04:05:58 +00:00
Jung-uk Kim
d7a5fb634f Optimize interrupt vector lookup. There is no need to check the page table. 2010-08-07 03:45:45 +00:00
Jung-uk Kim
fc82156f95 Consistently use architecture specific macros. 2010-08-06 15:24:37 +00:00
Jung-uk Kim
f10776734f Fix allocation of multiple pages, which forgot to increase page number.
Particularly, it caused "vm86_addpage: overlap" panics under VirtualBox.
Add a safety check before freeing memory while I am here.
2010-08-06 15:04:01 +00:00
Jung-uk Kim
0a3493e5e7 Re-add flag register for output. Some BIOS calls actually use it to return
success/failure status.  Oops.
2010-08-05 19:30:57 +00:00
Jung-uk Kim
c5e960de35 Do not copy stack pointer and flags. These registers are unconditionally
destroyed from vm86_prepcall().
2010-08-05 19:12:35 +00:00
Jung-uk Kim
439f3d8b81 Implement a simple native VM86 backend for X86BIOS. Now i386 uses native
VM86 calls instead of the real mode emulator as a backend.  VM86 has been
proven reliable for very long time and it is actually few times faster than
emulation.  Increase maximum number of page table entries per VM86 context
from 3 to 8 pages.  It was (ridiculously) low and insufficient for new VM86
backend, which shares one context globally.  Slighly rearrange and clean up
the emulator backend to accommodate new code.  The only visible change here
is stack size, which is decreased from 64K to 4K bytes to sync. with VM86.
Actually, it seems there is no need for big stack in real mode.

MFC after:	1 month
2010-08-05 18:48:30 +00:00
Konstantin Belousov
64dc04de68 Copy inode birthtime to the struct stat32.
MFC after:	1 week
2010-08-04 14:38:20 +00:00
Konstantin Belousov
45b6fa3b54 Fix style.
MFC after:	1 week
2010-08-04 14:35:05 +00:00
Konstantin Belousov
34ab36a3dc When compat32 recvmsg(2) does not need to copy out control messages, set
msg_controllen to 0.

PR:	kern/149227
Submitted by:	Stef Walter <stef memberwebs com>
MFC after:	1 weeks
2010-08-03 11:23:44 +00:00
Alan Cox
2af6e14d39 Introduce exec_alloc_args(). The objective being to encapsulate the
details of the string buffer allocation in one place.

Eliminate the portion of the string buffer that was dedicated to storing
the interpreter name.  The pointer to the interpreter name can simply be
made to point to the appropriate argument string.

Reviewed by:	kib
2010-07-27 17:31:03 +00:00
Konstantin Belousov
a9c9e80bec Revert r210451, and the similar part of the r210431. The forward-declaration
for the enum tag when enum definition is not complete is not allowed by
C99, and is gcc extension.

Requested by:	stefanf
MFC after:	28 days
2010-07-26 12:52:44 +00:00
Alan Cox
9e4e511499 Change the order in which the file name, arguments, environment, and
shell command are stored in exec*()'s demand-paged string buffer.  For
a "buildworld" on an 8GB amd64 multiprocessor, the new order reduces
the number of global TLB shootdowns by 31%.  It also eliminates about
330k page faults on the kernel address space.

Change exec_shell_imgact() to use "args->begin_argv" consistently as
the start of the argument and environment strings.  Previously, it
would sometimes use "args->buf", which is the start of the overall
buffer, but no longer the start of the argument and environment
strings.  While I'm here, eliminate unnecessary passing of "&length"
to copystr(), where we don't actually care about the length of the
copied string.

Clean up the initialization of the exec map.  In particular, use the
correct size for an entry, and express that size in the same way that
is used when an entry is allocated.  The old size was one page too
large.  (This discrepancy originated in 2004 when I rewrote
exec_map_first_page() to use sf_buf_alloc() instead of the exec map
for mapping the first page of the executable.)

Reviewed by:	kib
2010-07-25 17:43:38 +00:00
Konstantin Belousov
0b53d1569e Remove the linux_exec_copyin_args(), freebsd32_exec_copyin_args() may
server as well. COMPAT_FREEBSD32 is a prerequisite for COMPAT_LINUX32.

Reviewed by:	alc
MFC after:	3 weeks
2010-07-23 21:30:33 +00:00
Alan Cox
69a8f9e3d1 Eliminate a little bit of duplicated code. 2010-07-23 18:58:27 +00:00
Edward Tomasz Napierala
49e134dfa6 Remove proc locking, it's not needed after r210132. 2010-07-17 15:52:11 +00:00
Edward Tomasz Napierala
c5dfcf4cc1 Make svr4(4) version of poll(2) use the same limit of file descriptors as the
usual poll(2) does, instead of checking resource limits.
2010-07-15 18:44:58 +00:00
Konstantin Belousov
67322a4cd2 Constify source argument for siginfo_to_siginfo32().
MFC after:	1 week
2010-07-04 11:43:53 +00:00
John Baldwin
ad6eec7b9e Tweak the in-kernel API for sending signals to threads:
- Rename tdsignal() to tdsendsignal() and make it private to kern_sig.c.
- Add tdsignal() and tdksignal() routines that mirror psignal() and
  pksignal() except that they accept a thread as an argument instead of
  a process.  They send a signal to a specific thread rather than to an
  individual process.

Reviewed by:	kib
2010-06-29 20:41:52 +00:00
Konstantin Belousov
13cedde2cb Regenerate 2010-06-28 18:17:21 +00:00
Konstantin Belousov
153ac44cf6 Count number of threads that enter and leave dynamically registered
syscalls. On the dynamic syscall deregistration, wait until all
threads leave the syscall code. This somewhat increases the safety
of the loadable modules unloading.

Reviewed by:	jhb
Tested by:	pho
MFC after:	1 month
2010-06-28 18:06:46 +00:00
Jung-uk Kim
362487c0ba Let x86bios_alloc() pass contigmalloc(9) flags. Use it to set M_WAITOK
from VESA BIOS initialization.  All other malloc(9) uses in the function is
blocking any way.
2010-06-23 17:20:51 +00:00
Ed Schouten
6c9cdb5860 ANSIfy prototypes in subr_usbd.c.
Clang generates the following warnings when building subr_usbd.c:

| subr_usbd.c:598:13: warning: promoted type 'int' of K&R function
|   parameter is not compatible with the parameter type 'uint8_t' (aka
|   'unsigned char') declared in a previous prototype
| subr_usbd.c:627:13: warning: promoted type 'int' of K&R function
|   parameter is not compatible with the parameter type 'uint8_t' (aka
|   'unsigned char') declared in a previous prototype
| subr_usbd.c:649:13: warning: promoted type 'int' of K&R function
|   parameter is not compatible with the parameter type 'uint8_t' (aka
|   'unsigned char') declared in a previous prototype

Instead of just ANSIfying these three prototypes, do it for the entire
file.

Spotted by:	clang
2010-06-12 12:19:08 +00:00
John Baldwin
3aa6d94e0c Update several places that iterate over CPUs to use CPU_FOREACH(). 2010-06-11 18:46:34 +00:00
Wojciech A. Koszek
eedfc35c5c Bring USB fixes for linux(4).
Intention of this commit is to let us take a full advantage
of libusb(8) ported to Linux. This decreases a possibility of getting
any collisions within ioctl() "command" space, especially with
relation to  LINUX_SNDCTL_SEQ... stuff.

Basically, we provide commands, that will be mapped in the kernel
to correct ones and forward those to the USB layer. Port enabling
functionality brought with this patch is here:

	http://www.freebsd.org/cgi/query-pr.cgi?pr=146895

Bump __FreeBSD_version to catch, since which version installing a
port makes sense.

This patch should bring no regressions. So far, only i386 is tested.

Tested by:	thompsa@
Reviewed by:	thompsa@
OKed by:	netchild@
2010-05-24 07:04:00 +00:00
Konstantin Belousov
afe1a68827 Reorganize syscall entry and leave handling.
Extend struct sysvec with three new elements:
sv_fetch_syscall_args - the method to fetch syscall arguments from
  usermode into struct syscall_args. The structure is machine-depended
  (this might be reconsidered after all architectures are converted).
sv_set_syscall_retval - the method to set a return value for usermode
  from the syscall. It is a generalization of
  cpu_set_syscall_retval(9) to allow ABIs to override the way to set a
  return value.
sv_syscallnames - the table of syscall names.

Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding
the call to cpu_set_syscall_retval().

The new functions syscallenter(9) and syscallret(9) are provided that
use sv_*syscall* pointers and contain the common repeated code from
the syscall() implementations for the architecture-specific syscall
trap handlers.

Syscallenter() fetches arguments, calls syscall implementation from
ABI sysent table, and set up return frame. The end of syscall
bookkeeping is done by syscallret().

Take advantage of single place for MI syscall handling code and
implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and
PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the
thread is stopped at syscall entry or return point respectively.  The
EXEC flag augments SCX and notifies debugger that the process address
space was changed by one of exec(2)-family syscalls.

The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are
changed to use syscallenter()/syscallret(). MIPS and arm are not
converted and use the mostly unchanged syscall() implementation.

Reviewed by:	jhb, marcel, marius, nwhitehorn, stas
Tested by:	marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
	stas (mips)
MFC after:	1 month
2010-05-23 18:32:02 +00:00
Alexander Leidinger
eddc400373 - #ifdef out the cliplist part, skype seems like using an uninitialized
variable and can cause problems, without the cliplist handling it works
  without problems
- improve the cliplist error handling
- fix VIDIOCGTUNER and VIDIOCSMICROCODE (still no hardware available to test)

Submitted by:	J.R. Oldroyd <jr@opal.com>
X-MFC after:	soon (together with all the v4l stuff)
2010-05-03 14:19:58 +00:00
Jung-uk Kim
483191871b Reduce MD code further. At least, it compiles on ia64 now (but it is not
connected to build).  The idea/code was shamelessly taken from r207329.
2010-05-01 01:05:07 +00:00
Jung-uk Kim
2083bca542 Do not initialize mutex and return error if it cannot map memory. 2010-05-01 00:36:40 +00:00