Commit Graph

2389 Commits

Author SHA1 Message Date
mav
1d2330e15a Remake Linux' SOUND_MIXER_INFO IOCTL as a wrapper around new FreeBSD's one.
Submitted by:	Dmitry Luhtionov <dmitryluhtionov@gmail.com>
MFC after:	3 days
2014-09-24 08:18:11 +00:00
sbruno
c738e0d253 Bump minimum linux compat version to support Centos6 ports updates for linux.
Update linux compat minimum revision to match linux-c6 now in ports.  This
is a candidate for 10.1 R as it matches the current state of supported
linux compat packages in the ports tree.

PR:		187786
Reviewed by:	xmj
MFC after:	2 days
Relnotes:	yes
2014-09-22 17:26:07 +00:00
glebius
94b048a2d0 Fix build on 32-bit machines.
Pointy hat to:	glebius
2014-09-18 20:29:17 +00:00
glebius
42586cabb8 - Use if_get_counter() to fetch ifnet statistics.
- Report IFCOUNTER_OQDROPS to linprocfs. Wasn't there before.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-18 16:44:28 +00:00
bz
122003e2ff Implement most of timer_{create,settime,gettime,getoverrun,delete}
for amd64/linux32.  Fix the entirely bogus (untested) version from
r161310 for i386/linux using the same shared code in compat/linux.

It is unclear to me if we could support more clock mappings but
the current set allows me to successfully run commercial
32bit linux software under linuxolator on amd64.

Reviewed by:		jhb
Differential Revision:	D784
MFC after:		3 days
Sponsored by:		DARPA, AFRL
2014-09-18 08:36:45 +00:00
mjg
4cf719a9ee Add missing proctree locking to fill_kinfo_proc consumers.
This fixes r270444.

Pointy hat:	mjg
Reported by:	many
MFC after:	1 week
2014-08-30 03:10:55 +00:00
mjg
ec92f2e61c Return real parent pid in kinfo (used by e.g. ps)
Add a separate field which exports tracer pid and add a new keyword
("tracer") for ps to display it.

This is a follow up to r270444.

Reviewed by:	kib
MFC after:	1 week
Relnotes:	yes
2014-08-28 08:41:11 +00:00
kib
1e11271b7e Regen. 2014-08-27 01:02:19 +00:00
kib
75f74b437b Fix handling of the third argument for fcntl(2). The native syscall
uses long for arg, which needs translation.

Discussed with and tested by:	mjg
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-08-27 01:02:02 +00:00
glebius
0236597739 All mbuf external free functions never fail, so let them be void.
Sponsored by:	Nginx, Inc.
2014-07-11 13:58:48 +00:00
marcel
9f28abd980 Remove ia64.
This includes:
o   All directories named *ia64*
o   All files named *ia64*
o   All ia64-specific code guarded by __ia64__
o   All ia64-specific makefile logic
o   Mention of ia64 in comments and documentation

This excludes:
o   Everything under contrib/
o   Everything under crypto/
o   sys/xen/interface
o   sys/sys/elf_common.h

Discussed at: BSDcan
2014-07-07 00:27:09 +00:00
hselasky
35b126e324 Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
gjb
fc21f40567 Revert r267961, r267973:
These changes prevent sysctl(8) from returning proper output,
such as:

 1) no output from sysctl(8)
 2) erroneously returning ENOMEM with tools like truss(1)
    or uname(1)
 truss: can not get etype: Cannot allocate memory
2014-06-27 22:05:21 +00:00
hselasky
bd1ed65f0f Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after:	2 weeks
Sponsored by:	Mellanox Technologies
2014-06-27 16:33:43 +00:00
mav
797781309b - Add support for SG_GET_SG_TABLESIZE IOCTL to report that we don't support
scatter/gather lists.
- Return error for still unsupported SG 3.x API read/write calls.

MFC after:	1 month
2014-06-04 12:05:47 +00:00
mav
cb97f6171b Overhaul CAM SG driver IOCTL interfaces.
Make it really work for native FreeBSD programs.  Before this it was broken
for years due to different number of pointer dereferences in Linux and
FreeBSD IOCTL paths, permanently returning errors to FreeBSD programs.
This change breaks the driver FreeBSD IOCTL ABI, making it more strict,
but since it was not working any way -- who bother.

Add shims for 32-bit programs on 64-bit host, translating the argument
of the SG_IO IOCTL for both FreeBSD and Linux ABIs.

With this change I was able to run 32-bit Linux sg3_utils tools and simple
32 and 64-bit FreeBSD test tools on both 32 and 64-bit FreeBSD systems.

MFC after:	1 month
2014-06-02 19:53:53 +00:00
dchagin
f38753151f Glibc was switched to the FUTEX_WAIT_BITSET op and CLOCK_REALTIME
flag has been added instead of FUTEX_WAIT to replace the FUTEX_WAIT
logic which needs to do gettimeofday() calls before the futex syscall
to convert the absolute timeout to a relative timeout.
Before this the CLOCK_MONOTONIC used by the FUTEX_WAIT_BITSET op.

When the FUTEX_CLOCK_REALTIME is specified the timeout is an absolute
time, not a relative time. Rework futex_wait to handle this.
On the side fix the futex leak in error case and remove useless
parentheses.

Properly calculate the timeout for the CLOCK_MONOTONIC case.

MFC after:	3 days
2014-05-31 14:58:53 +00:00
dchagin
1e378c6cc6 In r218101 I have not changed properly the futex syscall definition.
Some Linux futex ops atomically verifies that the futex address uaddr
(uval) contains the value val. Comparing signed uval and unsigned val
may lead to an unexpected result, mostly to a deadlock.

So copyin uaddr to an unsigned int to compare the parameters correctly.

While here change ktr records to print parameters in more readable format.

Tested by	eadler@

MFC after:	3 days
2014-05-28 05:57:35 +00:00
marcel
60f16a83d9 In freebsd32_sendmsg(), replace the call to sockargs() followed by a
call to freebsd32_convert_msg_in() with freebsd32_copyin_control() to
readin and convert in a single step. This makes it simpler to put all
the control messages in a single mbuf or mbuf cluster as per the
limitations imposed upon us by ip6_setpktopts().

The logic is as follows:
1.  Go over the array of control messages to determine overall size
    and include extra padding for proper alignment as we go.
2.  Get a mbuf or mbuf cluster as needed or fail if the overall
    (adjusted) size is larger than a cluster.
3.  Go over the array of control messages again, but now copy them
    into kernel space and into aligned offsets.
4.  Update the length of the control message to take padding between
    the header and the data into account (but not for padding added
    between one control message and the next).

Obtained from:	Juniper Networks, Inc.
MFC after:	1 week
2014-04-05 18:56:01 +00:00
imp
eebc91c3f0 Remove instances of variables that were set, but never used. gcc 4.9
warns about these by default.
2014-03-30 23:43:36 +00:00
bdrewery
6fcf6199a4 Rename global cnt to vm_cnt to avoid shadowing.
To reduce the diff struct pcu.cnt field was not renamed, so
PCPU_OP(cnt.field) is still used. pc_cnt and pcpu are also used in
kvm(3) and vmstat(8). The goal was to not affect externally used KPI.

Bump __FreeBSD_version_ in case some out-of-tree module/code relies on the
the global cnt variable.

Exp-run revealed no ports using it directly.

No objection from:	arch@
Sponsored by:	EMC / Isilon Storage Division
2014-03-22 10:26:09 +00:00
kib
b236080eb1 Make the array pointed to by AT_PAGESIZES auxv properly aligned.
Also, remove the expression which calculated the location of the
strings for a new image and grown over the time to be
non-comprehensible.  Instead, calculate the offsets by steps, which
also makes fixing the alignments much cleaner.

Reported and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-03-19 12:35:04 +00:00
attilio
f931c33558 Regen per r263318.
Sponsored by:	EMC / Isilon storage division
2014-03-18 21:34:11 +00:00
attilio
25d02685fb Remove dead code from umtx support:
- Retire long time unused (basically always unused) sys__umtx_lock()
  and sys__umtx_unlock() syscalls
- struct umtx and their supporting definitions
- UMUTEX_ERROR_CHECK flag
- Retire UMTX_OP_LOCK/UMTX_OP_UNLOCK from _umtx_op() syscall

__FreeBSD_version is not bumped yet because it is expected that further
breakages to the umtx interface will follow up in the next days.
However there will be a final bump when necessary.

Sponsored by:	EMC / Isilon storage division
Reviewed by:	jhb
2014-03-18 21:32:03 +00:00
emaste
dfd2dcdc01 Update NetBSD Foundation copyrights to 2-clause BSD
The NetBSD Foundation states "Third parties are encouraged to change the
license on any files which have a 4-clause license contributed to the
NetBSD Foundation to a 2-clause license."

This change removes clauses 3 and 4 from copyright / license blocks that
list The NetBSD Foundation as the only copyright holder.

Sponsored by:	The FreeBSD Foundation
2014-03-18 01:40:25 +00:00
rwatson
33fdc14c0c Update kernel inclusions of capability.h to use capsicum.h instead; some
further refinement is required as some device drivers intended to be
portable over FreeBSD versions rely on __FreeBSD_version to decide whether
to include capability.h.

MFC after:	3 weeks
2014-03-16 10:55:57 +00:00
jmg
b66f059b49 change td_retval into a union w/ off_t, with defines to mask the
change...  This eliminates a cast, and also forces td_retval
(often 2 32-bit registers) to be aligned so that off_t's can be
stored there on arches with strict alignment requirements like
armeb (AVILA)...  On i386, this doesn't change alignment, and on
amd64 it doesn't either, as register_t is already 64bits...

This will also prevent future breakage due to people adding additional
fields to the struct...

This gets AVILA booting a bit farther...

Reviewed by:	bde
2014-03-16 00:53:40 +00:00
glebius
b38edcd355 Since 32-bit if_baudrate isn't enough to describe a baud rate of a 10 Gbit
interface, in the r241616 a crutch was provided. It didn't work well, and
finally we decided that it is time to break ABI and simply make if_baudrate
a 64-bit value. Meanwhile, the entire struct if_data was reviewed.

o Remove the if_baudrate_pf crutch.

o Make all fields of struct if_data fixed machine independent size. The
  notion of data (packet counters, etc) are by no means MD. And it is a
  bug that on amd64 we've got a 64-bit counters, while on i386 32-bit,
  which at modern speeds overflow within a second.

  This also removes quite a lot of COMPAT_FREEBSD32 code.

o Give 16 bit for the ifi_datalen field. This field was provided to
  make future changes to if_data less ABI breaking. Unfortunately the
  8 bit size of it had effectively limited sizeof if_data to 256 bytes.

o Give 32 bits to ifi_mtu and ifi_metric.
o Give 64 bits to the rest of fields, since they are counters.

__FreeBSD_version bumped.

Discussed with:	emax
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-03-13 03:42:24 +00:00
eadler
ea79ffcf4b linprocfs: add support for /sys/kernel/random/uuid
PR:		kern/186187
Submitted by:	Fernando <fernando.apesteguia@gmail.com>
MFC After:	2 weeks
2014-02-27 00:43:10 +00:00
kib
f0cb8e7d88 The posix_madvise(3) and posix_fadvise(2) should return error on
failure, same as posix_fallocate(2).

Noted by:	Bob Bishop <rb@gid.co.uk>
Discussed with:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-01-30 18:04:39 +00:00
kib
05b9ae7031 The posix_fallocate(2) syscall should return error number on error,
without modifying errno.

Reported and tested by:	Gennady Proskurin <gpr@mail.ru>
Reviewed by:	mdf
PR:	standards/186028
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-01-23 17:24:26 +00:00
adrian
c46f73c7ae Implement a kqueue notification path for sendfile.
This fires off a kqueue note (of type sendfile) to the configured kqfd
when the sendfile transaction has completed and the relevant memory
backing the transaction is no longer in use by this transaction.
This is analogous to SF_SYNC waiting for the mbufs to complete -
except now you don't have to wait.

Both SF_SYNC and SF_KQUEUE should work together, even if it
doesn't necessarily make any practical sense.

This is designed for use by applications which use backing cache/store
files (eg Varnish) or POSIX shared memory (not sure anything is using
it yet!) to know when a region of memory is free for re-use.  Note
it doesn't mark the region as free overall - only free from this
transaction.  The application developer still needs to track which
ranges are in the process of being recycled and wait until all
pending transactions are completed.

TODO:

* documentation, as always

Sponsored by:	Netflix, Inc.
2014-01-17 05:26:55 +00:00
adrian
19f7055283 Refactor out the common sendfile code from the do_sendfile() and the
compat32 sendfile syscall.

Sponsored by:	Netflix, Inc.
2014-01-09 00:11:14 +00:00
adrian
86274dd213 Migrate the sendfile_sync structure into a public(ish) API in preparation
for extending and reusing it.

The sendfile_sync wrapper is mostly just a "mbuf transaction" wrapper,
used to indicate that the backing store for a group of mbufs has completed.
It's only being used by sendfile for now and it's only implementing a
sleep/wakeup rendezvous.  However, there are other potential signaling
paths (kqueue) and other potential uses (socket zero-copy write) where the
same mechanism would also be useful.

So, with that in mind:

* extract the sendfile_sync code out into sf_sync_*() methods
* teach the sf_sync_alloc method about the current config flag -
  it will eventually know about kqueue.
* move the sendfile_sync code out of do_sendfile() - the only thing
  it now knows about is the sfs pointer.  The guts of the sync
  rendezvous (setup, rendezvous/wait, free) is now done in the
  syscall wrapper.
* .. and teach the 32-bit compat sendfile call the same.

This should be a no-op.  It's primarily preparation work for teaching
the sendfile_sync about kqueue notification.

Tested:

* Peter Holm's sendfile stress / regression scripts

Sponsored by:	Netflix, Inc.
2013-12-01 03:53:21 +00:00
peter
ac40be45fb jail_v0.ip_number was always in host byte order. This was handled
in one of the many layers of indirection and shims through stable/7
in jail_handle_ips().  When it was cleaned up and unified through
kern_jail() for 8.x, the byte order swap was lost.

This only matters for ancient binaries that call jail(2) themselves
internally.
2013-11-28 19:40:33 +00:00
kib
0123c14853 Add an kinfo sysctl to retrieve signal trampoline location for the
given process.

Note that the correctness of the trampoline length returned for ABIs
which do not use shared page depends on the correctness of the struct
sysvec sv_szsigcodebase member, which will be fixed on as-need basis.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-11-26 19:47:09 +00:00
avg
71889a5eff dtrace sdt: remove the ugly sname parameter of SDT_PROBE_DEFINE
In its stead use the Solaris / illumos approach of emulating '-' (dash)
in probe names with '__' (two consecutive underscores).

Reviewed by:	markj
MFC after:	3 weeks
2013-11-26 08:46:27 +00:00
adrian
f447b2ef43 Fix the compat32 sendfile() to be in line with my recent changes.
Reminded by:	kib
2013-11-26 08:32:37 +00:00
attilio
7ee4e910ce - For kernel compiled only with KDTRACE_HOOKS and not any lock debugging
option, unbreak the lock tracing release semantic by embedding
  calls to LOCKSTAT_PROFILE_RELEASE_LOCK() direclty in the inlined
  version of the releasing functions for mutex, rwlock and sxlock.
  Failing to do so skips the lockstat_probe_func invokation for
  unlocking.
- As part of the LOCKSTAT support is inlined in mutex operation, for
  kernel compiled without lock debugging options, potentially every
  consumer must be compiled including opt_kdtrace.h.
  Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the
  dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES
  is linked there and it is only used as a compile-time stub [0].

[0] immediately shows some new bug as DTRACE-derived support for debug
in sfxge is broken and it was never really tested.  As it was not
including correctly opt_kdtrace.h before it was never enabled so it
was kept broken for a while.  Fix this by using a protection stub,
leaving sfxge driver authors the responsibility for fixing it
appropriately [1].

Sponsored by:	EMC / Isilon storage division
Discussed with:	rstone
[0] Reported by:	rstone
[1] Discussed with:	philip
2013-11-25 07:38:45 +00:00
glebius
af0db43f1c Fix build.
Pointy hat to:	glebius
2013-11-05 19:17:19 +00:00
glebius
cb6df3f35c Axe IFF_SMART. Fortunately this layering violating flag was never used,
it was just declared.
2013-11-05 12:52:56 +00:00
glebius
3b6f8b896c Drop support for historic ioctls and also undefine them, so that code
that checks their presence via ifdef, won't use them.

Bump __FreeBSD_version as safety measure.
2013-11-05 10:29:47 +00:00
glebius
9e01f79e97 - Provide necessary includes.
- Remove unnecessary includes.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-29 11:17:49 +00:00
glebius
f469ae1d45 Include necessary headers that now are available due to pollution
via if_var.h.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-28 07:29:16 +00:00
glebius
2c1ec831c9 Provide includes that are needed in these files, and before were read
in implicitly via if.h -> if_var.h pollution.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 18:18:50 +00:00
glebius
ff6e113f1b The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 17:58:36 +00:00
kib
677c1f8ce9 Add padding to match the compat32 struct stat32 definition to the real
struct stat on 32bit architectures.

Debugged and tested by:	bsam
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Approved by:	re (marius)
2013-10-04 22:05:23 +00:00
markj
8ecd1f0d70 Fix some typos that were causing probe argument types to show up as unknown.
Reviewed by:	rwatson (mac provider)
Approved by:	re (glebius)
MFC after:	1 week
2013-10-01 15:40:27 +00:00
markj
d3af58e0a0 Regenerate syscall argument strings after r255777.
Approved by:	re (gjb)
MFC after:	1 week
2013-09-21 23:06:36 +00:00
jhb
c6151e30b1 Regen.
Approved by:	re (delphij)
2013-09-19 18:56:00 +00:00
jhb
d3ef75b6c7 Extend the support for exempting processes from being killed when swap is
exhausted.
- Add a new protect(1) command that can be used to set or revoke protection
  from arbitrary processes.  Similar to ktrace it can apply a change to all
  existing descendants of a process as well as future descendants.
- Add a new procctl(2) system call that provides a generic interface for
  control operations on processes (as opposed to the debugger-specific
  operations provided by ptrace(2)).  procctl(2) uses a combination of
  idtype_t and an id to identify the set of processes on which to operate
  similar to wait6().
- Add a PROC_SPROTECT control operation to manage the protection status
  of a set of processes.  MADV_PROTECT still works for backwards
  compatability.
- Add a p_flag2 to struct proc (and a corresponding ki_flag2 to kinfo_proc)
  the first bit of which is used to track if P_PROTECT should be inherited
  by new child processes.

Reviewed by:	kib, jilles (earlier version)
Approved by:	re (delphij)
MFC after:	1 month
2013-09-19 18:53:42 +00:00
rdivacky
fae003a069 Revert r255672, it has some serious flaws, leaking file references etc.
Approved by:	re (delphij)
2013-09-18 18:48:33 +00:00
rdivacky
d57db3eead Implement epoll support in Linuxulator. This is a tiny wrapper around kqueue
to implement epoll subset of functionality. The kqueue user data are 32bit
on i386 which is not enough for epoll user data so this patch overrides
kqueue fileops to maintain enough space in struct file.

Initial patch developed by me in 2007 and then extended and finished
by Yuri Victorovich.

Approved by:    re (delphij)
Sponsored by:   Google Summer of Code
Submitted by:   Yuri Victorovich <yuri at rawbw dot com>
Tested by:      Yuri Victorovich <yuri at rawbw dot com>
2013-09-18 17:56:04 +00:00
jilles
d252cd0ae3 Regenerate for freebsd32_cap_enter().
Approved by:	re (hrs)
2013-09-17 20:49:05 +00:00
jilles
5faad32e2c Disallow cap_enter() in freebsd32 compatibility mode.
The freebsd32 compatibility mode (for running 32-bit binaries on 64-bit
kernels) does not currently allow any system calls in capability mode, but
still permits cap_enter(). As a result, 32-bit binaries on 64-bit kernels
that use capability mode do not work (they crash after being disallowed to
call sys_exit()). Affected binaries include dhclient and uniq. The latter's
crashes cause obscure build failures.

This commit makes freebsd32 cap_enter() fail with [ENOSYS], as if capability
mode was not compiled in. Applications deal with this by doing their work
without capability mode.

This commit does not fix the uncommon situation where a 64-bit process
enters capability mode and then executes a 32-bit binary using fexecve().

This commit should be reverted when allowing the necessary freebsd32 system
calls in capability mode.

Reviewed by:	pjd
Approved by:	re (hrs)
2013-09-17 20:48:19 +00:00
jhb
04bb6e10cd Add a mmap flag (MAP_32BIT) on 64-bit platforms to request that a mapping use
an address in the first 2GB of the process's address space.  This flag should
have the same semantics as the same flag on Linux.

To facilitate this, add a new parameter to vm_map_find() that specifies an
optional maximum virtual address.  While here, fix several callers of
vm_map_find() to use a VMFS_* constant for the findspace argument instead of
TRUE and FALSE.

Reviewed by:	alc
Approved by:	re (kib)
2013-09-09 18:11:59 +00:00
pjd
d1a65cb7ef Regenerate after r255219.
Sponsored by:	The FreeBSD Foundation
2013-09-05 00:11:59 +00:00
pjd
029a6f5d92 Change the cap_rights_t type from uint64_t to a structure that we can extend
in the future in a backward compatible (API and ABI) way.

The cap_rights_t represents capability rights. We used to use one bit to
represent one right, but we are running out of spare bits. Currently the new
structure provides place for 114 rights (so 50 more than the previous
cap_rights_t), but it is possible to grow the structure to hold at least 285
rights, although we can make it even larger if 285 rights won't be enough.

The structure definition looks like this:

	struct cap_rights {
		uint64_t	cr_rights[CAP_RIGHTS_VERSION + 2];
	};

The initial CAP_RIGHTS_VERSION is 0.

The top two bits in the first element of the cr_rights[] array contain total
number of elements in the array - 2. This means if those two bits are equal to
0, we have 2 array elements.

The top two bits in all remaining array elements should be 0.
The next five bits in all array elements contain array index. Only one bit is
used and bit position in this five-bits range defines array index. This means
there can be at most five array elements in the future.

To define new right the CAPRIGHT() macro must be used. The macro takes two
arguments - an array index and a bit to set, eg.

	#define	CAP_PDKILL	CAPRIGHT(1, 0x0000000000000800ULL)

We still support aliases that combine few rights, but the rights have to belong
to the same array element, eg:

	#define	CAP_LOOKUP	CAPRIGHT(0, 0x0000000000000400ULL)
	#define	CAP_FCHMOD	CAPRIGHT(0, 0x0000000000002000ULL)

	#define	CAP_FCHMODAT	(CAP_FCHMOD | CAP_LOOKUP)

There is new API to manage the new cap_rights_t structure:

	cap_rights_t *cap_rights_init(cap_rights_t *rights, ...);
	void cap_rights_set(cap_rights_t *rights, ...);
	void cap_rights_clear(cap_rights_t *rights, ...);
	bool cap_rights_is_set(const cap_rights_t *rights, ...);

	bool cap_rights_is_valid(const cap_rights_t *rights);
	void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src);
	void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src);
	bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);

Capability rights to the cap_rights_init(), cap_rights_set(),
cap_rights_clear() and cap_rights_is_set() functions are provided by
separating them with commas, eg:

	cap_rights_t rights;

	cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);

There is no need to terminate the list of rights, as those functions are
actually macros that take care of the termination, eg:

	#define	cap_rights_set(rights, ...)				\
		__cap_rights_set((rights), __VA_ARGS__, 0ULL)
	void __cap_rights_set(cap_rights_t *rights, ...);

Thanks to using one bit as an array index we can assert in those functions that
there are no two rights belonging to different array elements provided
together. For example this is illegal and will be detected, because CAP_LOOKUP
belongs to element 0 and CAP_PDKILL to element 1:

	cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);

Providing several rights that belongs to the same array's element this way is
correct, but is not advised. It should only be used for aliases definition.

This commit also breaks compatibility with some existing Capsicum system calls,
but I see no other way to do that. This should be fine as Capsicum is still
experimental and this change is not going to 9.x.

Sponsored by:	The FreeBSD Foundation
2013-09-05 00:09:56 +00:00
will
a52b9ca1d3 Add the ability to display the default FIB number for a process to the
ps(1) utility, e.g. "ps -O fib".

bin/ps/keyword.c:
	Add the "fib" keyword and default its column name to "FIB".

bin/ps/ps.1:
	Add "fib" as a supported keyword.

sys/compat/freebsd32/freebsd32.h:
sys/kern/kern_proc.c:
sys/sys/user.h:
	Add the default fib number for a process (p->p_fibnum)
	to the user land accessible process data of struct kinfo_proc.

Submitted by:	Oliver Fromme <olli@fromme.com>, gibbs
2013-08-26 23:48:21 +00:00
andre
6c0efad132 Give (*ext_free) an int return value allowing for very sophisticated
external mbuf buffer management capabilities in the future.

For now only EXT_FREE_OK is defined with current legacy behavior.

Sponsored by:	The FreeBSD Foundation
2013-08-25 10:57:09 +00:00
pjd
50f2a13249 Regenerate after r254491. 2013-08-18 13:38:39 +00:00
pjd
a1397643f3 The cap_rights_limit(2) system calls needs a wrapper for 32bit binaries
running under 64bit kernels as the 'rights' argument has to be split
into two registers or the half of the rights will disappear.

Reported by:	jilles
Sponsored by:	The FreeBSD Foundation
2013-08-18 13:37:54 +00:00
pjd
999f81be5b Move the PAIR32TO64() macro and the RETVAL_HI/RETVAL_LO defines to a
header file for use by other .c files.

Sponsored by:	The FreeBSD Foundation
2013-08-18 13:34:11 +00:00
pjd
6e04ef92e4 Regenerate after r254481. 2013-08-18 10:31:30 +00:00
pjd
3014e000ae Implement 32bit versions of the cap_ioctls_limit(2) and cap_ioctls_get(2)
system calls as unsigned longs have different size on i386 and amd64.

Reported by:	jilles
Sponsored by:	The FreeBSD Foundation
2013-08-18 10:30:41 +00:00
markj
8e4bf18907 Remove a couple of unused macros.
MFC after:	3 days
2013-08-17 21:53:37 +00:00
pjd
8919b4e852 Regenerate after r254447.
Sponsored by:	The FreeBSD Foundation
2013-08-17 14:18:41 +00:00
pjd
b3f1c95907 Make pdfork(2), pdkill(2) and pdgetpid(2) syscalls available for 32bit
binaries running under 64bit kernel.

Sponsored by:	The FreeBSD Foundation
2013-08-17 14:17:13 +00:00
glebius
722a1a5e5d Make sendfile() a method in the struct fileops. Currently only
vnode backed file descriptors have this method implemented.

Reviewed by:	kib
Sponsored by:	Nginx, Inc.
Sponsored by:	Netflix
2013-08-15 07:54:31 +00:00
jeff
de4ecca213 Replace kernel virtual address space allocation with vmem. This provides
transparent layering and better fragmentation.

 - Normalize functions that allocate memory to use kmem_*
 - Those that allocate address space are named kva_*
 - Those that operate on maps are named kmap_*
 - Implement recursive allocation handling for kmem_arena in vmem.

Reviewed by:	alc
Tested by:	pho
Sponsored by:	EMC / Isilon Storage Division
2013-08-07 06:21:20 +00:00
kib
445bcc30a7 Regenerate. 2013-07-21 19:44:53 +00:00
kib
a7dacef5ab Implement compat32 wrappers for the ktimer_* syscalls.
Reported, reviewed and tested by:	Petr Salinger <Petr.Salinger@seznam.cz>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-07-21 19:43:52 +00:00
kib
e9d8b81db7 Wrap kmq_notify(2) for compat32 to properly consume struct sigevent32
argument.

Reviewed and tested by:	Petr Salinger <Petr.Salinger@seznam.cz>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-07-21 19:40:30 +00:00
kib
5dee91b64a The freebsd32_lio_listio() compat syscall takes the struct sigevent32.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-07-21 19:36:53 +00:00
kib
97d40396c6 Move the convert_sigevent32() utility function into freebsd32_misc.c
for consumption outside the vfs_aio.c.

For SIGEV_THREAD_ID and SIGEV_SIGNAL notification delivery methods,
also copy in the sigev_value, since librt event pumping loop compares
note generation number with the value passed through sigev_value.

Tested by:	Petr Salinger <Petr.Salinger@seznam.cz>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-07-21 19:33:48 +00:00
kib
c43c8949f9 Cosmetic change, use the same union name on the left and right sides
of the conversion.

Tested by:	Petr Salinger <Petr.Salinger@seznam.cz>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-07-21 19:17:46 +00:00
kib
2287ae0451 Regenerate 2013-07-20 13:40:03 +00:00
kib
82f12b6237 id_t is 64bit, provide the compat32 wrapper for clock_getcpuclockid2(2).
Reported and tested by:	Petr Salinger <Petr.Salinger@seznam.cz>
PR:	threads/180652
Sponsored by:	The FreeBSD Foundation
2013-07-20 13:39:41 +00:00
hselasky
6308c49781 Add some missing LIBUSB IOCTL conversion codes. 2013-07-14 10:13:01 +00:00
netchild
32d31c91c0 - Move videodev headers from compat/linux to contrib/v4l (cp from vendor and
apply diff to compat/linux versions).
- The cp implies an update of videodev2.h to the linux kernel 2.6.34.14 one.

The update makes video in skype v4 work on FreeBSD.

Tested by:	Artyom Mirgorodskiy <artyom.mirgorodsky@gmail.com>
		(update of header only)
2013-07-06 19:59:06 +00:00
glebius
85cf0e083f aio_mlock() added:
- Regen for r251526.
  - Bump __FreeBSD_version.
2013-06-08 13:30:13 +00:00
glebius
9a02f3097d Add new system call - aio_mlock(). The name speaks for itself. It allows
to perform the mlock(2) operation, which can consume a lot of time, under
control of aio(4).

Reviewed by:	kib, jilles
Sponsored by:	Nginx, Inc.
2013-06-08 13:27:57 +00:00
alc
b4fae70474 Relax the vm object locking. Use a read lock.
Sponsored by:	EMC / Isilon Storage Division
2013-06-05 17:00:10 +00:00
obrien
0c92a49de4 Add a "kern.features" MIB for 32bit support under a 64bit kernel. 2013-05-31 21:43:17 +00:00
kib
19425ad923 Regenerate. 2013-05-21 11:41:08 +00:00
kib
ad28c68314 Fix the wait6(2) on 32bit architectures and for the compat32, by using
the right type for the argument in syscalls.master.  Also fix the
posix_fallocate(2) and posix_fadvise(2) compat32 syscalls on the
architectures which require padding of the 64bit argument.

Noted and reviewed by:	jhb
Pointy hat to:	kib
MFC after:	1 week
2013-05-21 11:40:16 +00:00
jilles
49a5937b77 Regenerate files for pipe2(). 2013-05-01 22:45:04 +00:00
jilles
16772c421d Add pipe2() system call.
The pipe2() function is similar to pipe() but allows setting FD_CLOEXEC and
O_NONBLOCK (on both sides) as part of the function.

If p points to two writable ints, pipe2(p, 0) is equivalent to pipe(p).

If the pointer is not valid, behaviour differs: pipe2() writes into the
array from the kernel like socketpair() does, while pipe() writes into the
array from an architecture-specific assembler wrapper.

Reviewed by:	kan, kib
2013-05-01 22:42:42 +00:00
jilles
66a7a3379b Regenerate files for accept4(). 2013-05-01 20:12:58 +00:00
jilles
299afd25fd Add accept4() system call.
The accept4() function, compared to accept(), allows setting the new file
descriptor atomically close-on-exec and explicitly controlling the
non-blocking status on the new socket. (Note that the latter point means
that accept() is not equivalent to any form of accept4().)

The linuxulator's accept4 implementation leaves a race window where the new
file descriptor is not close-on-exec because it calls sys_accept(). This
implementation leaves no such race window (by using falloc() flags). The
linuxulator could be fixed and simplified by using the new code.

Like accept(), accept4() is async-signal-safe, a cancellation point and
permitted in capability mode.
2013-05-01 20:10:21 +00:00
mdf
a3d624db5a Regen.
MFC after:	1 week
2013-04-02 05:30:52 +00:00
mdf
da578c6492 Fix return type of extattr_set_* and fix rmextattr(8) utility.
extattr_set_{fd,file,link} is logically a write(2)-like operation and
should return ssize_t, just like extattr_get_*.  Also, the user-space
utility was using an int for the return value of extattr_get_* and
extattr_list_*, both of which return an ssize_t.

MFC after:	1 week
2013-04-02 05:30:41 +00:00
jilles
9d8a3c5c3b Rename do_pipe() to kern_pipe2() and declare it properly. 2013-03-31 17:42:54 +00:00
pjd
f44b21d5e5 Regenerate after r248599.
Sponsored by:	The FreeBSD Foundation
2013-03-21 23:02:19 +00:00
pjd
635dbe90f2 Implement chflagsat(2) system call, similar to fchmodat(2), but operates on
file flags.

Reviewed by:	kib, jilles
Sponsored by:	The FreeBSD Foundation
2013-03-21 22:59:01 +00:00
pjd
5fc1bac315 Regenerate after r248597.
Sponsored by:	The FreeBSD Foundation
2013-03-21 22:47:03 +00:00
pjd
2a3cf7f364 - Make 'flags' argument to chflags(2), fchflags(2) and lchflags(2) of type
u_long. Before this change it was of type int for syscalls, but prototypes
  in sys/stat.h and documentation for chflags(2) and fchflags(2) (but not
  for lchflags(2)) stated that it was u_long. Now some related functions
  use u_long type for flags (strtofflags(3), fflagstostr(3)).
- Make path argument of type 'const char *' for consistency.

Discussed on:	arch
Sponsored by:	The FreeBSD Foundation
2013-03-21 22:44:33 +00:00
glebius
b37af62b9e Use m_get/m_gethdr instead of compat macros.
Sponsored by:	Nginx, Inc.
2013-03-15 12:55:30 +00:00
attilio
bf1dc90446 MFC 2013-03-08 00:03:07 +00:00
eadler
a0bd41720a Remove check for NULL prior to free(9) and m_freem(9).
Approved by:	cperciva (mentor)
2013-03-04 02:21:34 +00:00
pjd
369ed4d4ad Regen after r247667. 2013-03-02 21:12:54 +00:00
pjd
702516e70b - Implement two new system calls:
int bindat(int fd, int s, const struct sockaddr *addr, socklen_t addrlen);
	int connectat(int fd, int s, const struct sockaddr *name, socklen_t namelen);

  which allow to bind and connect respectively to a UNIX domain socket with a
  path relative to the directory associated with the given file descriptor 'fd'.

- Add manual pages for the new syscalls.

- Make the new syscalls available for processes in capability mode sandbox.

- Add capability rights CAP_BINDAT and CAP_CONNECTAT that has to be present on
  the directory descriptor for the syscalls to work.

- Update audit(4) to support those two new syscalls and to handle path
  in sockaddr_un structure relative to the given directory descriptor.

- Update procstat(1) to recognize the new capability rights.

- Document the new capability rights in cap_rights_limit(2).

Sponsored by:	The FreeBSD Foundation
Discussed with:	rwatson, jilles, kib, des
2013-03-02 21:11:30 +00:00
attilio
e98f58faf6 MFC 2013-03-02 14:48:41 +00:00
pjd
48e0f13795 Regen after r247602. 2013-03-02 00:55:09 +00:00
pjd
f07ebb8888 Merge Capsicum overhaul:
- Capability is no longer separate descriptor type. Now every descriptor
  has set of its own capability rights.

- The cap_new(2) system call is left, but it is no longer documented and
  should not be used in new code.

- The new syscall cap_rights_limit(2) should be used instead of
  cap_new(2), which limits capability rights of the given descriptor
  without creating a new one.

- The cap_getrights(2) syscall is renamed to cap_rights_get(2).

- If CAP_IOCTL capability right is present we can further reduce allowed
  ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed
  ioctls can be retrived with cap_ioctls_get(2) syscall.

- If CAP_FCNTL capability right is present we can further reduce fcntls
  that can be used with the new cap_fcntls_limit(2) syscall and retrive
  them with cap_fcntls_get(2).

- To support ioctl and fcntl white-listing the filedesc structure was
  heavly modified.

- The audit subsystem, kdump and procstat tools were updated to
  recognize new syscalls.

- Capability rights were revised and eventhough I tried hard to provide
  backward API and ABI compatibility there are some incompatible changes
  that are described in detail below:

	CAP_CREATE old behaviour:
	- Allow for openat(2)+O_CREAT.
	- Allow for linkat(2).
	- Allow for symlinkat(2).
	CAP_CREATE new behaviour:
	- Allow for openat(2)+O_CREAT.

	Added CAP_LINKAT:
	- Allow for linkat(2). ABI: Reuses CAP_RMDIR bit.
	- Allow to be target for renameat(2).

	Added CAP_SYMLINKAT:
	- Allow for symlinkat(2).

	Removed CAP_DELETE. Old behaviour:
	- Allow for unlinkat(2) when removing non-directory object.
	- Allow to be source for renameat(2).

	Removed CAP_RMDIR. Old behaviour:
	- Allow for unlinkat(2) when removing directory.

	Added CAP_RENAMEAT:
	- Required for source directory for the renameat(2) syscall.

	Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR):
	- Allow for unlinkat(2) on any object.
	- Required if target of renameat(2) exists and will be removed by this
	  call.

	Removed CAP_MAPEXEC.

	CAP_MMAP old behaviour:
	- Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and
	  PROT_WRITE.
	CAP_MMAP new behaviour:
	- Allow for mmap(2)+PROT_NONE.

	Added CAP_MMAP_R:
	- Allow for mmap(PROT_READ).
	Added CAP_MMAP_W:
	- Allow for mmap(PROT_WRITE).
	Added CAP_MMAP_X:
	- Allow for mmap(PROT_EXEC).
	Added CAP_MMAP_RW:
	- Allow for mmap(PROT_READ | PROT_WRITE).
	Added CAP_MMAP_RX:
	- Allow for mmap(PROT_READ | PROT_EXEC).
	Added CAP_MMAP_WX:
	- Allow for mmap(PROT_WRITE | PROT_EXEC).
	Added CAP_MMAP_RWX:
	- Allow for mmap(PROT_READ | PROT_WRITE | PROT_EXEC).

	Renamed CAP_MKDIR to CAP_MKDIRAT.
	Renamed CAP_MKFIFO to CAP_MKFIFOAT.
	Renamed CAP_MKNODE to CAP_MKNODEAT.

	CAP_READ old behaviour:
	- Allow pread(2).
	- Disallow read(2), readv(2) (if there is no CAP_SEEK).
	CAP_READ new behaviour:
	- Allow read(2), readv(2).
	- Disallow pread(2) (CAP_SEEK was also required).

	CAP_WRITE old behaviour:
	- Allow pwrite(2).
	- Disallow write(2), writev(2) (if there is no CAP_SEEK).
	CAP_WRITE new behaviour:
	- Allow write(2), writev(2).
	- Disallow pwrite(2) (CAP_SEEK was also required).

	Added convinient defines:

	#define	CAP_PREAD		(CAP_SEEK | CAP_READ)
	#define	CAP_PWRITE		(CAP_SEEK | CAP_WRITE)
	#define	CAP_MMAP_R		(CAP_MMAP | CAP_SEEK | CAP_READ)
	#define	CAP_MMAP_W		(CAP_MMAP | CAP_SEEK | CAP_WRITE)
	#define	CAP_MMAP_X		(CAP_MMAP | CAP_SEEK | 0x0000000000000008ULL)
	#define	CAP_MMAP_RW		(CAP_MMAP_R | CAP_MMAP_W)
	#define	CAP_MMAP_RX		(CAP_MMAP_R | CAP_MMAP_X)
	#define	CAP_MMAP_WX		(CAP_MMAP_W | CAP_MMAP_X)
	#define	CAP_MMAP_RWX		(CAP_MMAP_R | CAP_MMAP_W | CAP_MMAP_X)
	#define	CAP_RECV		CAP_READ
	#define	CAP_SEND		CAP_WRITE

	#define	CAP_SOCK_CLIENT \
		(CAP_CONNECT | CAP_GETPEERNAME | CAP_GETSOCKNAME | CAP_GETSOCKOPT | \
		 CAP_PEELOFF | CAP_RECV | CAP_SEND | CAP_SETSOCKOPT | CAP_SHUTDOWN)
	#define	CAP_SOCK_SERVER \
		(CAP_ACCEPT | CAP_BIND | CAP_GETPEERNAME | CAP_GETSOCKNAME | \
		 CAP_GETSOCKOPT | CAP_LISTEN | CAP_PEELOFF | CAP_RECV | CAP_SEND | \
		 CAP_SETSOCKOPT | CAP_SHUTDOWN)

	Added defines for backward API compatibility:

	#define	CAP_MAPEXEC		CAP_MMAP_X
	#define	CAP_DELETE		CAP_UNLINKAT
	#define	CAP_MKDIR		CAP_MKDIRAT
	#define	CAP_RMDIR		CAP_UNLINKAT
	#define	CAP_MKFIFO		CAP_MKFIFOAT
	#define	CAP_MKNOD		CAP_MKNODAT
	#define	CAP_SOCK_ALL		(CAP_SOCK_CLIENT | CAP_SOCK_SERVER)

Sponsored by:	The FreeBSD Foundation
Reviewed by:	Christoph Mallon <christoph.mallon@gmx.de>
Many aspects discussed with:	rwatson, benl, jonathan
ABI compatibility discussed with:	kib
2013-03-02 00:53:12 +00:00
delphij
b1482c7ae7 Fix wrong assignment.
Submitted by:	Sascha Wildner <saw online de>
Obtained from:	DragonFly rev 9568dd07a22a136e380e6c19a8ea188eb92976d5
MFC after:	2 weeks
2013-03-01 23:21:18 +00:00
attilio
15bf891afe Rename VM_OBJECT_LOCK(), VM_OBJECT_UNLOCK() and VM_OBJECT_TRYLOCK() to
their "write" versions.

Sponsored by:	EMC / Isilon storage division
2013-02-20 12:03:20 +00:00
jhb
2617d9f095 Reduce duplication between i386/linux/linux.h and amd64/linux32/linux.h
by moving bits that are MI out into headers in compat/linux.

Reviewed by:	Chagin Dmitry  dmitry | gmail
MFC after:	2 weeks
2013-01-29 18:41:30 +00:00
dchagin
fa34eceef7 Arithmetic on pointers takes into account the size of the type. Properly cast the pointer to avoid incorrect pointer scaling.
MFC after:	1 Week
2013-01-25 14:40:54 +00:00
jhb
af17a55dfd Don't assume that all Linux TCP-level socket options are identical to
FreeBSD TCP-level socket options (only the first two are).  Instead,
using a mapping function and fail unsupported options as we do for other
socket option levels.

MFC after:	2 weeks
2013-01-23 21:44:48 +00:00
glebius
8e20fa5ae9 Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags within sys.

Exceptions:

- sys/contrib not touched
- sys/mbuf.h edited manually
2012-12-05 08:04:20 +00:00
cperciva
748c98fc62 MFS security patches which seem to have accidentally not reached HEAD:
Fix insufficient message length validation for EAP-TLS messages.

Fix Linux compatibility layer input validation error.

Security:	FreeBSD-SA-12:07.hostapd
Security:	FreeBSD-SA-12:08.linux
Security:	CVE-2012-4445, CVE-2012-4576
With hat:	so@
2012-11-23 01:48:31 +00:00
kib
de90907af2 Style fixes for r242958.
Reported and reviewed by:	bde
MFC after:	28 days
2012-11-16 06:22:14 +00:00
kib
63c9e066e5 Regen 2012-11-13 12:53:41 +00:00
kib
1409e8df20 Add the wait6(2) system call. It takes POSIX waitid()-like process
designator to select a process which is waited for. The system call
optionally returns siginfo_t which would be otherwise provided to
SIGCHLD handler, as well as extended structure accounting for child
and cumulative grandchild resource usage.

Allow to get the current rusage information for non-exited processes
as well, similar to Solaris.

The explicit WEXITED flag is required to wait for exited processes,
allowing for more fine-grained control of the events the waiter is
interested in.

Fix the handling of siginfo for WNOWAIT option for all wait*(2)
family, by not removing the queued signal state.

PR:	standards/170346
Submitted by:	"Jukka A. Ukkonen" <jau@iki.fi>
MFC after:	1 month
2012-11-13 12:52:31 +00:00
kib
f16ea99007 The r241025 fixed the case when a binary, executed from nullfs mount,
was still possible to open for write from the lower filesystem.  There
is a symmetric situation where the binary could already has file
descriptors opened for write, but it can be executed from the nullfs
overlay.

Handle the issue by passing one v_writecount reference to the lower
vnode if nullfs vnode has non-zero v_writecount.  Note that only one
write reference can be donated, since nullfs only keeps one use
reference on the lower vnode.  Always use the lower vnode v_writecount
for the checks.

Introduce the VOP_GET_WRITECOUNT to read v_writecount, which is
currently always bypassed to the lower vnode, and VOP_ADD_WRITECOUNT
to manipulate the v_writecount value, which manages a single bypass
reference to the lower vnode.  Caling the VOPs instead of directly
accessing v_writecount provide the fix described in the previous
paragraph.

Tested by:	pho
MFC after:	3 weeks
2012-11-02 13:56:36 +00:00
kib
560aa751e0 Remove the support for using non-mpsafe filesystem modules.
In particular, do not lock Giant conditionally when calling into the
filesystem module, remove the VFS_LOCK_GIANT() and related
macros. Stop handling buffers belonging to non-mpsafe filesystems.

The VFS_VERSION is bumped to indicate the interface change which does
not result in the interface signatures changes.

Conducted and reviewed by:	attilio
Tested by:	pho
2012-10-22 17:50:54 +00:00
kevlo
ceb08698f2 Revert previous commit...
Pointyhat to:	kevlo (myself)
2012-10-10 08:36:38 +00:00
kevlo
8747a46991 Prefer NULL over 0 for pointers 2012-10-09 08:27:40 +00:00
kib
8f845e475e Fix the mis-handling of the VV_TEXT on the nullfs vnodes.
If you have a binary on a filesystem which is also mounted over by
nullfs, you could execute the binary from the lower filesystem, or
from the nullfs mount. When executed from lower filesystem, the lower
vnode gets VV_TEXT flag set, and the file cannot be modified while the
binary is active. But, if executed as the nullfs alias, only the
nullfs vnode gets VV_TEXT set, and you still can open the lower vnode
for write.

Add a set of VOPs for the VV_TEXT query, set and clear operations,
which are correctly bypassed to lower vnode.

Tested by:	pho (previous version)
MFC after:	2 weeks
2012-09-28 11:25:02 +00:00
kevlo
ce4326d001 Remove redundant check 2012-09-12 10:12:03 +00:00
jhb
e8b429e1c0 Remove some more NetBSD compat shims and other unused bits from these
drivers:
- Remove scsi_low_pisa.*, they were unused.
- Remove <compat/netbsd/physio_proc.h> and calls to the stubs in that
  header.  They were empty nops.
- Retire sl_xname and use device_get_nameunit() and device_printf() with
  the underlying device_t instead.
- Remove unused {ct,ncv,nsp,stg}print() functions.
- Remove empty SOFT_INTR_REQUIRED() macro and the unused sl_irq member.
2012-09-10 18:49:49 +00:00
davidxu
b788233f5b regen. 2012-08-17 02:47:16 +00:00
davidxu
3f0806aa1f Implement syscall clock_getcpuclockid2, so we can get a clock id
for process, thread or others we want to support.
Use the syscall to implement POSIX API clock_getcpuclock and
pthread_getcpuclockid.

PR:	168417
2012-08-17 02:26:31 +00:00
kib
9f82bec8aa Regenerate. 2012-08-15 15:18:20 +00:00
kib
b2d9fb2c07 Provide 32bit compat for truncate(2) and ftruncate(2).
MFC after:	1 week
2012-08-15 15:17:56 +00:00
kib
876d70b046 Regenerate. 2012-08-14 12:09:36 +00:00
kib
bc5359fb02 Implement the old mmap syscall for compat32, when COMPAT_43 option is
enabled. The syscall is used by FreeBSD 1.1.5.1 dynamic linker.

MFC after:	1 week
2012-08-14 12:09:09 +00:00
kib
6253150288 Cosmetics: define FREEBSD32_MINUSER and AOUT32_MINUSER for struct
sysentvec .sv_minuser. Also improve style.

Submitted by:	Oliver Pinter <oliver.pinter@gmail.com>
MFC after:	1 week
2012-07-22 13:41:45 +00:00
kib
53224f018a Extend the KPI to lock and unlock f_offset member of struct file. It
now fully encapsulates all accesses to f_offset, and extends f_offset
locking to other consumers that need it, in particular, to lseek() and
variants of getdirentries().

Ensure that on 32bit architectures f_offset, which is 64bit quantity,
always read and written under the mtxpool protection. This fixes
apparently easy to trigger race when parallel lseek()s or lseek() and
read/write could destroy file offset.

The already broken ABI emulations, including iBCS and SysV, are not
converted (yet).

Tested by:	pho
No objections from:	jhb
MFC after:    3 weeks
2012-07-02 21:01:03 +00:00
kevlo
07ebfe1b9c Make sure that each va_start has one and only one matching va_end,
especially in error cases.
2012-05-29 01:48:06 +00:00
kib
cae6484163 Fix ki_cow for compat32 binaries.
MFC after:	3 days
2012-05-27 05:24:53 +00:00
ed
0d9131d0d0 Regenerate system call tables. 2012-05-25 21:52:57 +00:00
ed
55e4d6365d Remove use of non-ISO-C integer types from system call tables.
These files already use ISO-C-style integer types, so make them less
inconsistent by preferring the standard types.
2012-05-25 21:50:48 +00:00
gleb
3c7243df78 Add kern_fhstat(), adjust sys_fhstat() to use it.
Extend kern_getdirentries() to accept uio segflag and optionally return
buffer residue.

Sponsored by:	Google Summer of Code 2011
2012-05-24 08:00:26 +00:00
netchild
9895b5ca9d - >500 static DTrace probes for the linuxulator
- DTrace scripts to check for errors, performance, ...
  they serve mostly as examples of what you can do with the static probe;s
  with moderate load the scripts may be overwhelmed, excessive lock-tracing
  may influence program behavior (see the last design decission)

Design decissions:
 - use "linuxulator" as the provider for the native bitsize; add the
   bitsize for the non-native emulation (e.g. "linuxuator32" on amd64)
 - Add probes only for locks which are acquired in one function and released
   in another function. Locks which are aquired and released in the same
   function should be easy to pair in the code, inter-function
   locking is more easy to verify in DTrace.
 - Probes for locks should be fired after locking and before releasing to
   prevent races (to provide data/function stability in DTrace, see the
   man-page of "dtrace -v ..." and the corresponding DTrace docs).
2012-05-05 19:42:38 +00:00
jkim
e210f689a8 - Implement pipe2 syscall for Linuxulator. This syscall appeared in 2.6.27
but GNU libc used it without checking its kernel version, e. g., Fedora 10.
- Move pipe(2) implementation for Linuxulator from MD files to MI file,
sys/compat/linux/linux_file.c.  There is no MD code for this syscall at all.
- Correct an argument type for pipe() from l_ulong * to l_int *.  Probably
this was the source of MI/MD confusion.

Reviewed by:	emulation
2012-04-16 21:22:02 +00:00
tijl
212d562cf9 Remove some unnecessary includes. 2012-03-18 19:15:11 +00:00
tijl
35c7447060 Eliminate ia32_reg.h by moving its contents to x86 and ia64 reg.h.
Reviewed by:	kib
2012-03-18 19:12:11 +00:00
tijl
2bf580ea66 Copy i386 reg.h to x86 and merge with amd64 reg.h. Replace i386/amd64/pc98
reg.h with stubs.

The tREGISTER macros are only made visible on i386. These macros are
deprecated and should not be available on amd64.

The i386 and amd64 versions of struct reg have been renamed to struct
__reg32 and struct __reg64. During compilation either __reg32 or __reg64
is defined as reg depending on the machine architecture. On amd64 the i386
struct is also available as struct reg32 which is used in COMPAT_FREEBSD32
code.

Most of compat/ia32/ia32_reg.h is now IA64 only.

Reviewed by:	kib (previous version)
2012-03-18 19:06:38 +00:00
tijl
9c671fcaca Move userland bits of i386 npx.h and amd64 fpu.h to x86 fpu.h.
Remove FPU types from compat/ia32/ia32_reg.h that are no longer needed.
Create machine/npx.h on amd64 to allow compiling i386 code that uses
this header.

The original npx.h and fpu.h define struct envxmm differently. Both
definitions have been included in the new x86 header as struct __envxmm32
and struct __envxmm64. During compilation either __envxmm32 or __envxmm64
is defined as envxmm depending on machine architecture. On amd64 the i386
struct is also available as struct envxmm32.

Reviewed by:	kib
2012-03-16 20:24:30 +00:00
brucec
88520fea69 Fix race condition in KfRaiseIrql().
After getting the current irql, if the kthread gets preempted and
subsequently runs on a different CPU, the saved irql could be wrong.

Also, correct the panic string.

PR:		kern/165630
Submitted by:	Vladislav Movchan <vladislav.movchan at gmail.com>
2012-03-04 17:08:43 +00:00
jmallett
6485e73b87 On MIPS, _ALIGN always aligns to 8 bytes, even for 32-bit binaries. This might
not be ideal, but is the ABI we've shipped so far.  Fix macros which reflect
the results of _ALIGN on 32-bit MIPS to use the right alignment.

This fixes sendmsg under COMPAT_FREEBSD32 on n64 MIPS kernels.
2012-03-03 21:39:12 +00:00
jmallett
50c253779f o) Add COMPAT_FREEBSD32 support for MIPS kernels using the n64 ABI with userlands
using the o32 ABI.  This mostly follows nwhitehorn's lead in implementing
   COMPAT_FREEBSD32 on powerpc64.
o) Add a new type to the freebsd32 compat layer, time32_t, which is time_t in the
   32-bit ABI being used.  Since the MIPS port is relatively-new, even the 32-bit
   ABIs use a 64-bit time_t.
o) Because time{spec,val}32 has the same size and layout as time{spec,val} on MIPS
   with 32-bit compatibility, then, disable some code which assumes otherwise
   wrongly when built for MIPS.  A more general macro to check in this case would
   seem like a good idea eventually.  If someone adds support for using n32
   userland with n64 kernels on MIPS, then they will have to add a variety of
   flags related to each piece of the ABI that can vary.  That's probably the
   right time to generalize further.
o) Add MIPS to the list of architectures which use PAD64_REQUIRED in the
   freebsd32 compat code.  Probably this should be generalized at some point.

Reviewed by:	gonzo
2012-03-03 08:19:18 +00:00
mm
77766742e1 Add procfs to jail-mountable filesystems.
Reviewed by:	jamie
MFC after:	1 week
2012-02-29 00:30:18 +00:00
kib
80ae8fe82c Fix found places where uio_resid is truncated to int.
Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the
sysctl to zero allows to perform the SSIZE_MAX-sized i/o requests from
the usermode.

Discussed with:	bde, das (previous versions)
MFC after:	1 month
2012-02-21 01:05:12 +00:00
kib
abd1094f17 Fix misuse of the kernel map in miscellaneous image activators.
Vnode-backed mappings cannot be put into the kernel map, since it is a
system map.

Use exec_map for transient mappings, and remove the mappings with
kmem_free_wakeup() to notify the waiters on available map space.

Do not map the whole executable into KVA at all to copy it out into
usermode.  Directly use vn_rdwr() for the case of not page aligned
binary.

There is one place left where the potentially unbounded amount of data
is mapped into exec_map, namely, in the COFF image activator
enumeration of the needed shared libraries.

Reviewed by:   alc
MFC after:     2 weeks
2012-02-17 23:47:16 +00:00
ed
28b4a002d6 Remove direct access to si_name.
Code should just use the devtoname() function to obtain the name of a
character device. Also add const keywords to pieces of code that need it
to build properly.

MFC after:	2 weeks
2012-02-10 12:35:57 +00:00
davidxu
c4909ace45 Add 32-bit compat code for AIO kevent flags introduced in revision 230857. 2012-02-05 04:49:31 +00:00
kib
361bfae5c2 Add support for the extended FPU states on amd64, both for native
64bit and 32bit ABIs.  As a side-effect, it enables AVX on capable
CPUs.

In particular:

- Query the CPU support for XSAVE, list of the supported extensions
  and the required size of FPU save area. The hw.use_xsave tunable is
  provided for disabling XSAVE, and hw.xsave_mask may be used to
  select the enabled extensions.

- Remove the FPU save area from PCB and dynamically allocate the
  (run-time sized) user save area on the top of the kernel stack,
  right above the PCB. Reorganize the thread0 PCB initialization to
  postpone it after BSP is queried for save area size.

- The dumppcb, stoppcbs and susppcbs now do not carry the FPU state as
  well. FPU state is only useful for suspend, where it is saved in
  dynamically allocated suspfpusave area.

- Use XSAVE and XRSTOR to save/restore FPU state, if supported and
  enabled.

- Define new mcontext_t flag _MC_HASFPXSTATE, indicating that
  mcontext_t has a valid pointer to out-of-struct extended FPU
  state. Signal handlers are supplied with stack-allocated fpu
  state. The sigreturn(2) and setcontext(2) syscall honour the flag,
  allowing the signal handlers to inspect and manipilate extended
  state in the interrupted context.

- The getcontext(2) never returns extended state, since there is no
  place in the fixed-sized mcontext_t to place variable-sized save
  area. And, since mcontext_t is embedded into ucontext_t, makes it
  impossible to fix in a reasonable way.  Instead of extending
  getcontext(2) syscall, provide a sysarch(2) facility to query
  extended FPU state.

- Add ptrace(2) support for getting and setting extended state; while
  there, implement missed PT_I386_{GET,SET}XMMREGS for 32bit binaries.

- Change fpu_kern KPI to not expose struct fpu_kern_ctx layout to
  consumers, making it opaque. Internally, struct fpu_kern_ctx now
  contains a space for the extended state. Convert in-kernel consumers
  of fpu_kern KPI both on i386 and amd64.

First version of the support for AVX was submitted by Tim Bird
<tim.bird am sony com> on behalf of Sony. This version was written
from scratch.

Tested by:	pho (previous version), Yamagi Burmeister <lists yamagi org>
MFC after:	1 month
2012-01-21 17:45:27 +00:00
mckusick
af2e331939 Make sure all intermediate variables holding mount flags (mnt_flag)
and that all internal kernel calls passing mount flags are declared
as uint64_t so that flags in the top 32-bits are not lost.

MFC after: 2 weeks
2012-01-17 01:08:01 +00:00
trociny
d4e71152bd Abrogate nchr argument in proc_getargv() and proc_getenvv(): we always want
to read strings completely to know the actual size.

As a side effect it fixes the issue with kern.proc.args and kern.proc.env
sysctls, which didn't return the size of available data when calling
sysctl(3) with the NULL argument for oldp.

Note, in get_ps_strings(), which does actual work for proc_getargv() and
proc_getenvv(), we still have a safety limit on the size of data read in
case of a corrupted procces stack.

Suggested by:	kib
MFC after:	3 days
2012-01-15 18:47:24 +00:00
uqs
d61d88a310 Convert files to UTF-8 2012-01-15 13:23:18 +00:00
dim
d106f2fd7c In sys/compat/linux/linux_ioctl.c, work around a warning when a pointer
is compared to an integer, by casting the pointer to l_uintptr_t.  No
functional difference on both i386 and amd64.

Reviewed by:	ed, jhb
MFC after:	1 week
2012-01-03 18:49:39 +00:00
dim
e28679ec14 In sys/compat/ndis/subr_ntoskrnl.c, change the RtlFillMemory function
definition from K&R to ANSI, to avoid a clang warning about the uint8_t
parameter being promoted to int, which is not compatible with the type
declared in the earlier prototype.

MFC after:	1 week
2011-12-30 17:18:09 +00:00
jhb
cab9e82ffb Implement linux_fadvise64() and linux_fadvise64_64() using
kern_posix_fadvise().

Reviewed by:	silence on emulation@
MFC after:	2 weeks
2011-12-29 15:34:59 +00:00
trociny
783806918a Protect process environment variables with p_candebug().
Discussed with:	jilles, kib, rwatson
MFC after:	2 weeks
2011-12-04 21:43:13 +00:00
trociny
96623b87d9 Retire linprocfs_doargv(). Instead use new functions, proc_getargv()
and proc_getenvv(), which were implemented using linprocfs_doargv() as
a reference.

Suggested by:	kib
Reviewed by:	kib
Approved by:	des (linprocfs maintainer)
MFC after:	2 weeks
2011-11-22 20:45:11 +00:00
lstewart
cca3084242 - Add the ffclock_getcounter(), ffclock_getestimate() and ffclock_setestimate()
system calls to provide feed-forward clock management capabilities to
  userspace processes. ffclock_getcounter() returns the current value of the
  kernel's feed-forward clock counter. ffclock_getestimate() returns the current
  feed-forward clock parameter estimates and ffclock_setestimate() updates the
  feed-forward clock parameter estimates.

- Document the syscalls in the ffclock.2 man page.

- Regenerate the script-derived syscall related files.

Committed on behalf of Julien Ridoux and Darryl Veitch from the University of
Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward
Clock Synchronization Algorithms" project.

For more information, see http://www.synclab.org/radclock/

Submitted by:	Julien Ridoux (jridoux at unimelb edu au)
2011-11-21 01:26:10 +00:00
ed
98496492d6 Make the Linux *at() calls a bit more complete.
Properly support:

- AT_EACCESS for faccessat(),
- AT_SYMLINK_FOLLOW for linkat().
2011-11-19 07:19:37 +00:00
ed
914a7cfed1 Regenerate system call tables. 2011-11-19 06:36:11 +00:00
ed
9cedd4d52c Improve *access*() parameter name consistency.
The current code mixes the use of `flags' and `mode'. This is a bit
confusing, since the faccessat() function as a `flag' parameter to store
the AT_ flag.

Make this less confusing by using the same name as used in the POSIX
specification -- `amode'.
2011-11-19 06:35:15 +00:00
jhb
9315d9d42c - Split out a kern_posix_fadvise() from the posix_fadvise() system call so
it can be used by in-kernel consumers.
- Make kern_posix_fallocate() public.
- Use kern_posix_fadvise() and kern_posix_fallocate() to implement the
  freebsd32 wrappers for the two system calls.
2011-11-14 18:00:15 +00:00
pluknet
b90a6bf12b struct timespec32: change types of tv_sec and tv_nsec fields to signed
to match native struct timespec ABI on __LP32__.

This change is a prerequisite for upcoming futimens()/utimensat() in whose
implementations it is assumed that timespec32 can take a negative value.

MFC after:	1 week
2011-11-11 07:17:00 +00:00
rstone
f6710005c7 Correct the types of the arguments to return probes of the syscall
provider.  Previously we were erroneously supplying the argument types of
the corresponding entry probe.

Reviewed by:	rpaulo
MFC after:	1 week
2011-11-11 03:49:42 +00:00
ed
0c56cf839d Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
2011-11-07 15:43:11 +00:00
ed
e97eae1577 Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs.
This means that their use is restricted to a single C file.
2011-11-07 06:44:47 +00:00
jhb
767a02dc14 Regen. 2011-11-04 04:06:31 +00:00
jhb
78c075174e Add the posix_fadvise(2) system call. It is somewhat similar to
madvise(2) except that it operates on a file descriptor instead of a
memory region.  It is currently only supported on regular files.

Just as with madvise(2), the advice given to posix_fadvise(2) can be
divided into two types.  The first type provide hints about data access
patterns and are used in the file read and write routines to modify the
I/O flags passed down to VOP_READ() and VOP_WRITE().  These modes are
thus filesystem independent.  Note that to ease implementation (and
since this API is only advisory anyway), only a single non-normal
range is allowed per file descriptor.

The second type of hints are used to hint to the OS that data will or
will not be used.  These hints are implemented via a new VOP_ADVISE().
A default implementation is provided which does nothing for the WILLNEED
request and attempts to move any clean pages to the cache page queue for
the DONTNEED request.  This latter case required two other changes.
First, a new V_CLEANONLY flag was added to vinvalbuf().  This requests
vinvalbuf() to only flush clean buffers for the vnode from the buffer
cache and to not remove any backing pages from the vnode.  This is
used to ensure clean pages are not wired into the buffer cache before
attempting to move them to the cache page queue.  The second change adds
a new vm_object_page_cache() method.  This method is somewhat similar to
vm_object_page_remove() except that instead of freeing each page in the
specified range, it attempts to move clean pages to the cache queue if
possible.

To preserve the ABI of struct file, the f_cdevpriv pointer is now reused
in a union to point to the currently active advice region if one is
present for regular files.

Reviewed by:	jilles, kib, arch@
Approved by:	re (kib)
MFC after:	1 month
2011-11-04 04:02:50 +00:00
kib
8e118d38cf Control the execution permission of the readable segments for
i386 binaries on the amd64 and ia64 with the sysctl, instead of
unconditionally enabling it.

Reviewed by:	marcel
2011-10-15 12:35:18 +00:00
jhb
88abcce51b Regen. 2011-10-14 11:47:14 +00:00
jhb
58bffa17ce Use PAIR32TO64() for the offset and length parameters to
freebsd32_posix_fallocate() to properly handle big-endian platforms.

Reviewed by:	mdf
MFC after:	1 week
2011-10-14 11:46:46 +00:00
marcel
00779d76a5 Use PTRIN(). 2011-10-13 22:33:03 +00:00
marcel
9d926ddf66 Wrap mprotect(2) so that we can add execute permissions when read
permissions are requested. This is needed on amd64 and ia64 for
JDK 1.4.x
2011-10-13 18:25:10 +00:00
marcel
86b6ac698d Wrap mprotect(2) 2011-10-13 18:21:11 +00:00
marcel
d23bcad30d In freebsd32_mmap() and when compiling for amd64 or ia64, also
ask for execute permissions when read permissions are wanted.
This is needed for JDK 1.4.x on i386.
2011-10-13 18:18:42 +00:00
brueffer
35a20a3582 Add curly braces missed in r226247.
Pointy hat to:	brueffer
Submitted by:	many
MFC after:	1 week
2011-10-11 13:40:37 +00:00
brueffer
cc0ce9bd6a Properly free linux_gidset in case of an error.
CID:		4136
Found with:	Coverity Prevent(tm)
MFC after:	1 week
2011-10-11 10:32:23 +00:00
jkim
f3e29d5483 Use the caculated length instead of maximum length. 2011-10-06 21:55:05 +00:00
jkim
5bf71ef7d8 Remove a now-defunct variable. 2011-10-06 21:40:08 +00:00
jkim
f18be12037 Use uint32_t instead of u_int32_t. Fix style(9) nits. 2011-10-06 21:17:46 +00:00
jkim
40f14199fb Make sure to ignore the leading NULL byte from Linux abstract namespace. 2011-10-06 21:09:28 +00:00
jkim
bd4e9fe2ca Restore the original socket address length if it was not really AF_INET6. 2011-10-06 20:48:23 +00:00
jkim
55a4bbebe3 Retern more appropriate errno when Linux path name is too long. 2011-10-06 20:28:08 +00:00
jkim
cf76cae978 Inline do_sa_get() function and remove an unused return value. 2011-10-06 20:20:30 +00:00
jkim
9a911728e7 Unroll inlined strnlen(9) and make it easier to read. No functional change. 2011-10-06 19:59:14 +00:00
cperciva
1485b2ad01 Fix a bug in UNIX socket handling in the linux emulator which was
exposed by the security fix in FreeBSD-SA-11:05.unix.

Approved by:	so (cperciva)
Approved by:	re (kib)
Security:	Related to FreeBSD-SA-11:05.unix, but not actually
		a security fix.
2011-10-04 19:07:38 +00:00
kmacy
8bc0044e86 Auto-generated code from sys_ prefixing makesyscalls.sh change
Approved by:	re(bz)
2011-09-16 14:04:14 +00:00
kmacy
99851f359e In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by:	rwatson
Approved by:	re (bz)
2011-09-16 13:58:51 +00:00
jonathan
5ecd1c9d40 Add experimental support for process descriptors
A "process descriptor" file descriptor is used to manage processes
without using the PID namespace. This is required for Capsicum's
Capability Mode, where the PID namespace is unavailable.

New system calls pdfork(2) and pdkill(2) offer the functional equivalents
of fork(2) and kill(2). pdgetpid(2) allows querying the PID of the remote
process for debugging purposes. The currently-unimplemented pdwait(2) will,
in the future, allow querying rusage/exit status. In the interim, poll(2)
may be used to check (and wait for) process termination.

When a process is referenced by a process descriptor, it does not issue
SIGCHLD to the parent, making it suitable for use in libraries---a common
scenario when using library compartmentalisation from within large
applications (such as web browsers). Some observers may note a similarity
to Mach task ports; process descriptors provide a subset of this behaviour,
but in a UNIX style.

This feature is enabled by "options PROCDESC", but as with several other
Capsicum kernel features, is not enabled by default in GENERIC 9.0.

Reviewed by: jhb, kib
Approved by: re (kib), mentor (rwatson)
Sponsored by: Google Inc
2011-08-18 22:51:30 +00:00
rwatson
4af919b491 Second-to-last commit implementing Capsicum capabilities in the FreeBSD
kernel for FreeBSD 9.0:

Add a new capability mask argument to fget(9) and friends, allowing system
call code to declare what capabilities are required when an integer file
descriptor is converted into an in-kernel struct file *.  With options
CAPABILITIES compiled into the kernel, this enforces capability
protection; without, this change is effectively a no-op.

Some cases require special handling, such as mmap(2), which must preserve
information about the maximum rights at the time of mapping in the memory
map so that they can later be enforced in mprotect(2) -- this is done by
narrowing the rights in the existing max_protection field used for similar
purposes with file permissions.

In namei(9), we assert that the code is not reached from within capability
mode, as we're not yet ready to enforce namespace capabilities there.
This will follow in a later commit.

Update two capability names: CAP_EVENT and CAP_KEVENT become
CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they
represent.

Approved by:	re (bz)
Submitted by:	jonathan
Sponsored by:	Google Inc
2011-08-11 12:30:23 +00:00
kib
96a4fe50dc Implement the linprocfs swaps file, providing information about the
configured swap devices in the Linux-compatible format.

Based on the submission by:	Robert Millan <rmh debian org>
PR:	kern/159281
Reviewed by:	bde
Approved by:	re (kensmith)
MFC after:	2 weeks
2011-08-01 19:12:15 +00:00
bz
1a8cc2bad9 Rename ki_ocomm to ki_tdname and OCOMMLEN to TDNAMLEN.
Provide backward compatibility defines under BURN_BRIDGES.

Suggested by:	jhb
Reviewed by:	emaste
Sponsored by:	Sandvine Incorporated
Approved by:	re (kib)
2011-07-18 20:06:15 +00:00
marck
2f33209e13 Correct small typo in a do{}while(0) define
Approved by:	kib
MFC after:	2 weeks
2011-07-17 17:12:17 +00:00
bz
0b1276123e Remove the 'either' from the comment as it'll be less obvious that we
removed semmap in a bit of time from now.   Re-wrap.

Suggested by:	jhb
2011-07-17 05:33:22 +00:00
jonathan
5132d7b9f3 Auto-generated system call code with cap_new(), cap_getrights().
Approved by: mentor (rwatson), re (Capsicum blanket)
Sponsored by: Google Inc
2011-07-15 18:33:12 +00:00
jonathan
4ec3aaddb5 Add cap_new() and cap_getrights() system calls.
Implement two previously-reserved Capsicum system calls:
- cap_new() creates a capability to wrap an existing file descriptor
- cap_getrights() queries the rights mask of a capability.

Approved by: mentor (rwatson), re (Capsicum blanket)
Sponsored by: Google Inc
2011-07-15 18:26:19 +00:00
bz
8448ba638c Remove semaphore map entry count "semmap" field and its tuning
option that is highly recommended to be adjusted in too much
documentation while doing nothing in FreeBSD since r2729 (rev 1.1).

ipcs(1) needs to be recompiled as it is accessing _KERNEL private
variables.

Reviewed by:	jhb (before comment change on linux code)
Sponsored by:	Sandvine Incorporated
2011-07-14 14:18:14 +00:00
pluknet
108671f810 Return empty cmdline/environ string for processes with kernel address
space. This is consistent with the behavior in linux.

PR:		kern/157871
Reported by:	Petr Salinger <Petr Salinger att seznam cz>
Verified on:	GNU/kFreeBSD debian 8.2-1-amd64 (by reporter)
Reviewed by:	kib (some time ago)
MFC after:	2 weeks
2011-06-17 07:30:56 +00:00
kib
7c77d46862 Regen. 2011-06-16 22:06:35 +00:00
kib
d5407645c7 Implement compat32 for old lseek, for the a.out binaries on amd64. 2011-06-16 22:05:56 +00:00
netchild
428c437c67 Commit the missing linux_videdev2_compat.h (lost somewhere between
commit tree patch generation -> successful compile tree build test -> commmit).

Pointy hat to:	netchild
2011-05-04 13:09:20 +00:00
netchild
897499f709 Add FEATURE macros for v4l and v4l2 to the linuxulator.
Suggested by:	ae
2011-05-04 09:52:34 +00:00
netchild
939c0e0ef5 This is v4l2 support for the linuxulator. This allows to access FreeBSD
native devices which support the v4l2 API from processes running within
the linuxulator, e.g. skype or flash can access the multimedia/pwcbsd
or multimedia/webcamd supplied drivers.

Submitted by:	nox
MFC after:	1 month
2011-05-04 09:05:39 +00:00
netchild
5e0eb77d30 Fix typo in comment, improve comment. 2011-05-04 08:42:31 +00:00
netchild
991566d22b Add explanation about the use-permission and FreeBSDify it. 2011-05-04 08:41:55 +00:00
netchild
443cec2596 Copy the v4l2 header unchanged from the vendor branch. 2011-05-04 08:31:58 +00:00
mdf
b0f8474766 Regen. 2011-04-18 16:32:47 +00:00
mdf
9c9a32d97b Add the posix_fallocate(2) syscall. The default implementation in
vop_stdallocate() is filesystem agnostic and will run as slow as a
read/write loop in userspace; however, it serves to correctly
implement the functionality for filesystems that do not implement a
VOP_ALLOCATE.

Note that __FreeBSD_version was already bumped today to 900036 for any
ports which would like to use this function.

Also reserve space in the syscall table for posix_fadvise(2).

Reviewed by:	-arch (previous version)
2011-04-18 16:32:22 +00:00
trasz
935751ee5c Remove stray semicolon. 2011-04-10 10:15:49 +00:00
jkim
95c723445e Use atomic load & store for TSC frequency. It may be overkill for amd64 but
safer for i386 because it can be easily over 4 GHz now.  More worse, it can
be easily changed by user with 'machdep.tsc_freq' tunable (directly) or
cpufreq(4) (indirectly).  Note it is intentionally not used in performance
critical paths to avoid performance regression (but we should, in theory).
Alternatively, we may add "virtual TSC" with lower frequency if maximum
frequency overflows 32 bits (and ignore possible incoherency as we do now).
2011-04-07 23:28:28 +00:00
trasz
92bec9b84c Add accounting for most of the memory-related resources.
Sponsored by:	The FreeBSD Foundation
Reviewed by:	kib (earlier version)
2011-04-05 20:23:59 +00:00
kib
4999e59537 Implement compat32 shims for PCIOCGETCONF.
There is a generic problem with the shims for ioctls that receive
pointers to the usermode data areas in the data argument. We either have
to modify the handler to accept UIO_USERSPACE/UIO_SYSSPACE indicator, or
allocate and fill a usermode memory for data buffer in the host format.
The change goes the second route, in particular because we do not need
to modify the handler.

Submitted by:	John Wehle <john feith com>
MFC after:	2 weeks
2011-04-02 16:02:25 +00:00
kib
72cac8e666 Provide the structures and ioctl number definition for handling
PCIOCGETCONF compat32.

Submitted by:	John Wehle <john feith com>
MFC after:	2 weeks
2011-04-02 15:47:23 +00:00
kib
e452eed194 Regen 2011-04-01 11:16:53 +00:00
kib
7c2eaa21fe Add support for executing the FreeBSD 1/i386 a.out binaries on amd64.
In particular:
- implement compat shims for old stat(2) variants and ogetdirentries(2);
- implement delivery of signals with ancient stack frame layout and
  corresponding sigreturn(2);
- implement old getpagesize(2);
- provide a user-mode trampoline and LDT call gate for lcall $7,$0;
- port a.out image activator and connect it to the build as a module
  on amd64.

The changes are hidden under COMPAT_43.

MFC after:   1 month
2011-04-01 11:16:29 +00:00
avg
94ec7d2988 Revert r220032:linux compat: add SO_PASSCRED option with basic handling
I have not properly thought through the commit.  After r220031 (linux
compat: improve and fix sendmsg/recvmsg compatibility) the basic
handling for SO_PASSCRED is not sufficient as it breaks recvmsg
functionality for SCM_CREDS messages because now we would need to handle
sockcred data in addition to cmsgcred.  And that is not implemented yet.

Pointyhat to:	avg
2011-03-31 08:14:51 +00:00
trasz
3adbd8337d Regenerate. 2011-03-30 17:59:54 +00:00
trasz
2f99052d80 Add rctl. It's used by racct to take user-configurable actions based
on the set of rules it maintains and the current resource usage.  It also
privides userland API to manage that ruleset.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	kib (earlier version)
2011-03-30 17:48:15 +00:00
kib
dfc44ec337 Regen. 2011-03-30 14:46:55 +00:00
kib
5c02dd1e9a Provide compat32 shims for kldstat(2).
Requested and tested by:	jpaetzel
MFC after:	1 week
2011-03-30 14:46:12 +00:00
avg
df7a39b1d0 linux compat: add SO_PASSCRED option with basic handling
This seems to have been a part of a bigger patch by dchagin that either
haven't been committed or committed partially.

Submitted by:	dchagin, nox
MFC after:	2 weeks
2011-03-26 11:25:36 +00:00
avg
a923658583 linux compat: improve and fix sendmsg/recvmsg compatibility
- implement baseic stubs for capget, capset, prctl PR_GET_KEEPCAPS
  and prctl PR_SET_KEEPCAPS.
- add SCM_CREDS support to sendmsg and recvmsg
- modify sendmsg to ignore control messages if not using UNIX
  domain sockets

This should allow linux pulse audio daemon and client work on FreeBSD
and interoperate with native counter-parts modulo the differences in
pulseaudio versions.

PR:		kern/149168
Submitted by:	John Wehle <john@feith.com>
Reviewed by:	netchild
MFC after:	2 weeks
2011-03-26 11:05:53 +00:00
kib
948b7589fc Implement compat32 MEMRANGE_GET and MEMRANGE_SET. This is needed to
run 32bit Xorg server with VESA driver.

Submitted by:	John Wehle <john feith com>
MFC after:	1 week
2011-03-25 11:52:31 +00:00
kib
0d40bf4b19 Fully emulate MDIOCLIST for compat32.
MFC after:	1 week
2011-03-25 11:43:49 +00:00
kib
abbe29dccf Remove unneccessary panics, that can be easily triggered by user.
The copyin() function handles NULL as well as any other pointer.

MFC after:	3 days
2011-03-25 11:05:28 +00:00
kib
1d38d5630c Fix file leakage in the freebsd32_ioctl routines.
Code inspection shows freebsd32_ioctl calls fget for a fd and calls
a subroutine to handle each specific ioctl.  It is expected that the
subroutine will call fdrop when done.  However many of the subroutines
will exit out early if copyin encounters an error resulting in fdrop
never being called.

Submitted by:	John Wehle <john feith com>
MFC after:	3 days
2011-03-25 10:57:57 +00:00
jhb
c7ac62aecd Fix some locking nits with the p_state field of struct proc:
- Hold the proc lock while changing the state from PRS_NEW to PRS_NORMAL
  in fork to honor the locking requirements.  While here, expand the scope
  of the PROC_LOCK() on the new process (p2) to avoid some LORs.  Previously
  the code was locking the new child process (p2) after it had locked the
  parent process (p1).  However, when locking two processes, the safe order
  is to lock the child first, then the parent.
- Fix various places that were checking p_state against PRS_NEW without
  having the process locked to use PROC_LOCK().  Every place was already
  locking the process, just after the PRS_NEW check.
- Remove or reduce the use of PROC_SLOCK() for places that were checking
  p_state against PRS_NEW.  The PROC_LOCK() alone is sufficient for reading
  the current state.
- Reorder fill_kinfo_proc() slightly so it only acquires PROC_SLOCK() once.

MFC after:	1 week
2011-03-24 18:40:11 +00:00
netchild
2d4b88f638 Staticize functions which are not used somewhere else, move the
corresponding prototypes from the header to the code file.
2011-03-15 13:40:47 +00:00
avg
5a2a285ac9 add DTrace systrace support for linux32 and freebsd32 on amd64 syscalls
Regenerate system call and systrace support files.

PR:		kern/152822
Submitted by:	Artem Belevich <fbsdlist@src.cx>
Reviewed by:	jhb (earlier version)
MFC after:	3 weeks
2011-03-12 08:58:19 +00:00
avg
666906fcd7 add DTrace systrace support for linux32 and freebsd32 on amd64 syscalls
This commits makes necessary changes in syscall/sysent generation
infrastructure.

PR:		kern/152822
Submitted by:	Artem Belevich <fbsdlist@src.cx>
Reviewed by:	jhb (ealier version)
MFC after:	3 weeks
2011-03-12 08:51:43 +00:00
dchagin
1878bfa86d Style(9) fixes. No functional changes.
MFC after:	2 Week
2011-03-12 07:47:05 +00:00
jhb
faa7c47cee Remove now-obsolete comment.
Submitted by:	netchild
MFC after:	1 week
2011-03-10 19:50:12 +00:00
jkim
a023421583 Remove custom interrupt dispatcher. This is a pointless micro-optimization
and it may cause problems if SS and SP are modified by real-mode code.

MFC after:	1 month
2011-03-09 16:16:38 +00:00
dchagin
83dbcc9afd Indeed, remove bogus since r219405 check of the Linux ABI.
Pointed out:	jhb

MFC after:	2 Week
2011-03-09 05:59:33 +00:00
dchagin
69b8756d3d Extend struct sysvec with new method sv_schedtail, which is used for an
explicit process at fork trampoline path instead of eventhadler(schedtail)
invocation for each child process.

Remove eventhandler(schedtail) code and change linux ABI to use newly added
sysvec method.

While here replace explicit comparing of module sysentvec structure with the
newly created process sysentvec to detect the linux ABI.

Discussed with:	kib

MFC after:	2 Week
2011-03-08 19:01:45 +00:00
trasz
1618438630 Export login class information via kinfo and make it possible to view
it using "ps -o class".
2011-03-05 14:41:49 +00:00
trasz
0525662d59 Regenerate. 2011-03-05 12:46:24 +00:00
trasz
62f6a13e39 Add two new system calls, setloginclass(2) and getloginclass(2). This makes
it possible for the kernel to track login class the process is assigned to,
which is required for RCTL.  This change also make setusercontext(3) call
setloginclass(2) and makes it possible to retrieve current login class using
id(1).

Reviewed by:	kib (as part of a larger patch)
2011-03-05 12:40:35 +00:00
dchagin
75c35db69e Print out shared flag for debug purpose.
MFC after:	1 Week
2011-03-03 18:29:55 +00:00
dchagin
a583811c0a Switch PROCESS_SHARE to AUTO_SHARE (as umtx do). Even for SHARED,
if page mapped MAP_ANON linux uses private algorithm too.

Disscussed with:	jhb

MFC after:	3 Days
2011-03-03 18:19:10 +00:00
rwatson
9c02915234 Regenerate system call files following addition of cap_enter(2),
cap_getmode(2), and capabilities.conf.

Reviewed by:	anderson
Discussed with:	benl, kris, pjd
Obtained from:	Capsicum Project
Sponsored by:	Google, Inc.
MFC after:	3 months
2011-03-01 13:30:23 +00:00
rwatson
6894aabcb5 Add initial support for Capsicum's Capability Mode to the FreeBSD kernel,
compiled conditionally on options CAPABILITIES:

Add a new credential flag, CRED_FLAG_CAPMODE, which indicates that a
subject (typically a process) is in capability mode.

Add two new system calls, cap_enter(2) and cap_getmode(2), which allow
setting and querying (but never clearing) the flag.

Export the capability mode flag via process information sysctls.

Sponsored by:	Google, Inc.
Reviewed by:	anderson
Discussed with:	benl, kris, pjd
Obtained from:	Capsicum Project
MFC after:	3 months
2011-03-01 13:23:37 +00:00
brucec
17108b16ca Use the cprd_mem field when setting the start and length for a memory
resource - the layout of cprd_port is identical but using cprd_mem
makes the code easier to understand.

PR:		kern/118493
Submitted by:	Weongyo Jeong <weongyo.jeong at gmail.com>
MFC after:	3 days
2011-02-23 21:45:28 +00:00
jhb
6e95ae3692 Use umtx_key objects to uniquely identify futexes. Private futexes in
different processes that happen to use the same user address in the
separate processes will now be treated as distinct futexes rather than the
same futex.  We can now honor shared futexes properly by mapping them to a
PROCESS_SHARED umtx_key.  Private futexes use THREAD_SHARED umtx_key
objects.

In conjunction with:	dchagin
Reviewed by:	kib
MFC after:	1 week
2011-02-23 13:23:28 +00:00
brucec
6d9b42b486 Fix typos - remove duplicate "the".
PR:	bin/154928
Submitted by:	Eitan Adler <lists at eitanadler.com>
MFC after: 	3 days
2011-02-21 09:01:34 +00:00
dchagin
7309a9eb12 Do not clobber %rdx.
Before calling vfork() syscall the linux user-space stores the current PID
in the %rdx and restore it when the parent process will leave the kernel.
2011-02-20 07:58:30 +00:00
dchagin
be13e396c9 For realtime signals fill the sigval value. 2011-02-15 21:46:36 +00:00
dchagin
77c093e6c2 Make a linux_rt_sigtimedwait() system call is actually working.
1) Translate the native signal number in the appropriate Linux signal.
2) Remove bogus code, which can lead to a panic as it calls
   kern_sigtimedwait with same ksiginfo.
3) Return the corresponding signal number.
2011-02-15 21:42:48 +00:00
dchagin
ee445bda81 Style(9) fix. Wrap long lines in linux_rt_sigtimedwait(). 2011-02-15 21:24:50 +00:00
dchagin
5a6a633611 Put the macro declaration in the relevant include file for future use. 2011-02-15 21:22:09 +00:00
dchagin
efb892cc10 Style(9) fix. Do not initialize variables in the declarations. 2011-02-14 17:24:58 +00:00
dchagin
185c49188e Sort include files in the alphabetical order. 2011-02-13 20:07:48 +00:00
dchagin
bc6bf7dad6 Remove comment about 'ftlk' LOR. 2011-02-13 18:46:34 +00:00
dchagin
d9864e4484 Stop printing the LOR, as this is expected behavior. 2011-02-13 18:41:40 +00:00
dchagin
5cfdaf1c59 The bitset field of freshly created futex should be initialized explicity.
Otherwise, REQUEUE operations fails.
2011-02-13 17:56:22 +00:00
dchagin
c26a933750 Rename used_requeue and use it as bitwise field to store more flags.
Reimplement used_requeue logic with LINUX_XDEPR_REQUEUEOP flag.
2011-02-12 20:58:59 +00:00
dchagin
bd20f71621 Slightly rewrite linux_fork:
1) Remove bogus error checking.
2) A new process exit from kernel through fork_trampoline(),
   so remove bogus check.
2011-02-12 20:16:25 +00:00
dchagin
9cf49ae032 Remove bogus include <machine/frame.h> 2011-02-12 19:14:57 +00:00
dchagin
9f708ad0aa Move linux_clone(), linux_fork(), linux_vfork() to a MI path. 2011-02-12 18:17:12 +00:00
netchild
35ebea3dcf Linux' shm_open() fails because it wants to find some funky shmfs
to construct the full pathname. It starts to search at the default
mountpoint which is /dev/shm. If this fails it runs through fstab
and searches for shmfs and tmpfs. Whatever it finds will be
statfs()'ed to be checked for Linux' fs magic for shmfs (0x01021994).

Ideally our tmpfs should deliver this fs magic to Linux processes, but
as our tmpfs is considered to be an experimental feature we can not
assume that there is always a tmpfs available.

To make shared memory work in the Linuxulator, force the fs type of
/dev/shm (which can be a symlink) to match what Linux expects. The user
is responsible (info has to be added to the linux base ports and the docs)
to setup a suitable link for /dev/shm.

Noticed by:	Andre Albsmeier <Andre.Albsmeier@siemens.com>
Submitted by:	Andre Albsmeier <Andre.Albsmeier@siemens.com>
MFC after:	1 month
2011-02-09 20:23:22 +00:00
dchagin
633a205e26 Yet another unimplemented futex operation, print out about.
Submitted by:	arundel
MFC after:	1 month.
2011-01-31 06:06:23 +00:00
dchagin
6570332d31 Implement a futex BITSET op.
Submitted by:	arundel
MFC after:	1 month.
2011-01-31 05:59:05 +00:00
bz
126cbabd2e Update interface stats counters to match the current format in linux and
try to export as much information as we can match.

Requested on:	Debian GNU/kFreeBSD list (debian-bsd lists.debian.org) 2010-12
Tested by:	Mats Erik Andersson (mats.andersson gisladisker.se)
MFC after:	10 days
2011-01-31 00:09:52 +00:00
dchagin
e85dbed4b7 Style(9) fixes.
MFC after:	1 Month.
2011-01-28 19:04:15 +00:00
dchagin
051ceeb5f3 Implement a variation of the linux_common_wait() which should
be used by linuxolator itself.

Move linux_wait4() to MD path as it requires native struct
rusage translation to struct l_rusage on linux32/amd64.

MFC after:	1 Month.
2011-01-28 18:47:07 +00:00
dchagin
2501c5eeee Style(9) fix.
MFC after:	1 month.
2011-01-28 05:42:14 +00:00
dchagin
1e124ec538 Add macro to test the sv_flags of any process. Change some places to test
the flags instead of explicit comparing with address of known sysentvec
structures.

MFC after:	1 month
2011-01-26 20:03:58 +00:00
dchagin
72b8fc74b4 Style(9) fix.
Approved by:	kib(mentor)
MFC after:	1 month
2011-01-23 09:50:39 +00:00
kib
71201deeff In linuxolator getdents_common(), it seems there is no reason to loop
if no records where returned by VOP_READDIR(). Readdir implementations
allowed to return 0 records when first record is larger then supplied
buffer. In this case trying to execute VOP_READDIR() again causes the
syscall looping forewer.

The goto was there from the day 1, which goes back to 1995 year.

Reported and tested by:	Beat G?tzi <beat chruetertee ch>
MFC after:   2 weeks
2011-01-19 12:19:25 +00:00
mdf
caeebe8c54 Fix a few more SYSCTL_PROC() that were missing a CTLFLAG type specifier. 2011-01-19 00:57:58 +00:00
kib
06eb8de1e1 Create shared (readonly) page. Each ABI may specify the use of page by
setting SV_SHP flag and providing pointer to the vm object and mapping
address. Provide simple allocator to carve space in the page, tailored
to put the code with alignment restrictions.

Enable shared page use for amd64, both native and 32bit FreeBSD
binaries.  Page is private mapped at the top of the user address
space, moving a start of the stack one page down. Move signal
trampoline code from the top of the stack to the shared page.

Reviewed by:	 alc
2011-01-08 16:13:44 +00:00
scf
3e36dcd710 Fix the LINUX_SOUND_MIXER_INFO ioctl to return success after the
information is set to FreeBSD.  It had been falling through to the end
of linux_ioctl_sound() and returning ENOIOCTL.  Noticed when running the
Linux ALSA amixer tool.

Add a LINUX_SOUND_MIXER_READ_CAPS ioctl which is used by the Skype
v2.1.0.81 binary.

Reviewed by:	gavin
MFC after:	2 weeks
2010-12-30 02:18:04 +00:00
tijl
0f810ef0a2 Merge amd64 and i386 bus.h and move the resulting header to x86. Replace
the original amd64 and i386 headers with stubs.

Rename (AMD64|I386)_BUS_SPACE_* to X86_BUS_SPACE_* everywhere.

Reviewed by:	imp (previous version), jhb
Approved by:	kib (mentor)
2010-12-20 16:39:43 +00:00
kib
3d12be787f Restore the ABI of struct kinfo_proc32 after r213536.
MFC after:	3 days
2010-12-19 21:18:33 +00:00
bschmidt
bd1f37ab17 Implement NdisGetRoutineAddress and MmGetSystemRoutineAddress used in
newer Ralink drivers.

Submitted by:	Paul B Mahol <onemda at gmail.com>
2010-12-06 20:54:53 +00:00
bschmidt
2fe8bd5bcc Add a dummy for IoOpenDeviceRegistryKey().
With that change the Atheros 9xxx driver is actually usable and does not
panic anymore.

Submitted by:	Paul B Mahol <onemda at gmail.com>
MFC after:	2 weeks
2010-11-29 10:21:45 +00:00
bschmidt
a055d1840e Some drivers rely on the existence of certain keys. The Atheros 9xxx
driver for example requests the NetCfgInstanceId but doesn't check the
returned status code and will happily access random memory instead.

Submitted by:	Paul B Mahol <onemda at gmail.com>
MFC after:	2 weeks
2010-11-29 10:10:56 +00:00
bschmidt
77837b201a Add prototype for InitializeSListHead(). 2010-11-23 22:17:06 +00:00
bschmidt
b1be2833eb Add a few functions used in newer drivers. Fix RtlCompareMemory() while
here.

Submitted by:	Paul B Mahol <onemda@gmail.com>
2010-11-23 21:49:32 +00:00
pluknet
64dd3dbf39 Update MNT_ROOTFS comments after changes in the root mount logic.
Reported by:	arundel
Suggested by:	marcel (phrasing)
Approved by:	kib (mentor)
2010-11-23 13:49:15 +00:00
kib
d6e3624f09 Add include guards.
MFC after:	3 days
2010-11-23 12:47:15 +00:00
bschmidt
82896fc74f Resurrect amd64 support.
- Many drivers on amd64 are picking system uptime, interrupt time and ticks
  via global data structure instead of calling functions for performance
  reasons. For now just patch such address so driver will not trigger page
  fault when trying to access such data. In future, additional callout may
  be added to update data in periodic intervals.
- On amd64 we need to allocate "shadow space" on stack before calling any
  function.

Submitted by:	Paul B Mahol <onemda at gmail.com>
2010-11-22 20:46:38 +00:00
bschmidt
06a992a873 Prefer pmap_extract() over pmap_kextract() as done in MmIsAddressValid().
According to the comment for MmIsAddressValid() there are issues on PAE
kernels using pmap_kextract().

Submitted by:	Paul B Mahol <onemda at gmail.com>
2010-11-22 20:39:29 +00:00
dim
11b4830687 Fix linux kernel module breakage introduced in r215675, by including
<sys/sysent.h>.

Noticed by:	many
Pointy hat to:	netchild
2010-11-22 20:23:18 +00:00
attilio
7718cbcbf4 Add the ability for GDB to printout the thread name along with other
thread specific informations.

In order to do that, and in order to avoid KBI breakage with existing
infrastructure the following semantic is implemented:
- For live programs, a new member to the PT_LWPINFO is added (pl_tdname)
- For cores, a new ELF note is added (NT_THRMISC) that can be used for
  storing thread specific, miscellaneous, informations. Right now it is
  just popluated with a thread name.

GDB, then, retrieves the correct informations from the corefile via the
BFD interface, as it groks the ELF notes and create appropriate
pseudo-sections.

Sponsored by:	Sandvine Incorporated
Tested by:	gianni
Discussed with:	dim, kan, kib
MFC after:	2 weeks
2010-11-22 14:42:13 +00:00
netchild
81eaf1c587 Do not take the process lock. The assignment to u_short inside the
properly aligned structure is atomic on all supported architectures, and
the thread that should see side-effect of assignment is the same thread
that does assignment.

Use a more appropriate conditional to detect the linux ABI.

Suggested by:	kib
X-MFC:		together with r215664
2010-11-22 12:42:32 +00:00
netchild
47500ea1a1 Remove trailing dot from the unimplemented futex messages to make
them consistent with the syscall and ipc messages.

Submitted by:	arundel
MFC after:	3 days
2010-11-22 09:25:32 +00:00
netchild
46e50a7603 By using the 32-bit Linux version of Sun's Java Development Kit 1.6
on FreeBSD (amd64), invocations of "javac" (or "java") eventually
end with the output of "Killed" and exit code 137.

This is caused by:
1. After calling exec() in multithreaded linux program threads are not
   destroyed and continue running. They get killed after program being
   executed finishes.

2. linux_exit_group doesn't return correct exit code when called not
   from group leader. Which happens regularly using sun jvm.

The submitters fix this in a similar way to how NetBSD handles this.

I took the PRs away from dchagin, who seems to be out of touch of
this since a while (no response from him).

The patches committed here are from [2], with some little modifications
from me to the style.

PR:		141439 [1], 144194 [2]
Submitted by:	Stefan Schmidt <stefan.schmidt@stadtbuch.de>, gk
Reviewed by:	rdivacky (in april 2010)
MFC after:	5 days
2010-11-22 09:06:59 +00:00
bschmidt
389cf6e631 Fix a panic on i386 for drivers using MmAllocateContiguousMemory()
and MmAllocateContiguousMemorySpecifyCache().

Those two functions take 64-bit variable(s) for their arguments. On i386
that takes additional 32-bit variable per argument. This is required so
that windrv_wrap() can correctly wrap function that miniport driver calls
with stdcall convention. Similar explanation is provided in subr_ndis.c for
other functions.

Submitted by:	 Paul B Mahol <onemda at gmail.com>
2010-11-17 09:32:39 +00:00
bschmidt
87fbb488d4 Use kmem_alloc_contig() to honour the cache_type variable.
Pointed out by:	alc
2010-11-17 09:28:17 +00:00
des
5c90ddd6c4 Remove no-op assignment.
Submitted by:	clang via arundel@
MFC after:	2 weeks
2010-11-15 23:14:14 +00:00
netchild
ec2a62f430 Some style(9) fixes.
Submitted by:	arundel
MFC after:	1 week
2010-11-15 13:07:10 +00:00
netchild
d3aba4235e - print out the PID and program name of the program trying to use an
unsupported futex operation
- for those futex operations which are known to be not supported,
  print out which futex operation it is
- shortcut the error return of the unsupported FUTEX_CLOCK_REALTIME in
  some cases:
    FUTEX_CLOCK_REALTIME can be used to tell linux to use
    CLOCK_REALTIME instead of CLOCK_MONOTONIC. FUTEX_CLOCK_REALTIME
    however must only be set, if either FUTEX_WAIT_BITSET or
    FUTEX_WAIT_REQUEUE_PI are set too. If that's not the case
    we can die with ENOSYS right at the beginning.

Submitted by:	arundel
Reviewed by:	rdivacky (earlier iteration of the patch)
MFC after:	1 week
2010-11-15 13:03:35 +00:00
bschmidt
b1835fd211 According to specs for MmAllocateContiguousMemorySpecifyCache() physically
contiguous memory with requested restrictions must be allocated.

Submitted by:	Paul B Mahol <onemda at gmail.com>
2010-11-11 18:43:31 +00:00
des
87d7052b67 Break long line. 2010-11-08 15:14:14 +00:00
des
543d80b9c5 Fix CPU ID in /proc/cpuinfo.
PR:		kern/56451
Submitted by:	arundel@
MFC after:	3 weeks
2010-11-08 12:04:41 +00:00
bschmidt
7edbc46989 Remove 4.x, 5.x and 6.x compatibility bits.
Submitted by:	Paul B Mahol <onemda at gmail.com>
2010-11-04 18:43:57 +00:00
kib
dd8752941f Remove stale comment.
Submitted by:	arundel
MFC after:	3 days
2010-10-14 19:30:44 +00:00
kib
84212c7551 Add macro DECLARE_MODULE_TIED to denote a module as requiring the
kernel of exactly the same __FreeBSD_version as the headers module was
compiled against.

Mark our in-tree ABI emulators with DECLARE_MODULE_TIED. The modules
use kernel interfaces that the Release Engineering Team feel are not
stable enough to guarantee they will not change during the life cycle
of a STABLE branch. In particular, the layout of struct sysentvec is
declared to be not part of the STABLE KBI.

Discussed with:	bz, rwatson
Approved by:	re (bz, kensmith)
MFC after:	2 weeks
2010-10-12 09:18:17 +00:00
jkim
6c222550e8 Simplify timeout check in futex_wait() using itimerfix() and return error
if the given timeout is invalid.  Consistently use int type for timeout and
correct a format string in futex_sleep().
2010-10-06 18:51:22 +00:00
netchild
77b4c590d7 Fix a comparision of an uninitialised pointer.
Submitted by:	arundel
Found by:	clang analysis (automatic service by uqs@)
Reviewed by:	rdivacky
2010-10-06 07:34:41 +00:00
thompsa
b942a32370 Use the printf-like capability from kproc_create().
Submitted by:	Paul B Mahol
2010-10-05 20:56:08 +00:00
jkim
e9d0730bf8 Prefer pmap_unmapbios() over pmap_unmapdev(). The binary does not change
after this because pmap_unmapbios() is a macro for pmap_unmapdev() on amd64.
2010-10-05 18:38:23 +00:00
kib
2fc806f00c In linprocfs_doargv():
- handle compat32 processes;
- remove the checks for copied in addresses to belong into valid
  usermode range, proc_rwmem() does this;
- simplify loop reading single string, limit the total amount of strings
  collected by ARG_MAX bytes;
- correctly add '\0' at the end of each copied string;
- fix style.

In linprocfs_doprocenviron():
- unlock the process before calling copyin code [1]. The process is held
  by pseudofs.

In linprocfs_doproccmdline:
- use linprocfs_doargv() to handle !curproc case for which p_args is not cached.

Reported by:		plulnet [1]
Tested by:		pluknet
Approved by:		des (linprocfs maintainer, previous
				version of the patch)
MFC after:		3 weeks
2010-09-28 11:32:17 +00:00
des
a14c121cce Implement proc/$$/environment.
Submitted by:	Fernando Apesteguía <fernando.apesteguia@gmail.com>
MFC after:	3 weeks
2010-09-16 07:56:34 +00:00
mdf
ab3a8b533a Replace sbuf_overflowed() with sbuf_error(), which returns any error
code associated with overflow or with the drain function.  While this
function is not expected to be used often, it produces more information
in the form of an errno that sbuf_overflowed() did.
2010-09-10 16:42:16 +00:00
jkim
0117aaf574 Add x86bios_set_intr() to set interrupt vectors for real mode and simplify
x86bios_get_intr() a little.
2010-08-25 21:03:50 +00:00
jkim
8c8d33fe9f Check opcode for short jump as well. Some option ROMs do short jumps
(e.g., some NVIDIA video cards) and we were not able to do POST while
resuming because we only honored long jump.

MFC after:	3 days
2010-08-25 20:52:40 +00:00
kib
d9f088a03e Supply some useful information to the started image using ELF aux vectors.
In particular, provide pagesize and pagesizes array, the canary value
for SSP use, number of host CPUs and osreldate.

Tested by:	marius (sparc64)
MFC after:	1 month
2010-08-17 08:55:45 +00:00
jkim
b25a196078 Place spinlock_enter() and spinlock_exit() just around X86EMU calls. 2010-08-10 15:22:48 +00:00
jkim
781d513f0b Tidy up locking and memory allocation for the real mode emulator wrapper.
Now we use a regular mutex instead of a spin mutex.  When we enter and exit
the emulator, spinlock_enter() and spinlock_exit() are additionally used.
Move some page table related stuff from x86bios_init() and x86bios_uninit()
to x86bios_map_mem() and x86bios_unmap_mem().
2010-08-10 06:25:08 +00:00
jkim
6328a1bf23 Tidy up printf() calls for debugging. 2010-08-09 22:06:08 +00:00
jkim
c07bc7f517 Initialize a variable just before its use. 2010-08-09 18:10:32 +00:00
jkim
c9e34bbc39 Reduce diffs between VM86 and X86EMU wrappers for x86bios_alloc() and
x86bios_free().  Add strict sanity checks for VM86 wrapper and add strict
page table locking for X86EMU wrapper.
2010-08-09 17:54:26 +00:00
kib
8e1e89f01b Prefer struct sysentvec sv_psstrings to hardcoding FREEBSD32_PS_STRINGS
in the compat32 code. Use sv_usrstack instead of FREEBSD32_USRSTACK as well.

MFC after:	1 week
2010-08-07 11:57:13 +00:00
kib
8043767b92 Add compat32 definition for (old) struct ostat.
MFC after:	1 week
2010-08-07 11:53:38 +00:00
jkim
77b28d0e95 Do not block any I/O port on amd64. 2010-08-07 04:05:58 +00:00
jkim
012f478c81 Optimize interrupt vector lookup. There is no need to check the page table. 2010-08-07 03:45:45 +00:00
jkim
57b610d580 Consistently use architecture specific macros. 2010-08-06 15:24:37 +00:00
jkim
c1d76c06de Fix allocation of multiple pages, which forgot to increase page number.
Particularly, it caused "vm86_addpage: overlap" panics under VirtualBox.
Add a safety check before freeing memory while I am here.
2010-08-06 15:04:01 +00:00
jkim
9c60808f39 Re-add flag register for output. Some BIOS calls actually use it to return
success/failure status.  Oops.
2010-08-05 19:30:57 +00:00
jkim
0a7f48a833 Do not copy stack pointer and flags. These registers are unconditionally
destroyed from vm86_prepcall().
2010-08-05 19:12:35 +00:00
jkim
f183f61cf2 Implement a simple native VM86 backend for X86BIOS. Now i386 uses native
VM86 calls instead of the real mode emulator as a backend.  VM86 has been
proven reliable for very long time and it is actually few times faster than
emulation.  Increase maximum number of page table entries per VM86 context
from 3 to 8 pages.  It was (ridiculously) low and insufficient for new VM86
backend, which shares one context globally.  Slighly rearrange and clean up
the emulator backend to accommodate new code.  The only visible change here
is stack size, which is decreased from 64K to 4K bytes to sync. with VM86.
Actually, it seems there is no need for big stack in real mode.

MFC after:	1 month
2010-08-05 18:48:30 +00:00
kib
4a7e2ba2a3 Copy inode birthtime to the struct stat32.
MFC after:	1 week
2010-08-04 14:38:20 +00:00
kib
36b27b8587 Fix style.
MFC after:	1 week
2010-08-04 14:35:05 +00:00
kib
734aeecfaf When compat32 recvmsg(2) does not need to copy out control messages, set
msg_controllen to 0.

PR:	kern/149227
Submitted by:	Stef Walter <stef memberwebs com>
MFC after:	1 weeks
2010-08-03 11:23:44 +00:00
alc
256c63de28 Introduce exec_alloc_args(). The objective being to encapsulate the
details of the string buffer allocation in one place.

Eliminate the portion of the string buffer that was dedicated to storing
the interpreter name.  The pointer to the interpreter name can simply be
made to point to the appropriate argument string.

Reviewed by:	kib
2010-07-27 17:31:03 +00:00
kib
5d01c62502 Revert r210451, and the similar part of the r210431. The forward-declaration
for the enum tag when enum definition is not complete is not allowed by
C99, and is gcc extension.

Requested by:	stefanf
MFC after:	28 days
2010-07-26 12:52:44 +00:00
alc
02c0473d35 Change the order in which the file name, arguments, environment, and
shell command are stored in exec*()'s demand-paged string buffer.  For
a "buildworld" on an 8GB amd64 multiprocessor, the new order reduces
the number of global TLB shootdowns by 31%.  It also eliminates about
330k page faults on the kernel address space.

Change exec_shell_imgact() to use "args->begin_argv" consistently as
the start of the argument and environment strings.  Previously, it
would sometimes use "args->buf", which is the start of the overall
buffer, but no longer the start of the argument and environment
strings.  While I'm here, eliminate unnecessary passing of "&length"
to copystr(), where we don't actually care about the length of the
copied string.

Clean up the initialization of the exec map.  In particular, use the
correct size for an entry, and express that size in the same way that
is used when an entry is allocated.  The old size was one page too
large.  (This discrepancy originated in 2004 when I rewrote
exec_map_first_page() to use sf_buf_alloc() instead of the exec map
for mapping the first page of the executable.)

Reviewed by:	kib
2010-07-25 17:43:38 +00:00
kib
229b3b9c19 Remove the linux_exec_copyin_args(), freebsd32_exec_copyin_args() may
server as well. COMPAT_FREEBSD32 is a prerequisite for COMPAT_LINUX32.

Reviewed by:	alc
MFC after:	3 weeks
2010-07-23 21:30:33 +00:00
alc
0c709bf109 Eliminate a little bit of duplicated code. 2010-07-23 18:58:27 +00:00
trasz
e2cd3ad716 Remove proc locking, it's not needed after r210132. 2010-07-17 15:52:11 +00:00
trasz
e3a946ddad Make svr4(4) version of poll(2) use the same limit of file descriptors as the
usual poll(2) does, instead of checking resource limits.
2010-07-15 18:44:58 +00:00
kib
5f8b30cbbb Constify source argument for siginfo_to_siginfo32().
MFC after:	1 week
2010-07-04 11:43:53 +00:00
jhb
df7979cf76 Tweak the in-kernel API for sending signals to threads:
- Rename tdsignal() to tdsendsignal() and make it private to kern_sig.c.
- Add tdsignal() and tdksignal() routines that mirror psignal() and
  pksignal() except that they accept a thread as an argument instead of
  a process.  They send a signal to a specific thread rather than to an
  individual process.

Reviewed by:	kib
2010-06-29 20:41:52 +00:00
kib
180cca1c2d Regenerate 2010-06-28 18:17:21 +00:00
kib
b6d8416eac Count number of threads that enter and leave dynamically registered
syscalls. On the dynamic syscall deregistration, wait until all
threads leave the syscall code. This somewhat increases the safety
of the loadable modules unloading.

Reviewed by:	jhb
Tested by:	pho
MFC after:	1 month
2010-06-28 18:06:46 +00:00
jkim
f68d88b142 Let x86bios_alloc() pass contigmalloc(9) flags. Use it to set M_WAITOK
from VESA BIOS initialization.  All other malloc(9) uses in the function is
blocking any way.
2010-06-23 17:20:51 +00:00
ed
d7eaa4520b ANSIfy prototypes in subr_usbd.c.
Clang generates the following warnings when building subr_usbd.c:

| subr_usbd.c:598:13: warning: promoted type 'int' of K&R function
|   parameter is not compatible with the parameter type 'uint8_t' (aka
|   'unsigned char') declared in a previous prototype
| subr_usbd.c:627:13: warning: promoted type 'int' of K&R function
|   parameter is not compatible with the parameter type 'uint8_t' (aka
|   'unsigned char') declared in a previous prototype
| subr_usbd.c:649:13: warning: promoted type 'int' of K&R function
|   parameter is not compatible with the parameter type 'uint8_t' (aka
|   'unsigned char') declared in a previous prototype

Instead of just ANSIfying these three prototypes, do it for the entire
file.

Spotted by:	clang
2010-06-12 12:19:08 +00:00
jhb
9b74a62d73 Update several places that iterate over CPUs to use CPU_FOREACH(). 2010-06-11 18:46:34 +00:00
wkoszek
ab9f5dbe35 Bring USB fixes for linux(4).
Intention of this commit is to let us take a full advantage
of libusb(8) ported to Linux. This decreases a possibility of getting
any collisions within ioctl() "command" space, especially with
relation to  LINUX_SNDCTL_SEQ... stuff.

Basically, we provide commands, that will be mapped in the kernel
to correct ones and forward those to the USB layer. Port enabling
functionality brought with this patch is here:

	http://www.freebsd.org/cgi/query-pr.cgi?pr=146895

Bump __FreeBSD_version to catch, since which version installing a
port makes sense.

This patch should bring no regressions. So far, only i386 is tested.

Tested by:	thompsa@
Reviewed by:	thompsa@
OKed by:	netchild@
2010-05-24 07:04:00 +00:00
kib
4208ccbe79 Reorganize syscall entry and leave handling.
Extend struct sysvec with three new elements:
sv_fetch_syscall_args - the method to fetch syscall arguments from
  usermode into struct syscall_args. The structure is machine-depended
  (this might be reconsidered after all architectures are converted).
sv_set_syscall_retval - the method to set a return value for usermode
  from the syscall. It is a generalization of
  cpu_set_syscall_retval(9) to allow ABIs to override the way to set a
  return value.
sv_syscallnames - the table of syscall names.

Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding
the call to cpu_set_syscall_retval().

The new functions syscallenter(9) and syscallret(9) are provided that
use sv_*syscall* pointers and contain the common repeated code from
the syscall() implementations for the architecture-specific syscall
trap handlers.

Syscallenter() fetches arguments, calls syscall implementation from
ABI sysent table, and set up return frame. The end of syscall
bookkeeping is done by syscallret().

Take advantage of single place for MI syscall handling code and
implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and
PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the
thread is stopped at syscall entry or return point respectively.  The
EXEC flag augments SCX and notifies debugger that the process address
space was changed by one of exec(2)-family syscalls.

The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are
changed to use syscallenter()/syscallret(). MIPS and arm are not
converted and use the mostly unchanged syscall() implementation.

Reviewed by:	jhb, marcel, marius, nwhitehorn, stas
Tested by:	marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
	stas (mips)
MFC after:	1 month
2010-05-23 18:32:02 +00:00
netchild
88731dce0c - #ifdef out the cliplist part, skype seems like using an uninitialized
variable and can cause problems, without the cliplist handling it works
  without problems
- improve the cliplist error handling
- fix VIDIOCGTUNER and VIDIOCSMICROCODE (still no hardware available to test)

Submitted by:	J.R. Oldroyd <jr@opal.com>
X-MFC after:	soon (together with all the v4l stuff)
2010-05-03 14:19:58 +00:00
jkim
428aff4f1c Reduce MD code further. At least, it compiles on ia64 now (but it is not
connected to build).  The idea/code was shamelessly taken from r207329.
2010-05-01 01:05:07 +00:00
jkim
b092dfff59 Do not initialize mutex and return error if it cannot map memory. 2010-05-01 00:36:40 +00:00
kib
1b4a81ab7e Provide compat32 shims for kinfo_proc sysctl. This allows 32bit ps(1) to
mostly work on 64bit host.

The work is based on an original patch submitted by emaste, obtained
from Sandvine's source tree.

Reviewed by:	jhb
MFC after:	1 week
2010-04-21 19:32:00 +00:00
kib
d4d906be00 Extract the code to copy-out struct rusage32 from struct rusage
into the new function.

Reviewed by:	jhb
MFC after:	1 week
2010-04-21 19:28:01 +00:00
emaste
c1468b9b67 Linux puts a blank line between each CPU. 2010-04-14 13:44:22 +00:00
bz
7fe3c0a85b Add a forward declaration to silence a warning when compiling ia32_genassym.c.
Reviewed by:	kib
MFC after:	3 days
2010-04-03 12:34:32 +00:00