Commit Graph

710 Commits

Author SHA1 Message Date
Alexander Leidinger
4b3583a354 MFP4 (107144[1]): Implement CLONE_FS on i386[1] and amd64.
Submitted by:	rdivacky	[1]
2006-10-15 13:22:14 +00:00
Alexander Leidinger
687c23be1d MFP4 (107868 - 107870):
Use a macro to test for a valid signal instead of doing it my hand everywhere.

Submitted by:	rdivacky
2006-10-15 12:51:43 +00:00
Robert Watson
827f0e85a6 Regenerate. 2006-09-21 16:20:38 +00:00
Robert Watson
e6f188152c Use AUE_CREAT instead of AUE_O_CREAT for linux_creat().
Obtained from:	TrustedBSD Project
2006-09-21 16:18:33 +00:00
Robert Watson
753a5e888c Regenerate. 2006-09-21 16:13:16 +00:00
Robert Watson
b5ca51459a Use AUE_GETDIRENTRIES instead of AUE_O_GETDENTS and AUE_NULL for a number
of directory reading system calls.

Respell a mis-spelled event name.

Clean up white space/line wraps in a couple of places.

Assign event numbers to some new system call entries that have turned
up in the list since audit support was added.

Obtained from:	TrustedBSD Project
2006-09-21 16:12:58 +00:00
Alexander Leidinger
6dc4e81071 style(9)
While I'm here add a MFC reminder, I forgot it in the previous commit.

Noticed by:	ssouhlal
MFC after:	1 week
2006-09-20 19:27:11 +00:00
Alexander Leidinger
a312f6a30a Bring the i386 linux mmap code more into line with how linux (2.4.x)
behaves. This fixes a lot of test which failed before. For amd64 there
are still some problems, but without any testers which apply patches
and run some predefines tests we can't do more ATM.

Submitted by:	Marcin Cieslak <saper@SYSTEM.PL> (minor fixups by myself)
Tested with:	LTP
2006-09-20 17:24:20 +00:00
Alexander Leidinger
bb59e63f8f Change futex lock from mutex to sx. Make futex_get atomic (protected by the
futex lock).

Sponsored by:	Google SoC 2006
Submitted by:	rdivacky
Suggested by:	jhb
2006-09-09 16:25:25 +00:00
Alexander Leidinger
a6c5f81339 Fix video playing and network connections in realplayer (and most likely
other stuff) in the osrelease=2.6.16 case:
 - implement CLONE_PARENT semantic
 - fix TLS loading in clone CLONE_SETTLS
 - lock proc in the currently disabled part of CLONE_THREAD

I suggest to not unload the linux module after testing this, there are
some "<defunct>" processes hanging around after exiting (they aren't
with osrelease=2.4.2) and they may panic your kernel when unloading the
linux module. They are in state Z and some of them consume CPU according
to ps. But I don't trust the CPU part, the idle threads gets too much CPU
that this may be possible (accumulating idle, X and 2 defunct processes
results in 104.7%, this looks to much to be a rounding error).

Noticed by:	Intron <mag@intron.ac>
Submitted by:	rdivacky (in collaboration with Intron)
Tested by:	Intron, netchild
Reviewed by:	jhb (previous version)
2006-08-27 18:51:32 +00:00
Alexander Leidinger
084556f5d7 regen 2006-08-27 08:58:00 +00:00
Alexander Leidinger
835e506190 Add the linux statfs64 call. This allows Tivoli backup to proceed a little
but further on -current (still not successful, but a step into the right
direction).

Sponsored by:	Google SoC 2006
Submitted by:	rdivacky
Tested by:	Paul Mather <paul@gromit.dlib.vt.edu>
2006-08-27 08:56:54 +00:00
Alexander Leidinger
40f734dd0d Emulate what vfork does instead of using it in linux_vfork. This way
we can do the stuff we need to do with linux processes at fork and
don't panic the kernel at exit of the child.

Submitted by:	rdivacky
Tested with:	tst-vfork* (glibc regression tests)
Tested by:	netchild
2006-08-25 11:59:56 +00:00
Alexander Leidinger
29ddc19bbf Get rid of some nested includes.
Sponsored by:	Google SoC 2006
Submitted by:	rdivacky
Noticed by:	jhb
2006-08-19 15:13:01 +00:00
Alexander Leidinger
94cb2ecf79 Move some stuff into headers where they belong.
Sponsored by:	Google SoC 2006
Submitted by:	rdivacky
Noticed by:	jhb, ssouhlal
2006-08-17 21:06:48 +00:00
Alexander Leidinger
0eef2f8a4e Style fixes to comments.
Sponsored by:	Google SoC 2006
Submitted by:	rdivacky
Noticed by:	jhb, ssouhlal
2006-08-16 18:54:51 +00:00
John Baldwin
f8f1f7fb85 Regen to propogate <prefix>_AUE_<mumble> changes as well as the earlier
systrace changes.
2006-08-15 17:37:01 +00:00
John Baldwin
df78f6d313 - Remove unused sysvec variables from various syscalls.conf.
- Send the systrace_args files for all the compat ABIs to /dev/null for
  now.  Right now makesyscalls.sh generates a file with a hardcoded
  function name, so it wouldn't work for any of the ABIs anyway.  Probably
  the function name should be configurable via a 'systracename' variable
  and the functions should be stored in a function pointer in the sysvec
  structure.
2006-08-15 17:25:55 +00:00
Alexander Leidinger
77b959aa51 add autogenerated systrace_args stuff for dtrace 2006-08-15 12:56:36 +00:00
Alexander Leidinger
9b44bfc556 Add the linux 2.6.x stuff (not used by default!):
- TLS - complete
 - pid/tid mangling - complete
 - thread area - complete
 - futexes - complete with issues
 - clone() extension - complete with some possible minor issues
 - mq*/timer*/clock* stuff - complete but untested and the mq* stuff is
   disabled when not build as part of the kernel with native FreeBSD mq*
   support (module support for this will come later)

Tested with:
 - linux-firefox - works, tested
 - linux-opera - works, tested
 - linux-realplay - doesnt work, issue with futexes
 - linux-skype - doesnt work, issue with futexes
 - linux-rt2-demo - works, tested
 - linux-acroread - doesnt work, unknown reason (coredump) and sometimes
   issue with futexes
 - various unix utilities in linux-base-gentoo3 and linux-base-fc4:
   everything tried worked

On amd64 not everything is supported like on i386, the catchup is planned for
later when the remaining bugs in the new functions are fixed.

To test this new stuff, you have to run
	sysctl compat.linux.osrelease=2.6.16
to switch back use
	sysctl compat.linux.osrelease=2.4.2

Don't switch while running a linux program, strange things may or may not
happen.

Sponsored by:			Google SoC 2006
Submitted by:			rdivacky
Some suggestions/help by:	jhb, kib, manu@NetBSD.org, netchild
2006-08-15 12:54:30 +00:00
Alexander Leidinger
c107650561 regen 2006-08-15 12:51:45 +00:00
Alexander Leidinger
b4359bd8e5 Add new syscalls in the linuxolator (only used when the sysctl
compat.linux.osrelease is changed to "2.6.16" or similar).

On amd64 not everything is supported like on i386, the catchup is planned for
later when the remaining bugs in the new functions are fixed.

Sponsored by:	Google SoC 2006
Submitted by:	rdivacky
2006-08-15 12:28:14 +00:00
Alexander Leidinger
50e422f056 Add some more errno mappings (bsd -> linux) and a comment about the status..
Submitted by:	"Intron" <mag@intron.ac>
2006-08-10 22:05:25 +00:00
John Baldwin
91ce2694d1 Regen for MPSAFE flag removal. 2006-07-28 19:08:37 +00:00
John Baldwin
af5bf12239 Now that all system calls are MPSAFE, retire the SYF_MPSAFE flag used to
mark system calls as being MPSAFE:
- Stop conditionally acquiring Giant around system call invocations.
- Remove all of the 'M' prefixes from the master system call files.
- Remove support for the 'M' prefix from the script that generates the
  syscall-related files from the master system call files.
- Don't explicitly set SYF_MPSAFE when registering nfssvc.
2006-07-28 19:05:28 +00:00
John Baldwin
e0b4add8d8 Various fixes to comments in the syscall master files including removing
cruft from the audit import and adding mention of COMPAT4 to freebsd32.
2006-07-28 18:55:18 +00:00
John Baldwin
3ce72960e8 Regen. 2006-07-21 20:41:33 +00:00
John Baldwin
b4c63329d5 - Pass the MPSAFE flag to namei() in linux_uselib() and handle conditional
Giant VFS locking in that function.
- Remove bogus code to handle the case where namei() returns success but a
  NULL vnode pointer.
- Note that this code duplicates exec_check_permissions() and annotate
  where it differs.
- Hold the vnode lock longer to protect the write to set VV_TEXT in
  v_vflag.
- Mark linux_uselib() MPSAFE.

Reviewed by:	rwatson
2006-07-21 20:22:13 +00:00
John Baldwin
90aff9de2d Regen. 2006-07-11 20:55:23 +00:00
John Baldwin
be5747d5b5 - Add conditional VFS Giant locking to getdents_common() (linux ABIs),
ibcs2_getdents(), ibcs2_read(), ogetdirentries(), svr4_sys_getdents(),
  and svr4_sys_getdents64() similar to that in getdirentries().
- Mark ibcs2_getdents(), ibcs2_read(), linux_getdents(), linux_getdents64(),
  linux_readdir(), ogetdirentries(), svr4_sys_getdents(), and
  svr4_sys_getdents64() MPSAFE.
2006-07-11 20:52:08 +00:00
John Baldwin
ec982ae761 Regen. 2006-07-06 21:43:14 +00:00
John Baldwin
ad6d226d43 - Protect the list of linux ioctl handlers with an sx lock.
- Hold Giant while calling linux ioctl handlers for now as they aren't all
  known to be MPSAFE yet.
- Mark linux_ioctl() MPSAFE.
2006-07-06 21:42:36 +00:00
John Baldwin
cec34dbf79 Regen. 2006-06-27 18:32:16 +00:00
John Baldwin
49d409a108 - Add a kern_semctl() helper function for __semctl(). It accepts a pointer
to a copied-in copy of the 'union semun' and a uioseg to indicate which
  memory space the 'buf' pointer of the union points to.  This is then used
  in linux_semctl() and svr4_sys_semctl() to eliminate use of the stackgap.
- Mark linux_ipc() and svr4_sys_semsys() MPSAFE.
2006-06-27 18:28:50 +00:00
John Baldwin
0cceebeeb2 Regen. 2006-06-27 14:47:08 +00:00
John Baldwin
597d608f86 - Expand the scope of Giant some in mount(2) to protect the vfsp structure
from going away.  mount(2) is now MPSAFE.
- Expand the scope of Giant some in unmount(2) to protect the mp structure
  (or rather, to handle concurrent unmount races) from going away.
  umount(2) is now MPSAFE, as well as linux_umount() and linux_oldumount().
- nmount(2) and linux_mount() were already MPSAFE.
2006-06-27 14:46:31 +00:00
John Baldwin
b820787fb3 Regen. 2006-06-26 18:37:36 +00:00
John Baldwin
cf837b8943 linux_brk() is MPSAFE. 2006-06-26 18:36:16 +00:00
Alexander Leidinger
aff681d258 regen after change to syscalls.master 2006-06-20 20:41:29 +00:00
Alexander Leidinger
502195ac72 Switch to using the DUMMY infrastructure instead of UNIMPL for the new
syscalls. This way there will be a log message printed to the console
(this time for real).

Note: UNIMPL should be used for syscalls we do not implement ever, e.g.
syscalls to load linux kernel modules.

Submitted by:	rdivacky
Sponsored by:	Goole SoC 2006
P4 IDs:		99600, 99602
2006-06-20 20:38:44 +00:00
Alexander Leidinger
4946fe7c4d regen after MFP4 (soc2006/rdivacky_linuxolator) of syscalls.master
P4-Changes:	similar to 98673 and 98675 but regenerated locally
Sponsored by:	Google SoC 2006
Submitted by:	rdivacky
2006-06-13 18:48:30 +00:00
Alexander Leidinger
c8b579c182 MFP4 (soc2006/rdivacky_linuxolator)
Update of syscall.master:
	o	Adding of several new dummy syscalls (268-310)
	o	Synchronization of amd64 syscall.master with i386 one
	o	Auditing added to amd64 syscall.master
	o	Change auditing type for lstat syscall (bugfix). [1]

P4-Changes:	98672, 98674
Noticed by:	rwatson [1]
Sponsored by:	Google SoC 2006
Submitted by:	rdivacky
2006-06-13 18:43:55 +00:00
Alexander Leidinger
ba5bd0001c regen (linux rt_sigpending) 2006-05-10 18:19:51 +00:00
Alexander Leidinger
17138b619c Implement rt_sigpending in the linuxolator.
PR:		92671
Submitted by:	Markus Niemist"o <markus.niemisto@gmx.net>
2006-05-10 18:17:29 +00:00
Doug Ambrisko
060e488247 Enhance the Linux emulation layer to make MegaRAID SAS managements tool happy.
Add back in a scheme to emulate old type major/minor numbers via hooks into
stat, linprocfs to return major/minors that Linux app's expect.  Currently
only /dev/null is always registered.  Drivers can register via the Linux
type shim similar to the ioctl shim but by using
linux_device_register_handler/linux_device_unregister_handler functions.
The structure is:

    struct linux_device_handler {
        char    *bsd_driver_name;
        char    *linux_driver_name;
        char    *bsd_device_name;
        char    *linux_device_name;
        int     linux_major;
        int     linux_minor;
        int     linux_char_device;
    };

Linprocfs uses this to display the major number of the driver.  The
soon to be available linsysfs will use it to fill in the driver name.
Linux_stat uses it to translate the major/minor into Linux type values.

Note major numbers are dynamically assigned via passing in a -1 for
the major number so we don't need to keep track of them.

This is somewhat needed due to us switching to our devfs.  MegaCli
will not run until I add in the linsysfs and mfi Linux compat changes.

Sponsored by:	IronPort Systems
2006-05-05 16:10:45 +00:00
Alexander Leidinger
c85625bfe7 regen 2006-03-18 20:49:01 +00:00
Alexander Leidinger
d4a3f5ddb6 Fixup some problems in my previous commit (COMPAT_43).
Pointyhat to:	netchild
2006-03-18 20:47:36 +00:00
Alexander Leidinger
1f7642e058 regen after COMPAT_43 removal 2006-03-18 18:24:38 +00:00
Alexander Leidinger
5c8919adf4 Get rid of the need of COMPAT_43 in the linuxolator.
Submitted by:	Divacky Roman <xdivac02@stud.fit.vutbr.cz>
Obtained from:	DragonFly (some parts)
2006-03-18 18:20:17 +00:00
John Baldwin
06ad42b2f7 Close some races between procfs/ptrace and exit(2):
- Reorder the events in exit(2) slightly so that we trigger the S_EXIT
  stop event earlier.  After we have signalled that, we set P_WEXIT and
  then wait for any processes with a hold on the vmspace via PHOLD to
  release it.  PHOLD now KASSERT()'s that P_WEXIT is clear when it is
  invoked, and PRELE now does a wakeup if P_WEXIT is set and p_lock drops
  to zero.
- Change proc_rwmem() to require that the processing read from has its
  vmspace held via PHOLD by the caller and get rid of all the junk to
  screw around with the vmspace reference count as we no longer need it.
- In ptrace() and pseudofs(), treat a process with P_WEXIT set as if it
  doesn't exist.
- Only do one PHOLD in kern_ptrace() now, and do it earlier so it covers
  FIX_SSTEP() (since on alpha at least this can end up calling proc_rwmem()
  to clear an earlier single-step simualted via a breakpoint).  We only
  do one to avoid races.  Also, by making the EINVAL error for unknown
  requests be part of the default: case in the switch, the various
  switch cases can now just break out to return which removes a _lot_ of
  duplicated PRELE and proc unlocks, etc.  Also, it fixes at least one bug
  where a LWP ptrace command could return EINVAL with the proc lock still
  held.
- Changed the locking for ptrace_single_step(), ptrace_set_pc(), and
  ptrace_clear_single_step() to always be called with the proc lock
  held (it was a mixed bag previously).  Alpha and arm have to drop
  the lock while the mess around with breakpoints, but other archs
  avoid extra lock release/acquires in ptrace().  I did have to fix a
  couple of other consumers in kern_kse and a few other places to
  hold the proc lock and PHOLD.

Tested by:	ps (1 mostly, but some bits of 2-4 as well)
MFC after:	1 week
2006-02-22 18:57:50 +00:00
John Baldwin
8917b8d28c - Always call exec_free_args() in kern_execve() instead of doing it in all
the callers if the exec either succeeds or fails early.
- Move the code to call exit1() if the exec fails after the vmspace is
  gone to the bottom of kern_execve() to cut down on some code duplication.
2006-02-06 22:06:54 +00:00
Robert Watson
3f4b50a482 Regenerate. 2006-02-06 01:40:48 +00:00
Robert Watson
35d982a761 Assign audit event identifiers to Linux i386 system calls.
Obtained from:	TrustedBSD Project
2006-02-06 01:40:30 +00:00
Maxim Sobolev
900b28f9f6 Remove kern.elf32.can_exec_dyn sysctl. Instead extend Brandinfo structure
with flags bitfield and set BI_CAN_EXEC_DYN flag for all brands that usually
allow executing elf dynamic binaries (aka shared libraries). When it is
requested to execute ET_DYN elf image check if this flag is on after we
know the elf brand allowing execution if so.

PR:		kern/87615
Submitted by:	Marcin Koziej <creep@desk.pl>
2005-12-26 21:23:57 +00:00
John Baldwin
410d857972 Remove linux_mib_destroy() (which I actually added in between 5.0 and 5.1)
which existed to cleanup the linux_osname mutex.  Now that MTX_SYSINIT()
has grown a SYSUNINIT to destroy mutexes on unload, the extra destroy here
was redundant and resulted in panics in debug kernels.

MFC after:	1 week
Reported by:	Goran Gajic ggajic at afrodita dot rcub dot bg dot ac dot yu
2005-12-15 16:30:41 +00:00
John Baldwin
728ef95410 The signal code is now an int rather than a long, so update debug printfs. 2005-10-14 20:22:57 +00:00
David Xu
9104847f21 1. Change prototype of trapsignal and sendsig to use ksiginfo_t *, most
changes in MD code are trivial, before this change, trapsignal and
   sendsig use discrete parameters, now they uses member fields of
   ksiginfo_t structure. For sendsig, this change allows us to pass
   POSIX realtime signal value to user code.

2. Remove cpu_thread_siginfo, it is no longer needed because we now always
   generate ksiginfo_t data and feed it to libpthread.

3. Add p_sigqueue to proc structure to hold shared signals which were
   blocked by all threads in the proc.

4. Add td_sigqueue to thread structure to hold all signals delivered to
   thread.

5. i386 and amd64 now return POSIX standard si_code, other arches will
   be fixed.

6. In this sigqueue implementation, pending signal set is kept as before,
   an extra siginfo list holds additional siginfo_t data for signals.
   kernel code uses psignal() still behavior as before, it won't be failed
   even under memory pressure, only exception is when deleting a signal,
   we should call sigqueue_delete to remove signal from sigqueue but
   not SIGDELSET. Current there is no kernel code will deliver a signal
   with additional data, so kernel should be as stable as before,
   a ksiginfo can carry more information, for example, allow signal to
   be delivered but throw away siginfo data if memory is not enough.
   SIGKILL and SIGSTOP have fast path in sigqueue_add, because they can
   not be caught or masked.
   The sigqueue() syscall allows user code to queue a signal to target
   process, if resource is unavailable, EAGAIN will be returned as
   specification said.
   Just before thread exits, signal queue memory will be freed by
   sigqueue_flush.
   Current, all signals are allowed to be queued, not only realtime signals.

Earlier patch reviewed by: jhb, deischen
Tested on: i386, amd64
2005-10-14 12:43:47 +00:00
Maxim Sobolev
c035ac04e6 Propagate error code of kern_execve() to the caller properly.
PR:		81670
Submitted by:	Andrew Bliznak <andriko.b@gmail.com>
Pointy hat to:	sobomax
2005-08-01 17:35:48 +00:00
John Baldwin
813a5e14ec Move MODULE_DEPEND() statements for SYSVIPC dependencies to linux_ipc.c
so that they aren't duplicated 3 times and are also in the same file as
the code that depends on the SYSVIPC modules.
2005-07-29 19:40:39 +00:00
John Baldwin
ac5ee935dd Regen. 2005-07-13 20:35:09 +00:00
John Baldwin
8683e7fdc1 Make a pass through all the compat ABIs sychronizing the MP safe flags
with the master syscall table as well as marking several ABI wrapper
functions safe.

MFC after:	1 week
2005-07-13 20:32:42 +00:00
Xin LI
60baed3742 Remove the CPU_ENABLE_SSE option from the i386 and pc98 architectures,
as they are already default for I686_CPU for almost 3 years, and
CPU_DISABLE_SSE always disables it.  On the other hand, CPU_ENABLE_SSE
does not work for I486_CPU and I586_CPU.

This commit has:
	- Removed the option from conf/options.*
	- Removed the option and comments from MD NOTES files
	- Simplified the CPU_ENABLE_SSE ifdef's so they don't
	  deal with CPU_ENABLE_SSE from kernel configuration. (*)

For most users, this commit should be largely no-op.  If you used to
place CPU_ENABLE_SSE into your kernel configuration for some reason,
it is time to remove it.

(*) The ifdef's of CPU_ENABLE_SSE are not removed at this point, since
    we need to change it to !defined(CPU_DISABLE_SSE) && defined(I686_CPU),
    not just !defined(CPU_DISABLE_SSE), if we really want to do so.

Discussed on:	-arch
Approved by:	re (scottl)
2005-07-02 20:06:44 +00:00
Maxim Sobolev
ded18ff2ab Regen after addition of linux_getpriority wrapper.
PR:		kern/81951
Submitted by:	Andriy Gapon <avg@icyb.net.ua>
MFC after:	1 week
2005-06-08 20:47:30 +00:00
Maxim Sobolev
bc165ab0fe Properly convert FreeBSD priority values into Linux values in the
getpriority(2) syscall.

PR:		kern/81951
Submitted by:	Andriy Gapon <avg@icyb.net.ua>
2005-06-08 20:41:28 +00:00
Robert Watson
3984b2328c Rebuild generated system call definition files following the addition of
the audit event field to the syscalls.master file format.

Submitted by:	wsalamon
Obtained from:	TrustedBSD Project
2005-05-30 15:20:21 +00:00
Robert Watson
f3596e3370 Introduce a new field in the syscalls.master file format to hold the
audit event identifier associated with each system call, which will
be stored by makesyscalls.sh in the sy_auevent field of struct sysent.
For now, default the audit identifier on all system calls to AUE_NULL,
but in the near future, other BSM event identifiers will be used.  The
mapping of system calls to event identifiers is many:one due to
multiple system calls that map to the same end functionality across
compatibility wrappers, ABI wrappers, etc.

Submitted by:	wsalamon
Obtained from:	TrustedBSD Project
2005-05-30 15:09:18 +00:00
Matthew N. Dodd
73c730a694 Add support for O_NOFOLLOW and O_DIRECT to Linux fcntl() F_GETFL/F_SETFL. 2005-04-13 04:31:43 +00:00
John Baldwin
98df9218da - Change the vm_mmap() function to accept an objtype_t parameter specifying
the type of object represented by the handle argument.
- Allow vm_mmap() to map device memory via cdev objects in addition to
  vnodes and anonymous memory.  Note that mmaping a cdev directly does not
  currently perform any MAC checks like mapping a vnode does.
- Unbreak the DRM getbufs ioctl by having it call vm_mmap() directly on the
  cdev the ioctl is acting on rather than trying to find a suitable vnode
  to map from.

Reviewed by:	alc, arch@
2005-04-01 20:00:11 +00:00
Maxim Sobolev
ecab0de7c1 Regen after addition of linux_nosys handler. 2005-03-07 00:23:58 +00:00
Maxim Sobolev
e3478fe000 Handle unimplemented syscall by instantly returning ENOSYS instead of sending
signal first and only then returning ENOSYS to match what real linux does.

PR:		kern/74302
Submitted by:	Travis Poppe <tlp@LiquidX.org>
2005-03-07 00:18:06 +00:00
Maxim Sobolev
4b1783363f In linux emulation layer try to detect attempt to use linux_clone() to
create kernel threads and call rfork(2) with RFTHREAD flag set in this case,
which puts parent and child into the same threading group. As a result
all threads that belong to the same program end up in the same threading
group.

This is similar to what linuxthreads port does, though in this case we don't
have a luxury of having access to the source code and there is no definite
way to differentiate linux_clone() called for threading purposes from other
uses, so that we have to resort to heuristics.

Allow SIGTHR to be delivered between all processes in the same threading
group previously it has been blocked for s[ug]id processes.

This also should improve locking of the same file descriptor from different
threads in programs running under linux compat layer.

PR:			kern/72922
Reported by:		Andriy Gapon <avg@icyb.net.ua>
Idea suggested by:	rwatson
2005-03-03 16:57:55 +00:00
John Baldwin
0311233ecc Use linux_emul_convpath() rather than linux_emul_find() as
linux_emul_find() is going away.
2005-02-07 18:37:51 +00:00
John Baldwin
d9e9747163 Use the LCONVPATHEXIST() macro rather than it's exact expansion to be
consistent.
2005-02-07 18:37:13 +00:00
David Schultz
2a51b9b0aa When running Linux binaries, set up the initial FPU state as Linux
would.

PR:	28966
2005-02-06 17:29:20 +00:00
Maxim Sobolev
610ecfe035 o Split out kernel part of execve(2) syscall into two parts: one that
copies arguments into the kernel space and one that operates
  completely in the kernel space;

o use kernel-only version of execve(2) to kill another stackgap in
  linuxlator/i386.

Obtained from:  DragonFlyBSD (partially)
MFC after:      2 weeks
2005-01-29 23:12:00 +00:00
Maxim Sobolev
84569dff34 o Move copyin()/copyout() out of i386_{get,set}_ldt() and
i386_{get,set}_ioperm() and make those APIs visible in the kernel namespace;

o use i386_{get,set}_ldt() and i386_{get,set}_ioperm() instead of sysarch()
  in the linuxlator, which allows to kill another two stackgaps.

MFC after:	2 weeks
2005-01-26 13:59:46 +00:00
Warner Losh
86cb007f9f /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 22:18:23 +00:00
David Schultz
d3adf76902 Axe the semblance of support for PECOFF and Linux a.out core dumps. 2004-11-27 06:46:45 +00:00
David Schultz
0ef5c36ff1 Maintain the broken state of backwards compatibilty for a.out (and
PECOFF!) core dumps.  None of the old versions of gdb I tried were
able to read a.out core dumps before or after this change.

Reviewed by:	arch@
2004-11-20 02:32:04 +00:00
David Schultz
46ec41ecb4 Fix the following race:
1. Process p1 is currently being swapped in.
  2. Process p2 calls linux_ptrace(PTRACE_GETFPXREGS, p1_pid, ...)
  3. After acquiring a reference to FIRST_THREAD_IN_PROC(p1),
     p2 blocks in faultin() while p1 finishes being swapped in.
     This means p2 won't get back the lock on p1 until after p1's
     threads are runnable.
  4. After p1 is swapped in, the first thread in p1 exits.
  5. p2 now uses its dangling reference to p1's first thread.
2004-10-01 05:01:00 +00:00
Doug Rabson
bd263739c1 Regen. 2004-09-06 09:33:30 +00:00
Doug Rabson
1bc85c0dea Add a few stub syscalls to get TransGaming's winex a bit closer to
working.
2004-09-06 09:32:59 +00:00
Julian Elischer
2630e4c90c Give setrunqueue() and sched_add() more of a clue as to
where they are coming from and what is expected from them.

MFC after:	2 days
2004-09-01 02:11:28 +00:00
John Baldwin
ef36ad6921 Correct the arguments to kern_sigaltstack() as they were reversed.
PR:		kern/68079
Submitted by:	Georg-W. Koltermann gwk at rahn-koltermann dot de
2004-08-24 20:52:52 +00:00
John Baldwin
8a7aa72dec Regenerate after fcntl() wrappers were marked MP safe. 2004-08-24 20:24:34 +00:00
John Baldwin
2ca25ab53e Fix the ABI wrappers to use kern_fcntl() rather than calling fcntl()
directly.  This removes a few more users of the stackgap and also marks
the syscalls using these wrappers MP safe where appropriate.

Tested on:	i386 with linux acroread5
Compiled on:	i386, alpha LINT
2004-08-24 20:21:21 +00:00
Tim J. Robbins
0e73a96209 Add a new type, l_uintptr_t, which is an unsigned integer type with the
same width as a pointer under Linux. Add two new macros, PTRIN and PTROUT,
which convert between l_uintptr_t and native pointers.
2004-08-16 07:05:44 +00:00
Poul-Henning Kamp
3e019deaed Do a pass over all modules in the kernel and make them return EOPNOTSUPP
for unknown events.

A number of modules return EINVAL in this instance, and I have left
those alone for now and instead taught MOD_QUIESCE to accept this
as "didn't do anything".
2004-07-15 08:26:07 +00:00
Stefan Farfeleder
5908d366fb Consistently use __inline instead of __inline__ as the former is an empty macro
in <sys/cdefs.h> for compilers without support for inline.
2004-07-04 16:11:03 +00:00
David E. O'Brien
13a0a973c6 Add casts so all these quantities are a constant type. 2004-06-24 02:24:39 +00:00
Tim J. Robbins
f99619a0dc Change the types of vn_rdwr_inchunks()'s len and aresid arguments to
size_t and size_t *, respectively. Update callers for the new interface.
This is a better fix for overflows that occurred when dumping segments
larger than 2GB to core files.
2004-06-05 02:18:28 +00:00
Bruce M Simpson
cffd5a3eb4 Use the BSD madvise() syscall implementation for Linux binary emulation,
instead of treating it as an unimplemented syscall. This appears to make
StarOffice 7.0 Linux binaries work according to submitter; also tested
with nvidia driver by submitter.

Submitted by:	Matthias Schuendehuette
2004-03-28 21:43:27 +00:00
John Baldwin
e5592f4dd9 Regenerate. 2004-03-15 22:44:35 +00:00
John Baldwin
216db4ad00 - Mark ABI syscalls that call wait4() MP safe as recent changes to
the kernel wait4() made these all panic() implementations otherwise.
- The i386 linux_ptrace() syscall is MP safe.  Alpha was already marked
  MP safe.
2004-03-15 22:43:49 +00:00
John Baldwin
0804ed5acc Regen. 2004-02-04 22:00:44 +00:00
John Baldwin
c3b612d935 The following compat syscalls are now mpsafe: linux_getrlimit(),
linux_setrlimit(), linux_old_getrlimit(), osf1_getrlimit(),
osf1_setrlimit(), svr4_sys_ulimit(), svr4_sys_setrlimit(),
svr4_sys_getrlimit(), svr4_sys_setrlimit64(), svr4_sys_getrlimit64(),
ibcs2_sysconf(), and ibcs2_ulimit().
2004-02-04 21:57:00 +00:00
John Baldwin
91d5354a2c Locking for the per-process resource limits structure.
- struct plimit includes a mutex to protect a reference count.  The plimit
  structure is treated similarly to struct ucred in that is is always copy
  on write, so having a reference to a structure is sufficient to read from
  it without needing a further lock.
- The proc lock protects the p_limit pointer and must be held while reading
  limits from a process to keep the limit structure from changing out from
  under you while reading from it.
- Various global limits that are ints are not protected by a lock since
  int writes are atomic on all the archs we support and thus a lock
  wouldn't buy us anything.
- All accesses to individual resource limits from a process are abstracted
  behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return
  either an rlimit, or the current or max individual limit of the specified
  resource from a process.
- dosetrlimit() was renamed to kern_setrlimit() to match existing style of
  other similar syscall helper functions.
- The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit()
  (it didn't used the stackgap when it should have) but uses lim_rlimit()
  and kern_setrlimit() instead.
- The svr4 compat no longer uses the stackgap for resource limits calls,
  but uses lim_rlimit() and kern_setrlimit() instead.
- The ibcs2 compat no longer uses the stackgap for resource limits.  It
  also no longer uses the stackgap for accessing sysctl's for the
  ibcs2_sysconf() syscall but uses kernel_sysctl() instead.  As a result,
  ibcs2_sysconf() no longer needs Giant.
- The p_rlimit macro no longer exists.

Submitted by:	mtm (mostly, I only did a few cleanups and catchups)
Tested on:	i386
Compiled on:	alpha, amd64
2004-02-04 21:52:57 +00:00
David Xu
9b778a1611 Make sigaltstack as per-threaded, because per-process sigaltstack state
is useless for threaded programs, multiple threads can not share same
stack.
The alternative signal stack is private for thread, no lock is needed,
the orignal P_ALTSTACK is now moved into td_pflags and renamed to
TDP_ALTSTACK.
For single thread or Linux clone() based threaded program, there is no
semantic changed, because those programs only have one kernel thread
in every process.
2004-01-03 23:31:29 +00:00
David Xu
a30ec4b99c Make sigaltstack as per-threaded, because per-process sigaltstack state
is useless for threaded programs, multiple threads can not share same
stack.
The alternative signal stack is private for thread, no lock is needed,
the orignal P_ALTSTACK is now moved into td_pflags and renamed to
TDP_ALTSTACK.
For single thread or Linux clone() based threaded program, there is no
semantic changed, because those programs only have one kernel thread
in every process.

Reviewed by: deischen, dfr
2004-01-03 02:02:26 +00:00
Bruce Evans
ff22c670d9 Sorted includes. Removed duplicates exposed by this. 2003-12-29 06:51:10 +00:00
Peter Wemm
bdbbbb1bb9 GC unused 'syshide' override to /dev/null. This was here to disable
the output of the namespc column.  Its functionality was removed some time
ago, but the overrides and the namespc column remained.
2003-12-24 00:32:07 +00:00
Peter Wemm
ff7a52b4b5 Regen (should be a NOP except for rcsid changes) 2003-12-23 03:55:06 +00:00
Peter Wemm
eca6e6472d GC unused third namespace column. 2003-12-23 03:54:40 +00:00
Peter Wemm
9b68618df0 Add an additional field to the elf brandinfo structure to support
quicker exec-time replacement of the elf interpreter on an emulation
environment where an entire /compat/* tree isn't really warranted.
2003-12-23 02:42:39 +00:00
Maxim Sobolev
d09c47acd9 Pull latest changes from OpenBSD:
- improve sysinfo(2) syscall;
- add dummy fadvise64(2) syscall;
- add dummy *xattr(2) family of syscalls;
- add protos for the syscalls 222-225, 238-249 and 253-267;
- add exit_group(2) syscall, which is currently just wired to exit(2).

Obtained from:  OpenBSD
MFC after:      2 weeks
2003-11-16 15:07:10 +00:00
John Baldwin
fab73bc221 Regen. 2003-11-07 21:36:35 +00:00
John Baldwin
572e11ac18 Sync up MP safe flags with global syscalls.master for the first time. This
includes read(), write(), close(), linux_setuid16(), linux_getuid16(),
linux_pause(), linux_nice(), linux_kill(), dup(), linux_pipe(),
linux_setgid16(), linux_getgid16(), linux_signal(), linux_geteuid16(),
linux_getegid16(), acct(), setpgid(), umask(), dup2(), getppid(),
getpgrp(), setsid(), linux_sigaction(), linux_sgetmask(), linux_ssetmask(),
linux_setreuid16(), linux_setregid16(), linux_sigsuspend(), getrusage(),
gettimeofday(), linux_getgroups16(), linux_setgroups16(), getpriority(),
setpriority(), linux_sigreturn(), linux_clone(), linux_sigprocmask(),
linux_getsid(), mlock(), munlock(), mlockall(), munlockall(),
sched_setparam(), sched_getparam(), linux_sched_setscheduler(),
linux_sched_getscheduler(), linux_sched_get_priority_max(),
linux_sched_get_priority_min(), sched_rr_get_interval(),
linux_setresuid16(), linux_getresuid16(), linux_setresgid16(),
linux_getresgid16(), linux_rt_sigaction(), linux_rt_sigprocmask(),
linux_rt_sigsuspend(), geteuid(), getegid(), setreuid(), setregid(),
linux_getgroups(), linux_setgroups(), setresuid(), getresuid(),
setresgid(), getresgid(), setuid(), and setgid().
2003-11-07 21:36:14 +00:00
Peter Wemm
c460ac3a00 Add sysentvec->sv_fixlimits() hook so that we can catch cases on 64 bit
systems where the data/stack/etc limits are too big for a 32 bit process.

Move the 5 or so identical instances of ELF_RTLD_ADDR() into imgact_elf.c.

Supply an ia32_fixlimits function.  Export the clip/default values to
sysctl under the compat.ia32 heirarchy.

Have mmap(0, ...) respect the current p->p_limits[RLIMIT_DATA].rlim_max
value rather than the sysctl tweakable variable.  This allows mmap to
place mappings at sensible locations when limits have been reduced.

Have the imgact_elf.c ld-elf.so.1 placement algorithm use the same
method as mmap(0, ...) now does.

Note that we cannot remove all references to the sysctl tweakable
maxdsiz etc variables because /etc/login.conf specifies a datasize
of 'unlimited'.  And that causes exec etc to fail since it can no
longer find space to mmap things.
2003-09-25 01:10:26 +00:00
Bruce Evans
2cf50c62a6 Restored non-egregious casts so that this file compiles on i386's with
64-bit longs again.
2003-09-07 13:23:45 +00:00
David Xu
0e2a4d3aeb Rename P_THREADED to P_SA. P_SA means a process is using scheduler
activations.
2003-06-15 00:31:24 +00:00
David E. O'Brien
27e0099c4a Use __FBSDID(). 2003-06-02 16:56:40 +00:00
John Baldwin
90af4afacb - Merge struct procsig with struct sigacts.
- Move struct sigacts out of the u-area and malloc() it using the
  M_SUBPROC malloc bucket.
- Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(),
  sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared().
- Remove the p_sigignore, p_sigacts, and p_sigcatch macros.
- Add a mutex to struct sigacts that protects all the members of the struct.
- Add sigacts locking.
- Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now
  that sigacts is locked.
- Several in-kernel functions such as psignal(), tdsignal(), trapsignal(),
  and thread_stopped() are now MP safe.

Reviewed by:	arch@
Approved by:	re (rwatson)
2003-05-13 20:36:02 +00:00
Matthew N. Dodd
598d45be84 Provide exec_linux_setregs() to override exec_setregs().
Linux initializes %gs to 0.  Mimic this behavior.

Submitted by:	 Christian Zander <zander@minion.de>
Reviewed by:	 jake
Approved by:	 re
2003-05-11 21:51:11 +00:00
John Baldwin
eeec6bab2e Prefer the proc lock to sched_lock when testing PS_INMEM now that it is
safe to do so.
2003-04-22 20:01:56 +00:00
John Baldwin
9eb78fcfd9 Synchronize the two linux_clone() implementations which includes a few
minor cleanups in both.
2003-04-18 20:54:41 +00:00
John Baldwin
e5567180fb Don't drop the proc lock just to reacquire it after a few simple assignment
statements.  Just hold the lock the entire time.
2003-04-17 22:18:07 +00:00
John Baldwin
f265002902 Sync up with changes to ptrace() and use P_SHOULDSTOP instead of
a duplicate P_TRACED check.

Submitted by:	marcel
2003-04-15 16:29:39 +00:00
Jeff Roberson
4093529dee - Move p->p_sigmask to td->td_sigmask. Signal masks will be per thread with
a follow on commit to kern_sig.c
 - signotify() now operates on a thread since unmasked pending signals are
   stored in the thread.
 - PS_NEEDSIGCHK moves to TDF_NEEDSIGCHK.
2003-03-31 22:49:17 +00:00
Jeff Roberson
1bf4700bff - Change trapsignal() to accept a thread and not a proc.
- Change all consumers to pass in a thread.

Right now this does not cause any functional changes but it will be important
later when signals can be delivered to specific threads.
2003-03-31 22:02:38 +00:00
John Baldwin
0f9d6538bb Add missing includes from previous commit.
Reported by:	des
2003-03-27 18:18:35 +00:00
John Baldwin
35eb8c5aa2 Add a cleanup function to destroy the osname_lock and call it on module
unload.

Submitted by:	gallatin
Reported by:	Martin Karlsson <mk-freebsd@bredband.net>
2003-03-26 18:29:44 +00:00
Matthew N. Dodd
91d631e536 Print the return value from mmap() in the DEBUG case. 2003-03-25 20:02:55 +00:00
John Baldwin
43cf129c99 Sync up linux and svr compat elf fixup functions for exec(). These
functions are now all basically identical except that alpha linux uses
Elf64 arguments and svr4 and i386 linux use Elf32.  The fixups include
changing the first argument to be a register_t ** to match the prototype
for fixup functions, asserting that the process in the image_params struct
is always curproc and removing unnecessary locking to read credentials as a
result, and a few style fixes.
2003-03-21 19:49:34 +00:00
Poul-Henning Kamp
b4b138c27f Including <sys/stdint.h> is (almost?) universally only to be able to use
%j in printfs, so put a newsted include in <sys/systm.h> where the printf
prototype lives and save everybody else the trouble.
2003-03-18 08:45:25 +00:00
Dag-Erling Smørgrav
1d062e2be8 Clean up whitespace and remove register keyword. 2003-03-03 09:17:12 +00:00
Dag-Erling Smørgrav
4b7ef73d71 More caddr_t removal, in conjunction with copy{in,out}(9) this time.
Also clean up some egregious casts and incorrect use of sizeof.
2003-03-03 09:14:26 +00:00
Alexander Kabaev
ba873f4c18 Correctly map SIGSYS signal to/from Linux.
Submitted by:   "Georg-W. Koltermann" <g.w.k@web.de>
2003-02-24 16:16:45 +00:00
Tim J. Robbins
8fcac23f33 Regen from syscalls.master 1.50. 2003-02-20 13:34:15 +00:00
Tim J. Robbins
c7edd68c94 Mark linux_getpid(), linux_getuid() and linux_getgid() as MPSAFE. 2003-02-20 13:32:48 +00:00
Warner Losh
a163d034fa Back out M_* changes, per decision of the TRB.
Approved by: trb
2003-02-19 05:47:46 +00:00
Tim J. Robbins
d97ccc7e54 Mark linux_sigpending() as MPSAFE. 2003-02-16 02:31:05 +00:00
Tim J. Robbins
a5ea48d458 Regen from syscalls.master 1.49. 2003-02-16 02:28:35 +00:00
Hajimu UMEMOTO
ca26842e2a Add IPv6 support for Linuxlator.
Reviewed by:	dwmalone
MFC after:	10 days
2003-02-03 17:43:20 +00:00
Alfred Perlstein
44956c9863 Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
2003-01-21 08:56:16 +00:00
Marcel Moolenaar
99d45c5f9d bzero() the sigframe before we fill it. This was not done at all in
linux_rt_sendsig() and only done for the fpstate in linux_sendsig().
2002-11-02 07:41:04 +00:00
Mark Murray
b07cd97ea8 Style(9). Make some function declarations consistent with the rest,
and remove some nearby extraneous {}'s.
2002-10-19 11:57:38 +00:00
Bruce Evans
b45bbfc36c Fixed syntax errors and printf format errors. 2002-10-12 13:48:21 +00:00
Maxim Sobolev
3ad9c842d2 - Add support for IPC_64 extensions into shmctl(2), semctl(2) and msgctl(2);
- add wrappers for mmap2(2) and ftruncate64(2) system calls;
- don't spam console with printf's when VFAT_READDIR_BOTH ioctl(2) is invoked;
- add support for SOUND_MIXER_READ_STEREODEVS ioctl(2);
- make msgctl(IPC_STAT) and IPC_SET actually working by converting from
  BSD msqid_ds to Linux and vice versa;
- properly return EINVAL if semget(2) is called with nsems being negative.

Reviewed by:	marcel
Approved by:	marcel
Tested with:	LSB runtime test
2002-10-11 11:43:09 +00:00
Scott Long
316ec49abd Some kernel threads try to do significant work, and the default KSTACK_PAGES
doesn't give them enough stack to do much before blowing away the pcb.
This adds MI and MD code to allow the allocation of an alternate kstack
who's size can be speficied when calling kthread_create.  Passing the
value 0 prevents the alternate kstack from being created.  Note that the
ia64 MD code is missing for now, and PowerPC was only partially written
due to the pmap.c being incomplete there.
Though this patch does not modify anything to make use of the alternate
kstack, acpi and usb are good candidates.

Reviewed by:	jake, peter, jhb
2002-10-02 07:44:29 +00:00
Jonathan Mini
044af7c357 Back out last commit. Linux uses the old 4.3BSD sockaddr format. 2002-09-24 07:03:01 +00:00
Jonathan Mini
d7f94a7a0e Don't use compatability syscall wrappers in emulation code.
This is needed for the COMPAT_FREEBSD3 option split.

Reviewed by:	alfred, jake
2002-09-23 06:17:54 +00:00
Jake Burkholder
05ba50f522 Use the fields in the sysentvec and in the vm map header in place of the
constants VM_MIN_ADDRESS, VM_MAXUSER_ADDRESS, USRSTACK and PS_STRINGS.
This is mainly so that they can be variable even for the native abi, based
on different machine types.  Get stack protections from the sysentvec too.
This makes it trivial to map the stack non-executable for certain abis, on
machines that support it.
2002-09-21 22:07:17 +00:00
Bruce Evans
e9a6d3b44c Removed unused includes. Sorted includes. This is part of removing
includes of <sys/user.h> for its pollution only.  <sys/user.h> wasn't
even used for its pollution here.
2002-09-15 17:45:10 +00:00
Julian Elischer
71fad9fdee Completely redo thread states.
Reviewed by:	davidxu@freebsd.org
2002-09-11 08:13:56 +00:00
Peter Wemm
a9f9df5daf Tidy up some loose ends that bde pointed out. caddr_t bad, ok?
Move fill_kinfo_proc to before we copy the results instead of after
the copy and too late.

There is still more to do here.
2002-09-07 22:31:44 +00:00
Peter Wemm
99a17113cd The true value of how the kernel was configured for KSTACK_PAGES was not
available at module compile time.  Do not #include the bogus
opt_kstack_pages.h at this point and instead refer to the variables that
are also exported via sysctl.
2002-09-07 22:15:47 +00:00
Juli Mallett
9d05b77d4f Diff reduction in comments for filling the siginfo structure - refer to
filling in the POSIX parts, when doing the same thing in every port of
FreeBSD.
2002-09-07 18:56:18 +00:00
Bruce Evans
8a1a68fa84 Include <machine/pcb.h> instead of depending on namespace pollution in
<sys/user.h>.
2002-09-07 14:32:22 +00:00
Peter Wemm
f7749f924c Automatically enable CPU_ENABLE_SSE (detect and enable SSE instructions)
if compiling with I686_CPU as a target.  CPU_DISABLE_SSE will prevent
this from happening and will guarantee the code is not compiled in.

I am still not happy with this, but gcc is now generating code that uses
these instructions if you set CPUTYPE to p3/p4 or athlon-4/mp/xp or higher.
2002-09-07 07:02:12 +00:00
Peter Wemm
7646aefc21 Supposedly linux has added a 6th syscall arg register (%ebp). I am not
100% sure if this is enough, but it will not harm anything.
2002-09-07 04:59:49 +00:00
Peter Wemm
a9148ab103 Give this a self contained a.out coredump routine.
XXX freebsd-aout coredumps for a linux-aout binary is a bit pointless.
2002-09-07 01:29:21 +00:00
Bruce Evans
4db068aabc Include <sys/systm.h> for the definition of offsetof() instead of depending
on the definition being misplaced in <sys/types.h>.  The definition probably
belongs in <sys/stddef.h>.
2002-09-05 12:58:57 +00:00
Ian Dowse
012e544f12 Split up ptrace() into a wrapper that does the copying to and from
user space and a kern_ptrace() implementation. Use the kern_*()
version in the Linux emulation code to remove more stack gap uses.

Approved by:	des
2002-09-05 01:02:50 +00:00
Ian Dowse
206a5d3a0c Use the new kern_* functions to avoid the need to store arguments
in the stack gap. This converts most VFS and signal related system
calls, as well as select().

Discussed on:	-arch
Approved by:	marcel
2002-09-01 22:30:27 +00:00
Jake Burkholder
f36ba45234 Added fields for VM_MIN_ADDRESS, PS_STRINGS and stack protections to
sysentvec.  Initialized all fields of all sysentvecs, which will allow
them to be used instead of constants in more places.  Provided stack
fixup routines for emulations that previously used the default.
2002-09-01 21:41:24 +00:00
Jeff Roberson
619eb6e579 - Hold the vnode lock throughout execve.
- Set VV_TEXT in the top level execve code.
 - Fixup the image activators to deal with the newly locked vnode.
2002-08-13 06:55:28 +00:00
Jeff Roberson
e6e370a7fe - Replace v_flag with v_iflag and v_vflag
- v_vflag is protected by the vnode lock and is used when synchronization
   with VOP calls is needed.
 - v_iflag is protected by interlock and is used for dealing with vnode
   management issues.  These flags include X/O LOCK, FREE, DOOMED, etc.
 - All accesses to v_iflag and v_vflag have either been locked or marked with
   mp_fixme's.
 - Many ASSERT_VOP_LOCKED calls have been added where the locking was not
   clear.
 - Many functions in vfs_subr.c were restructured to provide for stronger
   locking.

Idea stolen from:	BSD/OS
2002-08-04 10:29:36 +00:00
Peter Wemm
3ebc124838 Infrastructure tweaks to allow having both an Elf32 and an Elf64 executable
handler in the kernel at the same time.  Also, allow for the
exec_new_vmspace() code to build a different sized vmspace depending on
the executable environment.  This is a big help for execing i386 binaries
on ia64.   The ELF exec code grows the ability to map partial pages when
there is a page size difference, eg: emulating 4K pages on 8K or 16K
hardware pages.

Flesh out the i386 emulation support for ia64.  At this point, the only
binary that I know of that fails is cvsup, because the cvsup runtime
tries to execute code in pages not marked executable.

Obtained from:  dfr (mostly, many tweaks from me).
2002-07-20 02:56:12 +00:00
Robert Drehmel
aaaefc6b56 Enable emulation of the F_GETLK64, F_SETLK64, and F_SETLKW64
lock commands arguments to linux_fcntl64().
2002-07-09 15:57:12 +00:00
Julian Elischer
e602ba25fd Part 1 of KSE-III
The ability to schedule multiple threads per process
(one one cpu) by making ALL system calls optionally asynchronous.
to come: ia64 and power-pc patches, patches for gdb, test program (in tools)

Reviewed by:	Almost everyone who counts
	(at various times, peter, jhb, matt, alfred, mini, bernd,
	and a cast of thousands)

	NOTE: this is still Beta code, and contains lots of debugging stuff.
	expect slight instability in signals..
2002-06-29 17:26:22 +00:00
Jens Schweikhardt
21dc7d4f57 Fix typo in the BSD copyright: s/withough/without/
Spotted and suggested by:	des
MFC after:	3 weeks
2002-06-02 20:05:59 +00:00
Marcel Moolenaar
1b5aeb4347 o Fix race condition caused by doing ptrace() for permission
checking, followed by a lookup of the process. Do not call
   ptrace() for permission checking, but do it inline.
   Spotted by: rwatson

o  While here, copy-in arguments before we lock. This fixes
   a possible permanent lock.

Reviewed by: rwatson
2002-05-19 19:35:36 +00:00
Marcel Moolenaar
c444f61706 Hook up the new linux_ptrace implementation.
PR: 33299
Submitted by: Alexander N. Kabaev <ak03@gte.com>
2002-05-19 01:27:14 +00:00
Marcel Moolenaar
9ed93e32bc Regen (linux_ptrace)
PR: 33299
2002-05-19 01:23:33 +00:00
Marcel Moolenaar
6b5a528e88 Sparkling new implementation of linux_ptrace. Slight tweaking by
yours truly.

PR: 33299
Submitted by: Alexander N. Kabaev <ak03@gte.com>
2002-05-19 01:21:55 +00:00
Eric Melville
1b20ff34a3 Spell "separate" correctly. 2002-04-05 00:04:56 +00:00
Bruce Evans
79065dba2a Moved signal handling and rescheduling from userret() to ast() so that
they aren't in the usual path of execution for syscalls and traps.
The main complication for this is that we have to set flags to control
ast() everywhere that changes the signal mask.

Avoid locking in userret() in most of the remaining cases.

Submitted by:	luoqi (first part only, long ago, reorganized by me)
Reminded by:	dillon
2002-04-04 17:49:48 +00:00
John Baldwin
44731cab3b Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API.  The entire API now consists of two functions
similar to the pre-KSE API.  The suser() function takes a thread pointer
as its only argument.  The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0.  The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on:	smp@
2002-04-01 21:31:13 +00:00
Bruce Evans
bda2a3af25 Fixed some style bugs in the removal of __P(()). Continuation lines
were not outdented to preserve non-KNF lining up of code with parentheses.
Switch to KNF formatting.
2002-03-24 04:09:05 +00:00
Alfred Perlstein
89c9a48352 Remove __P. 2002-03-20 07:51:46 +00:00
Alan Cox
89734883fa Eliminate unnecessary calls to grow_stack() and useracc() from linux_sendsig()
and linux_rt_sendsig().  (See i386/i386/machdep.c revisions 1.503 and 1.504.)
2002-03-19 04:54:30 +00:00
Peter Wemm
6aea67779a Fix format warning.
Submitted by:	LINT, -Werror
2002-02-27 23:21:46 +00:00
John Baldwin
a854ed9893 Simple p_ucred -> td_ucred changes to start using the per-thread ucred
reference.
2002-02-27 18:32:23 +00:00
Julian Elischer
079b7badea Pre-KSE/M3 commit.
this is a low-functionality change that changes the kernel to access the main
thread of a process via the linked list of threads rather than
assuming that it is embedded in the process. It IS still embeded there
but remove all teh code that assumes that in preparation for the next commit
which will actually move it out.

Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,
2002-02-07 20:58:47 +00:00
Bruce Evans
846ac2266b Clear the single-step flag for signal handlers. This fixes bogus trace
traps on the first instruction of signal handlers.

In trap.c:syscall(), fake a trace trap if the single-step flag was set
on entry to the kernel, not if it will be set on exit from the kernel.
This fixes bogus trace traps after the last instruction of signal handlers.

gdb-4.18 (the version in FreeBSD) still has problems with the program in
the PR.  These seem to be due to bugs in gdb and not in FreeBSD, and are
fixed in gdb-5.1 (the distribution version).

PR:		33262
Tested by:	k Macy <kip_macy@yahoo.com>
MFC after:	1 day
2002-01-10 11:49:55 +00:00
Pierre Beyssac
27a828fcb6 Convert BSD trap codes to i386.
Submitted by:	F. Gouget <fgouget@free.fr>
2001-11-20 09:39:31 +00:00
Dag-Erling Smørgrav
a08d68de5b Eliminate the prefix parameter to linux_emul_find(), which was always
linux_emul_path anyway.  Linux_emul_find() has interesting bugs in its
prefix handling (which luckily are not currently exploitable); this
commit is preliminary to an attempt at cleaning it up.

Approved by:	marcel
2001-10-27 11:15:19 +00:00
Marcel Moolenaar
4c1e3817c4 Implement linux_chown and linux_lchown. The fchown syscall maps
directly to the native syscall, because no filename handling
needs to be done.

Tested by: Martin Blapp <mb@imp.ch>
2001-10-16 06:15:36 +00:00
Marcel Moolenaar
2bf1eed95b o Change prototype of linux_lchown and linux_chown so that the
argument names match those on Alpha.
o  Map the fchown directly to FreeBSD. Since the old version of
   fchown is also mapped to the native fchown, give the new one
   type NODEF.

Tested by: Martin Blapp <mb@imp.ch>
2001-10-16 06:11:11 +00:00
Dag-Erling Smørgrav
268aeb1ed3 In FreeBSD's ifreq, ifr_ifru.ifru_flags is an array of two chars, while Linux
defines it as a short.  Change that to an array of one short so that FreeBSD's
ifr_flags macro will work (it evaluates to ifr_ifru.ifru_flags[0]).
2001-10-15 20:06:34 +00:00
John Baldwin
fa78c35ad2 Oops, these already included sys/lock.h, they just did so after
sys/mutex.h which is too late.
2001-10-11 18:25:57 +00:00
John Baldwin
7106ca0d1a Add missing includes of sys/lock.h. 2001-10-11 17:52:20 +00:00
Paul Saab
cbc89bfbfe Make MAXTSIZ, DFLDSIZ, MAXDSIZ, DFLSSIZ, MAXSSIZ, SGROWSIZ loader
tunable.

Reviewed by:	peter
MFC after:	2 weeks
2001-10-10 23:06:54 +00:00
Marcel Moolenaar
745190dc80 Regen: Stop using linux_getpgid(). Use the native getpgid()
instead.

PR: kern/21402
2001-09-28 01:32:27 +00:00
Marcel Moolenaar
52e9761e22 Stop using linux_getpgid(). The implementation at this time is
broken and fixing it only creates a duplicate of what is already
in the FreeBSD kernel. Therefore, map the syscall directly to
getpgid().

PR: kern/21402
Submitted by: Christian Weisgerber <naddy@mips.inka.de>
2001-09-28 01:30:59 +00:00
Robert Watson
41c42188c8 o Modify access control checks in linux_iopl() to use securelevel_gt()
rather than direct variable checks.  (Yet another API to perform
  direct hardware I/O.)

Obtained from:	TrustedBSD Project
2001-09-26 20:22:38 +00:00
John Baldwin
2509e6c20b Add a lock assertion to linux_sendsig() to match other sendsig functions. 2001-09-17 17:22:31 +00:00
Michael Reifenberger
b8febfd1f2 Add a wrapper for linux_getsid -> getsid Syscall. 2001-09-15 09:57:30 +00:00
Michael Reifenberger
a6e5348e22 Implement LINUX_[SEM|IPC]_[STAT|INFO]
to make /compat/linux/usr/bin/ipcs -s happy.

PR:		kern/29698 (part)
Reviewed by:	audit
2001-09-15 09:50:38 +00:00
Julian Elischer
b40ce4165d KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after:    ha ha ha ha
2001-09-12 08:38:13 +00:00
Marcel Moolenaar
e061a6ca19 Fix LINT breakage caused by previous commit. The linux_rt_sendsig
and linux_sendsig functions guarded their debugging output with
ldebug(sigreturn). This has been mistaken for a cut-n-paste bug,
and was replaced by ldebug(rt_sendsig) and ldebug(sendsig) resp.
Since the sendsig functions are not syscalls, this brokei any
build that defines DEBUG.

The fix maps both functions to the unused syscall 0 so that they
can be enabled/disabled independently from sigreturn, but not
independently from each other.
2001-09-10 07:00:17 +00:00
Marcel Moolenaar
5002a60f9b Round of cleanups and enhancements. These include (in random order):
o  Introduce private types for use in linux syscalls for two reasons:
   1. establish type independence for ease in porting and,
   2. provide a visual queue as to which syscalls have proper
      prototypes to further cleanup the i386/alpha split.
   Linuxulator types are prefixed by 'l_'. void and char have not
   been "virtualized".

o  Provide dummy functions for all syscalls and remove dummy functions
   or implementations of truely obsolete syscalls.

o  Sanitize the shm*, sem* and msg* syscalls.

o  Make a first attempt to implement the linux_sysctl syscall. At this
   time it only returns one MIB (KERN_VERSION), but most importantly,
   it tells us when we need to add additional sysctls :-)

o  Bump the kenel version up to 2.4.2 (this is not the same as the
   KERN_VERSION MIB, BTW).

o  Implement new syscalls, of which most are specific to i386. Our
   syscall table is now up to date with Linux 2.4.2. Some highlights:
   -  Implement the 32-bit uid_t and gid_t bases syscalls.
   -  Implement a couple of 64-bit file size/offset bases syscalls.

o  Fix or improve numerous syscalls and prototypes.

o  Reduce style(9) violations while I'm here. Especially indentation
   inconsistencies within the same file are addressed. Re-indenting
   did not obfuscate actual changes to the extend that it could not
   be combined.

NOTE: I spend some time testing these changes and found that if there
      were regressions, they were not caused by these changes AFAICT.
      It was observed that installing a RH 7.1 runtime environment
      did make matters worse. Hangs and/or reboots have been observed
      with and without these changes, so when it failed to make life
      better in cases it doesn't look like it made it worse.
2001-09-08 19:07:04 +00:00
Marcel Moolenaar
f981a79ed9 o Introduce private types for use in linux syscalls for two reasons:
1. establish type independence for ease in porting and,
   2. provide a visual queue as to which syscalls have proper
      prototypes to further cleanup the i386/alpha split.
   Linuxulator types are prefixed by 'l_'. void and char have not
   been "virtualized".

o  Remove dummy functions for syscalls that are now truely
   unimplemented.

o  Rename syscalls so they match the names used in the Linux kernel.
   Also, provide more accurate prototypes. This generally improves
   cross-referencing and reduces head-scratching.

o  Provide seperate implementations for the 16-bit uid_t and gid_t
   based syscalls as Linux used to have. The new 32-bit uid_t and
   gid_t based syscalls now map to their FreeBSD equivalents.

o  Fix the linux_ipc syscall so that it doesn't force the shm*, sem*
   and msg* syscalls to have the same syscall. The prototypes for
   these syscalls now match the those used on Alpha. While here,
   add the same kludge for MSGRCV as is present in the Linux kernel.

o  Implement the following syscalls:
     linux_stat64, linux_lstat64 and linux_fstat64
     linux_sysctl

o  Added syscalls numbered 198 - 221. This include:
     - the 32-bit uid_t and gid_t bases syscalls
     - 64-bit file offset/size based syscalls
2001-09-08 18:48:40 +00:00
John Baldwin
df53e91c18 Call sendsig() with the proc lock held and return with it held. 2001-09-06 22:20:41 +00:00
Matthew Dillon
257d198890 Synchronize syscalls.master(s) with recent Giant pushdown work 2001-09-01 19:36:48 +00:00
Matthew Dillon
356861db03 Remove the MPSAFE keyword from the parser for syscalls.master.
Instead introduce the [M] prefix to existing keywords.  e.g.
MSTD is the MP SAFE version of STD.  This is prepatory for a
massive Giant lock pushdown.  The old MPSAFE keyword made
syscalls.master too messy.

Begin comments MP-Safe procedures with the comment:
/*
 * MPSAFE
 */
This comments means that the procedure may be called without
Giant held (The procedure itself may still need to obtain
Giant temporarily to do its thing).

sv_prepsyscall() is now MP SAFE and assumed to be MP SAFE
sv_transtrap() is now MP SAFE and assumed to be MP SAFE

ktrsyscall() and ktrsysret() are now MP SAFE (Giant Pushdown)
trapsignal() is now MP SAFE (Giant Pushdown)

Places which used to do the if (mtx_owned(&Giant)) mtx_unlock(&Giant)
test in syscall[2]() in */*/trap.c now do not.  Instead they
explicitly unlock Giant if they previously obtained it, and then
assert that it is no longer held to catch broken system calls.

Rebuild syscall tables.
2001-08-30 18:50:57 +00:00
Jim Pirzyk
814c95264f Added the linux_sysinfo function to implement sysinfo(2).
PR:		kern/27759
Reviewed by:	marcel
Approved by:	marcel
MFC after:	1 week
2001-07-23 06:22:10 +00:00
Jim Pirzyk
3d39316d2b Added the proper arguments the sysinfo system call
PR:		kern/27759
Reviewed by:	marcel
Approved by:	marcel
Obtained from:	Linux man page sysinfo(2)
MFC after:	1 week
2001-07-23 06:17:34 +00:00
John Baldwin
6be523bca7 Add a new MI pointer to the process' trapframe p_frame instead of using
various differently named pointers buried under p_md.

Reviewed by:	jake (in principle)
2001-06-29 11:10:41 +00:00
Peter Wemm
f41325db5f With this commit, I hereby pronounce gensetdefs past its use-by date.
Replace the a.out emulation of 'struct linker_set' with something
a little more flexible.  <sys/linker_set.h> now provides macros for
accessing elements and completely hides the implementation.

The linker_set.h macros have been on the back burner in various
forms since 1998 and has ideas and code from Mike Smith (SET_FOREACH()),
John Polstra (ELF clue) and myself (cleaned up API and the conversion
of the rest of the kernel to use it).

The macros declare a strongly typed set.  They return elements with the
type that you declare the set with, rather than a generic void *.

For ELF, we use the magic ld symbols (__start_<setname> and
__stop_<setname>).  Thanks to Richard Henderson <rth@redhat.com> for the
trick about how to force ld to provide them for kld's.

For a.out, we use the old linker_set struct.

NOTE: the item lists are no longer null terminated.  This is why
the code impact is high in certain areas.

The runtime linker has a new method to find the linker set
boundaries depending on which backend format is in use.

linker sets are still module/kld unfriendly and should never be used
for anything that may be modular one day.

Reviewed by:	eivind
2001-06-13 10:58:39 +00:00