with the sendsig code in the MD area. It is not safe to assume that all
the register conventions will be the same. Also, the way of producing
32 bit code (.code32 directives) in this file is amd64 specific.
The split-up code is derived from the ia64 code originally.
Note that I have only compile-tested this, not actually run-tested it.
The ia64 side of the force is missing some significant chunks of signal
delivery code.
purpose and the resulting vattr structure was ignored. In addition,
the VOP_GETATTR call was made with no vnode lock held, resulting in
vnode locking violation panic with debug kernels.
Reported by: truckman
Approved by: re@ (rwatson)
- improve sysinfo(2) syscall;
- add dummy fadvise64(2) syscall;
- add dummy *xattr(2) family of syscalls;
- add protos for the syscalls 222-225, 238-249 and 253-267;
- add exit_group(2) syscall, which is currently just wired to exit(2).
Obtained from: OpenBSD
MFC after: 2 weeks
is highly MD in an emulation environment since it operates on the host
environment. Although the setregs functions are really for exec support
rather than signals, they deal with the same sorts of context and include
files. So I put it there rather than create yet another file.
1.36 +73 -60 src/sys/compat/linux/linux_ipc.c
1.83 +102 -48 src/sys/kern/sysv_shm.c
1.8 +4 -0 src/sys/sys/syscallsubr.h
That change was intended to support vmware3, but
wantrem parameter is useless because vmware3 uses SYSV shared memory
to talk with X server and X server is native application.
The patch worked because check for wantrem was not valid
(wantrem and SHMSEG_REMOVED was never checked for SHMSEG_ALLOCATED segments).
Add kern.ipc.shm_allow_removed (integer, rw) sysctl (default 0) which when set
to 1 allows to return removed segments in
shm_find_segment_by_shmid() and shm_find_segment_by_shmidx().
MFC after: 1 week
if_xname, if_dname, and if_dunit. if_xname is the name of the interface
and if_dname/unit are the driver name and instance.
This change paves the way for interface renaming and enhanced pseudo
device creation and configuration symantics.
Approved By: re (in principle)
Reviewed By: njl, imp
Tested On: i386, amd64, sparc64
Obtained From: NetBSD (if_xname)
structures come out the right size.
Fix the ones that broke. stat32 had some missing fields from the end
and statfs32 was broken due to the strange definition of MNAMELEN
(which is dependent on sizeof(long))
I'm not sure if this fixes any actual problems or not.
- Return NULL instead of returning memory outside of the stackgap
in stackgap_alloc() (FreeBSD-SA-00:42.linux)
- Check for stackgap_alloc() returning NULL in svr4_emul_find(),
and clean_pipe().
- Avoid integer overflow on large nfds argument in svr4_sys_poll()
- Reject negative nbytes argument in svr4_sys_getdents()
- Don't copy out past the end of the struct componentname
pathname buffer in svr4_sys_resolvepath()
- Reject out-of-range signal numbers in svr4_sys_sigaction(),
svr4_sys_signal(), and svr4_sys_kill().
- Don't malloc() user-specified lengths in show_ioc() and
show_strbuf(), place arbitrary limits instead.
- Range-check lengths in si_listen(), ti_getinfo(), ti_bind(),
svr4_do_putmsg(), svr4_do_getmsg(), svr4_stream_ti_ioctl().
Some fixes obtain from OpenBSD.
- Allocate storage for uap->msg always because it is copyin()'ed in
native sendmsg().
- Convert sockopt level from Linux to FreeBSD after native recvmsg() calling.
- Some cleanups.
Tested with: Oracle 9i shared server connection mode.
MFC after: 1 week
systems where the data/stack/etc limits are too big for a 32 bit process.
Move the 5 or so identical instances of ELF_RTLD_ADDR() into imgact_elf.c.
Supply an ia32_fixlimits function. Export the clip/default values to
sysctl under the compat.ia32 heirarchy.
Have mmap(0, ...) respect the current p->p_limits[RLIMIT_DATA].rlim_max
value rather than the sysctl tweakable variable. This allows mmap to
place mappings at sensible locations when limits have been reduced.
Have the imgact_elf.c ld-elf.so.1 placement algorithm use the same
method as mmap(0, ...) now does.
Note that we cannot remove all references to the sysctl tweakable
maxdsiz etc variables because /etc/login.conf specifies a datasize
of 'unlimited'. And that causes exec etc to fail since it can no
longer find space to mmap things.
with 64-bit longs again. This was fixed in rev.1.42 but the fix
rotted non-fatally in rev.1.105 and fatally in rev.1.137.
Many more non-egregrious casts are strictly required for conversions
from semi-opaque types to pointers, but we avoid most of them by using
types that are almost certain to be compatible with uintptr_t for
representing pointers (e.g., vm_offset_t). Here we don't really want
the u_longs, but we have them because a.out.h and its support code
doesn't use typedefs (it uses unsigned in V7 and unsigned long in
FreeBSD) and is too obsolete to fix now.
from the ia32 specific stuff. Some of this still needs to move to the MI
freebsd32 area, and some needs to move to the MD area. This is still
work-in-progress.
like we have on other platforms. Move savectx() to <machine/pcb.h>.
A lot of files got these MD prototypes through the indirect inclusion
of <machine/cpu.h> and now need to include <machine/md_var.h>. The
number of which is unexpectedly large...
osf1_misc.c especially is tricky because szsigcode is redefined in
one of the osf1 header files. Reordering of the include files was
needed.
linprocfs.c now needs an explicit extern declaration.
Tested with: LINT
- cut the version string at the newline, suppressing information about
who built the kernel and in what directory. Most of this information
was already lost to truncation.
- on i386, return the precise CPU class (if known) rather than just
"i386". Linux software which uses this information to select
which binary to run often does not know what to make of "i386".
contain the filedescriptor number on opens from userland.
The index is used rather than a "struct file *" since it conveys a bit
more information, which may be useful to in particular fdescfs and /dev/fd/*
For now pass -1 all over the place.
paging space and how much of it is in use (in pages).
Use this interface from the Linuxolator instead of groping around in the
internals of the swap_pager.
the VMIN and VTIME members of the c_cc array. These members are not
special control characters. By not excluding these members we
changed the noncanonical mode input processing when both members
were 0 on entry (=LINUX_POSIX_VDISABLE) as we would remap them to 255
(=_POSIX_VDISABLE). See termios(4) case A for how that screws up
your terminal I/O.
PR: 23173
Originator: Bjarne Blichfeldt <bbl@dk.damgaard.com>
Patch by: Boris Nikolaus <bn@dali.tellique.de> (original submission)
Philipp Mergenthaler <philipp.mergenthaler@stud.uni-karlsruhe.de>
Reminders by: Joseph Holland King <gte743n@cad.gatech.edu>
MFC after: 5 days
Several of the subtypes have an associated vnode which is used for
stuff like the f*() functions.
By giving the vnode a speparate field, a number of checks for the specific
subtype can be replaced simply with a check for f_vnode != NULL, and
we can later free f_data up to subtype specific use.
At this point in time, f_data still points to the vnode, so any code I
might have overlooked will still work.
having their stack at the 512GB mark. Give 4GB of user VM space for 32
bit apps. Note that this is significantly more than on i386 which gives
only about 2.9GB of user VM to a process (1GB for kernel, plus page
table pages which eat user VM space).
Approved by: re (blanket)
load_gs() calls into a single place that is less likely to go wrong.
Eliminate the per-process context switching of MSR_GSBASE, because it
should be constant for a single cpu. Instead, save/restore it during
the loading of the new %gs selector for the new process.
Approved by: re (amd64/* blanket)
stolen from the ia64/ia32 code (indeed there was a repocopy), but I've
redone the MD parts and added and fixed a few essential syscalls. It
is sufficient to run i386 binaries like /bin/ls, /usr/bin/id (dynamic)
and p4. The ia64 code has not implemented signal delivery, so I had
to do that.
Before you say it, yes, this does need to go in a common place. But
we're in a freeze at the moment and I didn't want to risk breaking ia64.
I will sort this out after the freeze so that the common code is in a
common place.
On the AMD64 side, this required adding segment selector context switch
support and some other support infrastructure. The %fs/%gs etc code
is hairy because loading %gs will clobber the kernel's current MSR_GSBASE
setting. The segment selectors are not used by the kernel, so they're only
changed at context switch time or when changing modes. This still needs
to be optimized.
Approved by: re (amd64/* blanket)
- Move struct sigacts out of the u-area and malloc() it using the
M_SUBPROC malloc bucket.
- Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(),
sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared().
- Remove the p_sigignore, p_sigacts, and p_sigcatch macros.
- Add a mutex to struct sigacts that protects all the members of the struct.
- Add sigacts locking.
- Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now
that sigacts is locked.
- Several in-kernel functions such as psignal(), tdsignal(), trapsignal(),
and thread_stopped() are now MP safe.
Reviewed by: arch@
Approved by: re (rwatson)
argument to the functions shm{at,ctl}1 and shm_find_segment_by_shmid{x}.
The BSD semantics didn't allow the usage of shared segment after
being marked for removal through IPC_RMID.
The patch involves the following functions:
- shmat
- shmctl
- shm_find_segment_by_shmid
- shm_find_segment_by_shmidx
- linux_shmat
- linux_shmctl
Submitted by: Orlando Bassotto <orlando.bassotto@ieo-research.it>
Reviewed by: marcel
do for newstat_copyout().
Lie about disk drives which are character devices
in FreeBSD but block devices under Linux.
PR: 37227
Submitted by: Vladimir B. Grebenschikov <vova@sw.ru>
Reviewed by: phk
MFC after: 2 weeks
FreeBSD flags instead of just adding one to the Linux flags. This should
be identical to the previous version except that I have at least one report
of this patch fixing problems people were having with Linux apps after my
last commit to this file. It is safer to use the switch then to make
assumptions about the flag values anyways, esp. since we currently use
MD defines for the values of the flags and this is MI code.
Tested by: Michael Class <michael_class@gmx.net>
kern_sigprocmask() in the various binary compatibility emulators.
- Replace calls to sigsuspend(), sigaltstack(), sigaction(), and
sigprocmask() that used the stackgap with calls to the corresponding
kern_sig*() functions instead without using the stackgap.
by allprison_mtx), a unique prison/jail identifier field, two path
fields (pr_path for reporting and pr_root vnode instance) to store
the chroot() point of each jail.
o Add jail_attach(2) to allow a process to bind to an existing jail.
o Add change_root() to perform the chroot operation on a specified
vnode.
o Generalize change_dir() to accept a vnode, and move namei() calls
to callers of change_dir().
o Add a new sysctl (security.jail.list) which is a group of
struct xprison instances that represent a snapshot of active jails.
Reviewed by: rwatson, tjr
a follow on commit to kern_sig.c
- signotify() now operates on a thread since unmasked pending signals are
stored in the thread.
- PS_NEEDSIGCHK moves to TDF_NEEDSIGCHK.
functions are now all basically identical except that alpha linux uses
Elf64 arguments and svr4 and i386 linux use Elf32. The fixups include
changing the first argument to be a register_t ** to match the prototype
for fixup functions, asserting that the process in the image_params struct
is always curproc and removing unnecessary locking to read credentials as a
result, and a few style fixes.
but I decided that it was important for this patch to not bit-rot, and
since it is mainly moving code around, the total amount of entropy is
epsilon /phk)
This is a patch to move the common parts of linux_getcwd() back into
kern/vfs_cache.c so that the standard FreeBSD libc getcwd() can use it's
extended functionality. The linux syscall linux_getcwd() in
compat/linux/linux_getcwd.c has been rewritten to use it too. It should
be possible to simplify libc's getcwd() after this. No doubt this code
needs some cleaning up, since I've left in the sysctl variables I used
for debugging.
PR: 48169
Submitted by: James Whitwell <abacau@yahoo.com.au>
take a thread instead of a proc for their first argument.
- Add a mutex to protect the system-wide Linux osname, osrelease, and
oss_version variables.
- Change linux_get_prison() to take a thread instead of a proc for its
first argument and to use td_ucred rather than p_ucred. This is ok
because a thread's prison does not change even though it's ucred might.
- Also, change linux_get_prison() to return a struct prison * instead of
a struct linux_prison * since it returns with the struct prison locked
and this makes it easier to safely unlock the prison when we are done
messing with it.
sched_lock around accesses to p_stats->p_timer[] to avoid a potential
race with hardclock. getitimer(), setitimer() and the realitexpire()
callout are now Giant-free.
so be more careful about calling stackgap_init.
Tested by: Fred Souza <fred@storming.org>
2) Linux_sendmsg was forgetting to fill out the bsd_args struct.
Reviewed by: ume
3) The args to linux_connect have differently named types on alpha and
i386, so add a cast to stop gcc complaining.
Spotted by: peter
pointer types, and remove a huge number of casts from code using it.
Change struct xfile xf_data to xun_data (ABI is still compatible).
If we need to add a #define for f_data and xf_data we can, but I don't
think it will be necessary. There are no operational changes in this
commit.
code, make the emulator use it.
Rename unsupported_msg() to unimplemented_syscall(). Rename some arguments
for clarity
Fixup grammar.
Requested by: bde
the same as fcntl() except that it supports the new 64-bit file
locking commands (LINUX_F_GETLK64 etc) that use the `flock64'
structure. We had been interpreting all flock structures passed to
fcntl64() as `struct flock64' instead of only the ones from F_*64
commands.
The glibc in linux_base-7 uses fcntl64() by default, but the bug
was often non-fatal since the misinterpretation typically only
causes junk to appear in the `l_len' field and most junk values are
accepted as valid range lengths. The result is occasional EINVAL
errors from F_SETLK and a few bytes after the supplied `struct
flock' getting clobbered during F_GETLK.
PR: kern/37656
Reviewed by: marcel
Approved by: re
MFC after: 1 week
(1) Permit userland applications to request a change of label atomic
with an execve() via mac_execve(). This is required for the
SEBSD port of SELinux/FLASK. Attempts to invoke this without
MAC compiled in result in ENOSYS, as with all other MAC system
calls. Complexity, if desired, is present in policy modules,
rather than the framework.
(2) Permit policies to have access to both the label of the vnode
being executed as well as the interpreter if it's a shell
script or related UNIX nonsense. Because we can't hold both
vnode locks at the same time, cache the interpreter label.
SEBSD relies on this because it supports secure transitioning
via shell script executables. Other policies might want to
take both labels into account during an integrity or
confidentiality decision at execve()-time.
Approved by: re
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
describes an image activation instance. Instead, make use of the
existing fname structure entry, and introduce two new entries,
userspace_argv, and userspace_envv. With the addition of
mac_execve(), this divorces the image structure from the specifics
of the execve() system call, removes a redundant pointer, etc.
No semantic change from current behavior, but it means that the
structure doesn't depend on syscalls.master-generated includes.
There seems to be some redundant initialization of imgact entries,
which I have maintained, but which could probably use some cleaning
up at some point.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
It is never used. I left it there from pre-KSE days as I didn't know
if I'd need it or not but now I know I don't.. It's functionality
is in TDI_IWAIT in the thread.
This is for the not-quite-ready signal/fpu abi stuff. It may not see
the light of day, but I'm certainly not going to be able to validate it
when getting shot in the foot due to syscall number conflicts.
execve_secure() system call, which permits a process to pass in a label
for a label change during exec. This permits SELinux to change the
label for the resulting exec without a race following a manual label
change on the process. Because this interface uses our general purpose
MAC label abstraction, we call it execve_mac(), and wrap our port of
SELinux's execve_secure() around it with appropriate sid mappings.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
- add wrappers for mmap2(2) and ftruncate64(2) system calls;
- don't spam console with printf's when VFAT_READDIR_BOTH ioctl(2) is invoked;
- add support for SOUND_MIXER_READ_STEREODEVS ioctl(2);
- make msgctl(IPC_STAT) and IPC_SET actually working by converting from
BSD msqid_ds to Linux and vice versa;
- properly return EINVAL if semget(2) is called with nsems being negative.
Reviewed by: marcel
Approved by: marcel
Tested with: LSB runtime test
checks from the MAC tree: allow policies to perform access control
for the ability of a process to send and receive data via a socket.
At some point, we might also pass in additional address information
if an explicit address is requested on send.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
gets signals operating based on a TailQ, and is good enough to run X11,
GNOME, and do job control. There are some intricate parts which could be
more refined to match the sigset_t versions, but those require further
evaluation of directions in which our signal system can expand and contract
to fit our needs.
After this has been in the tree for a while, I will make in kernel API
changes, most notably to trapsignal(9) and sendsig(9), to use ksiginfo
more robustly, such that we can actually pass information with our
(queued) signals to the userland. That will also result in using a
struct ksiginfo pointer, rather than a signal number, in a lot of
kern_sig.c, to refer to an individual pending signal queue member, but
right now there is no defined behaviour for such.
CODAFS is unfinished in this regard because the logic is unclear in
some places.
Sponsored by: New Gold Technology
Reviewed by: bde, tjr, jake [an older version, logic similar]
constants VM_MIN_ADDRESS, VM_MAXUSER_ADDRESS, USRSTACK and PS_STRINGS.
This is mainly so that they can be variable even for the native abi, based
on different machine types. Get stack protections from the sysentvec too.
This makes it trivial to map the stack non-executable for certain abis, on
machines that support it.
NVIDIA API calls; more specifically, it adds an ioctl() handler for
the range of possible NVIDIA ioctl numbers.
Submitted by: Christian Zander <zander@minion.de>
available at module compile time. Do not #include the bogus
opt_kstack_pages.h at this point and instead refer to the variables that
are also exported via sysctl.
sysentvec. Initialized all fields of all sysentvecs, which will allow
them to be used instead of constants in more places. Provided stack
fixup routines for emulations that previously used the default.
linux_emul_find() that does not use stack gap storage but instead
always returns the resulting path in a malloc'd kernel buffer.
Implement linux_emul_find() in terms of this function. Also add
LCONVPATH* macros that wrap linux_emul_convpath in the same way
that the CHECKALT* macros wrap linux_emul_find().
compat code. Clean up accounting for multiple segments. Part 1/2.
Submitted by: Andrey Alekseyev <uitm@zenon.net> (with some modifications)
MFC after: 3 days
accept an 'active_cred' argument reflecting the credential of the thread
initiating the ioctl operation.
- Change fo_ioctl() to accept active_cred; change consumers of the
fo_ioctl() interface to generally pass active_cred from td->td_ucred.
- In fifofs, initialize filetmp.f_cred to ap->a_cred so that the
invocations of soo_ioctl() are provided access to the calling f_cred.
Pass ap->a_td->td_ucred as the active_cred, but note that this is
required because we don't yet distinguish file_cred and active_cred
in invoking VOP's.
- Update kqueue_ioctl() for its new argument.
- Update pipe_ioctl() for its new argument, pass active_cred rather
than td_ucred to MAC for authorization.
- Update soo_ioctl() for its new argument.
- Update vn_ioctl() for its new argument, use active_cred rather than
td->td_ucred to authorize VOP_IOCTL() and the associated VOP_GETATTR().
Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs
fo_read() and fo_write(): explicitly use the cred argument to fo_poll()
as "active_cred" using the passed file descriptor's f_cred reference
to provide access to the file credential. Add an active_cred
argument to fo_stat() so that implementers have access to the active
credential as well as the file credential. Generally modify callers
of fo_stat() to pass in td->td_ucred rather than fp->f_cred, which
was redundantly provided via the fp argument. This set of modifications
also permits threads to perform these operations on behalf of another
thread without modifying their credential.
Trickle this change down into fo_stat/poll() implementations:
- badfo_poll(), badfo_stat(): modify/add arguments.
- kqueue_poll(), kqueue_stat(): modify arguments.
- pipe_poll(), pipe_stat(): modify/add arguments, pass active_cred to
MAC checks rather than td->td_ucred.
- soo_poll(), soo_stat(): modify/add arguments, pass fp->f_cred rather
than cred to pru_sopoll() to maintain current semantics.
- sopoll(): moidfy arguments.
- vn_poll(), vn_statfile(): modify/add arguments, pass new arguments
to vn_stat(). Pass active_cred to MAC and fp->f_cred to VOP_POLL()
to maintian current semantics.
- vn_close(): rename cred to file_cred to reflect reality while I'm here.
- vn_stat(): Add active_cred and file_cred arguments to vn_stat()
and consumers so that this distinction is maintained at the VFS
as well as 'struct file' layer. Pass active_cred instead of
td->td_ucred to MAC and to VOP_GETATTR() to maintain current semantics.
- fifofs: modify the creation of a "filetemp" so that the file
credential is properly initialized and can be used in the socket
code if desired. Pass ap->a_td->td_ucred as the active
credential to soo_poll(). If we teach the vnop interface about
the distinction between file and active credentials, we would use
the active credential here.
Note that current inconsistent passing of active_cred vs. file_cred to
VOP's is maintained. It's not clear why GETATTR would be authorized
using active_cred while POLL would be authorized using file_cred at
the file system level.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs
make a series of modifications to the credential arguments relating
to file read and write operations to cliarfy which credential is
used for what:
- Change fo_read() and fo_write() to accept "active_cred" instead of
"cred", and change the semantics of consumers of fo_read() and
fo_write() to pass the active credential of the thread requesting
an operation rather than the cached file cred. The cached file
cred is still available in fo_read() and fo_write() consumers
via fp->f_cred. These changes largely in sys_generic.c.
For each implementation of fo_read() and fo_write(), update cred
usage to reflect this change and maintain current semantics:
- badfo_readwrite() unchanged
- kqueue_read/write() unchanged
pipe_read/write() now authorize MAC using active_cred rather
than td->td_ucred
- soo_read/write() unchanged
- vn_read/write() now authorize MAC using active_cred but
VOP_READ/WRITE() with fp->f_cred
Modify vn_rdwr() to accept two credential arguments instead of a
single credential: active_cred and file_cred. Use active_cred
for MAC authorization, and select a credential for use in
VOP_READ/WRITE() based on whether file_cred is NULL or not. If
file_cred is provided, authorize the VOP using that cred,
otherwise the active credential, matching current semantics.
Modify current vn_rdwr() consumers to pass a file_cred if used
in the context of a struct file, and to always pass active_cred.
When vn_rdwr() is used without a file_cred, pass NOCRED.
These changes should maintain current semantics for read/write,
but avoid a redundant passing of fp->f_cred, as well as making
it more clear what the origin of each credential is in file
descriptor read/write operations.
Follow-up commits will make similar changes to other file descriptor
operations, and modify the MAC framework to pass both credentials
to MAC policy modules so they can implement either semantic for
revocation.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs
SVR4 emulation relating to readdir() and fd_revoke(). All other
services appear to be implemented by simply wrapping existing
FreeBSD native system call implementations, so don't require local
instrumentation in the emulator module.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs
struct mount is not cached as *mp at this point, so use
vp->v_mount directly, following the check that it's non-NULL.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs
- v_vflag is protected by the vnode lock and is used when synchronization
with VOP calls is needed.
- v_iflag is protected by interlock and is used for dealing with vnode
management issues. These flags include X/O LOCK, FREE, DOOMED, etc.
- All accesses to v_iflag and v_vflag have either been locked or marked with
mp_fixme's.
- Many ASSERT_VOP_LOCKED calls have been added where the locking was not
clear.
- Many functions in vfs_subr.c were restructured to provide for stronger
locking.
Idea stolen from: BSD/OS
kernel access control.
Invoke appropriate MAC entry points for a number of VFS-related
operations in the Linux ABI module. In particular, handle uselib
in a manner similar to open() (more work is probably needed here),
as well as handle statfs(), and linux readdir()-like calls.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs
handler in the kernel at the same time. Also, allow for the
exec_new_vmspace() code to build a different sized vmspace depending on
the executable environment. This is a big help for execing i386 binaries
on ia64. The ELF exec code grows the ability to map partial pages when
there is a page size difference, eg: emulating 4K pages on 8K or 16K
hardware pages.
Flesh out the i386 emulation support for ia64. At this point, the only
binary that I know of that fails is cvsup, because the cvsup runtime
tries to execute code in pages not marked executable.
Obtained from: dfr (mostly, many tweaks from me).
The ability to schedule multiple threads per process
(one one cpu) by making ALL system calls optionally asynchronous.
to come: ia64 and power-pc patches, patches for gdb, test program (in tools)
Reviewed by: Almost everyone who counts
(at various times, peter, jhb, matt, alfred, mini, bernd,
and a cast of thousands)
NOTE: this is still Beta code, and contains lots of debugging stuff.
expect slight instability in signals..
there to protect fdrop() (which in turn can call vrele()), however,
fdrop_locked() grabs Giant for us, so we do not have to.
Reviewed by: jhb
Inspired by: alc
o Add a mutex (sb_mtx) to struct sockbuf. This protects the data in a
socket buffer. The mutex in the receive buffer also protects the data
in struct socket.
o Determine the lock strategy for each members in struct socket.
o Lock down the following members:
- so_count
- so_options
- so_linger
- so_state
o Remove *_locked() socket APIs. Make the following socket APIs
touching the members above now require a locked socket:
- sodisconnect()
- soisconnected()
- soisconnecting()
- soisdisconnected()
- soisdisconnecting()
- sofree()
- soref()
- sorele()
- sorwakeup()
- sotryfree()
- sowakeup()
- sowwakeup()
Reviewed by: alfred
pointer instead of a proc pointer and require the process pointed to
by the second argument to be locked. We now use the thread ucred reference
for the credential checks in p_can*() as a result. p_canfoo() should now
no longer need Giant.
yields incorrect behaviour. The hardwiring was present in the very
first commit that implemented msgrcv() (revision 1.4) and hasn't been
changed since. The native implementation was complete at that time,
so there doesn't seem to be a reason for the hardwiring from a
technical point of view.
Submitted by: Reinier Bezuidenhout <rbezuide@yahoo.com>
VOP_OPEN() and doing lots of manual checking. This would further
centralize use of the name functions, and once the MAC code is integrated,
meaning few extraneous MAC checks scattered all over the place. I don't
have time to fix this now, but want to make sure it doesn't get
forgotten. Anyone interested in fixing this should feel free.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs
without a few patches for the rest of the kernel to allow the image
activator to override exec_copyout_strings and setregs.
None of the syscall argument translation has been done. Possibly, this
translation layer can be shared with any platform that wants to support
running ILP32 binaries on an LP64 host (e.g. sparc32 binaries?)
is called.
- Change sysctl_out_proc() to require that the process is locked when it
is called and to drop the lock before it returns. If this proves too
complex we can change sysctl_out_proc() to simply acquire the lock at
the very end and have the calling code drop the lock right after it
returns.
- Lock the process we are going to export before the p_cansee() in the
loop in sysctl_kern_proc() and hold the lock until we call
sysctl_out_proc().
- Don't call p_cansee() on the process about to be exported twice in
the aforementioned loop.
they aren't in the usual path of execution for syscalls and traps.
The main complication for this is that we have to set flags to control
ast() everywhere that changes the signal mask.
Avoid locking in userret() in most of the remaining cases.
Submitted by: luoqi (first part only, long ago, reorganized by me)
Reminded by: dillon
general cleanup of the API. The entire API now consists of two functions
similar to the pre-KSE API. The suser() function takes a thread pointer
as its only argument. The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0. The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.
Discussed on: smp@
There is still some locations where the PROC lock should be held
in order to prevent inconsistent views from outside (like the
proc->p_fd fix for kern/vfs_syscalls.c:checkdirs()) that can be
fixed later.
Submitted by: Jonathan Mini <mini@haikugeek.com>
New locks are:
- pgrpsess_lock which locks the whole pgrps and sessions,
- pg_mtx which protects the pgrp members, and
- s_mtx which protects the session members.
Please refer to sys/proc.h for the coverage of these locks.
Changes on the pgrp/session interface:
- pgfind() needs the pgrpsess_lock held.
- The caller of enterpgrp() is responsible to allocate a new pgrp and
session.
- Call enterthispgrp() in order to enter an existing pgrp.
- pgsignal() requires a pgrp lock held.
Reviewed by: jhb, alfred
Tested on: cvsup.jp.FreeBSD.org
(which is a quad-CPU machine running -current)
this is a low-functionality change that changes the kernel to access the main
thread of a process via the linked list of threads rather than
assuming that it is embedded in the process. It IS still embeded there
but remove all teh code that assumes that in preparation for the next commit
which will actually move it out.
Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,
interfaces we encounter. In Linux, all addresses are returned for
which gifconf handlers are installed. This boils down to AF_DECnet
and AF_INET. We care mostly about AF_INET for now. Adding additional
families is simple enough.
Returning the addresses is important for RPC clients to function
properly. Andrew found in some reference code that the logic that
handles the retransmission looks for an interface that's up and has
an AF_INET address. This obviously failed as we didn't return any
addresses at all.
Note also that with this change we don't return interfaces that don't
have AF_INET addresses, whereas before we returned any interface
present in the system. This is in line with what Linux does (modulo
interfaces with only AF_DECnet addresses of course :-)
Reported by: "Andrew Atrens" <atrens@nortelnetworks.com>
MFC after: 1 week
getpid, getuid, getgid and pipe, since they bootstrapped from
OSF/1 and never cleaned up. Switch to the native syscalls
on alpha so that the above functions work
MFC after: 7 days
itself, it's used outside the Linuxulator. Reimplement the
function so that its behaviour matches the current renaming
scheme. It's probably better to formalize these interdependencies.
Seigo Tanimura (tanimura) posted the initial delta.
I've polished it quite a bit reducing the need for locking and
adapting it for KSE.
Locks:
1 mutex in each filedesc
protects all the fields.
protects "struct file" initialization, while a struct file
is being changed from &badfileops -> &pipeops or something
the filedesc should be locked.
1 mutex in each struct file
protects the refcount fields.
doesn't protect anything else.
the flags used for garbage collection have been moved to
f_gcflag which was the FILLER short, this doesn't need
locking because the garbage collection is a single threaded
container.
could likely be made to use a pool mutex.
1 sx lock for the global filelist.
struct file * fhold(struct file *fp);
/* increments reference count on a file */
struct file * fhold_locked(struct file *fp);
/* like fhold but expects file to locked */
struct file * ffind_hold(struct thread *, int fd);
/* finds the struct file in thread, adds one reference and
returns it unlocked */
struct file * ffind_lock(struct thread *, int fd);
/* ffind_hold, but returns file locked */
I still have to smp-safe the fget cruft, I'll get to that asap.
properly translate the interface name passed to us, make sure
we also translate correctly before we return the list of
interfaces with the SIOCGIFCONF ioctl. It is common to use
the interface names returned by that ioctl in further ioctls,
such as SIOCGIFFLAGS.
Remove linux_ifname as it is no longer used. Also remove
ifname_bsd_to_linux as it cannot be used anymore now that
linux_ifname is removed (was deadcode anyway).
Reported and tested by: Andrew Atrens <atrens@nortelnetworks.com>
use the internal index number as the unit number to compare with.
The first ethernet interface in Linux is called "eth0", whereas
our internal index starts wth 1 and is not unique to ethernet
interfaces (lo0 has index 1 for example). Instead, use a function-
local index number that starts with 0 and is incremented only
for ethernet interfaces. This way the unit number will match the
n-th ethernet interface in the system, which is exactly what it
means in Linux.
Tested by: Glenn Johnson <gjohnson@srrc.ars.usda.gov>
MFC after: 3 days
mutable contents of struct prison (hostname, securelevel, refcount,
pr_linux, ...)
o Generally introduce mtx_lock()/mtx_unlock() calls throughout kern/
so as to enforce these protections, in particular, in kern_mib.c
protection sysctl access to the hostname and securelevel, as well as
kern_prot.c access to the securelevel for access control purposes.
o Rewrite linux emulator abstractions for accessing per-jail linux
mib entries (osname, osrelease, osversion) so that they don't return
a pointer to the text in the struct linux_prison, rather, a copy
to an array passed into the calls. Likewise, update linprocfs to
use these primitives.
o Update in_pcb.c to always use prison_getip() rather than directly
accessing struct prison.
Reviewed by: jhb
to the code for translating socket and private ioctls:
- Only perform socket ioctl translation if the file descriptor is a
socket.
- Treat socket ioctls on non-sockets specially, and for now assume
that these are directed at a tap/vmnet device, so translate the
ioctl numbers as appropriate (the way if_tap abuses some socket
ioctls to pass non-ifreq data is utterly bogus, but this is how
VMware on FreeBSD has always "worked"; I will deal with this
later).
- Add (untested) support for translating SIOCSIFADDR.
- In all cases where we fail to translate an ioctl, return ENOIOCTL
so that other handlers have a chance to do the translation.
This should fix the "/dev/vmnet1: Invalid argument" errors that
users of VMware were experiencing, though I have only verified this
on RELENG_4.
Submitted by: des (mostly)
MFC after: 3 days
vnodes. This will hopefully serve as a base from which we can
expand the MP code. We currently do not attempt to obtain any
mutex or SX locks, but the door is open to add them when we nail
down exactly how that part of it is going to work.
sysctl_req', which describes in-progress sysctl requests. This permits
sysctl handlers to have access to the current thread, permitting work
on implementing td->td_ucred, migration of suser() to using struct
thread to derive the appropriate ucred, and allowing struct thread to be
passed down to other code, such as network code where td is not currently
available (and curproc is used).
o Note: netncp and netsmb are not updated to reflect this change, as they
are not currently KSE-adapted.
Reviewed by: julian
Obtained from: TrustedBSD Project
linux_emul_path anyway. Linux_emul_find() has interesting bugs in its
prefix handling (which luckily are not currently exploitable); this
commit is preliminary to an attempt at cleaning it up.
Approved by: marcel
in bind() and connect(). Linux doesn't care if the length of the
sockaddr matches its address family; FreeBSD does. This fixes the
known issues with the resolver in linux_base-7.
which may or may not return something which is partially right.
Disable the "devices" file until we find out what this is needed for,
and what exactly those apps need.
This will allow cdevsw to become static again.
Approved by: DES
Add some missing break statements in the socket ioctl switch.
Check the return value from copyin() / copyout().
Fix some disorderings and misindentations.
Support a couple more socket ioctls.
Add missing break statements.
that appeared to be very different from the MI version. These
differences were mostly bogus and caused by copying octal
definitions and write them as hexadecimal values without doing
any base conversion (ie 010 was copied to 0x10). After filtering
out these differences, any remaining (real) incompatibilities
have been merged into the MI header file to make them more visible.
While here, fix the termios <-> termio conversion WRT to the c_cc
field for Alpha. The termios values do not match the termio values
and thus prevents us from copying.
By eliminating the Alpha MD copy of linux_ioctl.h we also fixed
the recent build breakage caused by putting new bits in the MI
header and not in the MD header.
Also slightly change the name translation policy - only rename interfaces
that have the IFF_BROADCAST flag set. This is not perfect, but is closer to
how Linux names network interfaces.
to work, but haven't really due to subtle differences in structs etc.
This is still not perfect (some ioctls are still known not to work, while
others haven't been tested at all), but it's enough to get Debian's ifconfig
to produce relatively sane output.
More work will be needed to get all ioctls (or at least a reasonable subset)
working, and to support the Cisco Aironet config tool mentioned in the PR.
PR: 26546
Submitted by: Doug Ambrisko <ambrisko@ambrisko.com>
We still have to account for a copyin. Make sure the copyin will
succeed by passing the FreeBSD syscall a pointer to userspace,
albeit one that's automagically mapped into kernel space.
Reported by: mr, Mitsuru IWASAKI <iwasaki@jp.FreeBSD.org>
Tested by: Mitsuru IWASAKI <iwasaki@jp.FreeBSD.org>
macro. The commit log clearly states that the index given to the
macro is one higher than previously used to index the array. This
wasn't represented in the code and resulted in kernel page faults.
Reported by: Andrew Atrens <atrens@nortelnetworks.com>
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.
Sorry john! (your next MFC will be a doosie!)
Reviewed by: peter@freebsd.org, dillon@freebsd.org
X-MFC after: ha ha ha ha