Commit Graph

249 Commits

Author SHA1 Message Date
Konstantin Belousov
ab02d85f0d Map PIE binaries at non-zero base address.
Discussed with:	bz
Reviewed by:	kan
Tested by:	bz (i386, amd64), bsam (linux)
MFC after:	some time
2009-10-10 15:33:01 +00:00
Konstantin Belousov
5b33842a9b Do not map segments of zero length.
Discussed with:	bz
Reviewed by:	kan
Tested by:	bz (i386, amd64), bsam (linux)
MFC after:	some time
2009-10-10 15:28:52 +00:00
Bjoern A. Zeeb
925c8b5b04 Print a warning in case we cannot add more brandinfo because
we would overflow the MAX_BRANDS sized array.

Reviewed by:	kib
MFC After:	1 month
2009-10-03 10:50:00 +00:00
Bjoern A. Zeeb
ecc2fda872 Make sure FreeBSD binaries without .note.ABI-tag section work
correctly and do not match a colliding Debian GNU/kFreeBSD
brandinfo statements.
For this mark the Debian GNU/kFreeBSD brandinfo that it must have
an .note.ABI-tag section and ignore the old EI_OSABI brandinfo
when comparing a possibly colliding set of options.

Due to SYSINIT we add the brandinfo in a non-deterministic order,
so native FreeBSD is not always first. We may want to consider
to force native FreeBSD to come first as well.

The only way a problem could currently be noticed is when running an
i386 binary without the .note.ABI-tag on amd64 and the Debian GNU/kFreeBSD
brandinfo  was matched first,  as the fallback to ld-elf32.so.1 does
not exist in that case.

Reported and tested by:	ticso
In collaboration with:	kib
MFC after:		3 days
2009-08-30 14:38:17 +00:00
Bjoern A. Zeeb
89ffc202d6 Fix handling of .note.ABI-tag section for GNU systems [1].
Handle GNU/Linux according to LSB Core Specification 4.0,
Chapter 11. Object Format, 11.8. ABI note tag.

Also check the first word of desc, not only name, according to
glibc abi-tags specification to distinguish between Linux and
kFreeBSD.

Add explicit handling for Debian GNU/kFreeBSD, which runs
on our kernels as well [2].

In {amd64,i386}/trap.c, when checking osrel of the current process,
also check the ABI to not change the signal behaviour for Linux
binary processes, now that we save an osrel version for all three
from the lists above in struct proc [2].

These changes make it possible to run FreeBSD, Debian GNU/kFreeBSD
and Linux binaries on the same machine again for at least i386 and
amd64, and no longer break kFreeBSD which was detected as GNU(/Linux).

PR:		kern/135468
Submitted by:	dchagin [1] (initial patch)
Suggested by:	kib [2]
Tested by:	Petr Salinger (Petr.Salinger seznam.cz) for kFreeBSD
Reviewed by:	kib
MFC after:	3 days
2009-08-24 16:19:47 +00:00
Dmitry Chagin
cd899aad76 Fix KBI breakage by r190520 which affects older linux.ko binaries:
1) Move the new field (brand_note) to the end of the Brandinfo structure.
2) Add a new flag BI_BRAND_NOTE that indicates that the brand_note pointer
   is valid.
3) Use the brand_note field if the flag BI_BRAND_NOTE is set and as old
   modules won't have the flag set, so the new field brand_note would be
   ignored.

Suggested by:	jhb
Reviewed by:	jhb
Approved by:	kib (mentor)
MFC after:	6 days
2009-04-05 09:27:19 +00:00
Konstantin Belousov
267c52fc98 Fix several issues with parsing the notes for ELF objects.
Badly formed ELF note may cause the caclulated pointer to the next note
to point both after the note region, that was checked in the code, but
also to point before the region, that was not checked [1]. Remember the
first note location in note0 and leap out if the note is not between
note0 and note_end.

In the similar way, badly formed note may cause infinite loop by
pointing next note into the same or previous note. Guard against this by
limiting amount of loop iterations by arbitrary choosen big number.

For clarity, check the calculated note alignment in each iteration.

Reported by:	Chris Palmer <chris noncombatant org> [1]
PR:	kern/132886
Reviewed and tested by:	dchagin
MFC after:	3 days
2009-03-22 13:42:41 +00:00
Konstantin Belousov
3ff063577b Supply AT_EXECPATH auxinfo entry to the interpreter, both for native and
compat32 binaries.

Tested by:	pho
Reviewed by:	kan
2009-03-17 12:53:28 +00:00
Konstantin Belousov
429f5a589b Use the properly sized types for ELF object header and program headers.
This fixes osrel fetching from the FreeBSD branding note for the 64bit
platforms.

Reported by:	swell.k gmail com
Reviewed by:	dchagin
Tested by:	dchagin, swell.k gmail com
2009-03-17 09:50:40 +00:00
Dmitry Chagin
32c01de21c Implement new way of branding ELF binaries by looking to a
".note.ABI-tag" section.

The search order of a brand is changed, now first of all the
".note.ABI-tag" is looked through.

Move code which fetch osreldate for ELF binary to check_note() handler.

PR:		118473
Approved by:	kib (mentor)
2009-03-13 16:40:51 +00:00
Robert Watson
95c807cf5e When a statically linked binary is executed (or at least, one without
an interpreter definition in its program header), set the auxiliary
ELF argument AT_BASE to 0 rather than to the address that we would
have mapped the interpreter at if there had been one.

The ELF ABI specifications appear to be ambiguous as to the desired
behavior in this situation, as they define AT_BASE as the base address
of the interpreter, but do not mention what to do if there is none.
On Solaris, AT_BASE will be set to the base address of the static
binary if there is no interpreter, and on Linux, AT_BASE is set to 0.
We go with the Linux semantics as they are of more immediate utility
and allow the early runtime environment to know that the kernel has
not mapped an interpreter, but because AT_PHDR points at the ELF
header for the running binary, it is still possible to retrieve all
required mapping information when the process starts should it be
required.  Either approach would be preferable to our current behavior
of passing a pointer to an unmapped region of user memory as AT_BASE.

MFC after:	3 weeks
2009-01-25 12:07:43 +00:00
Peter Wemm
a3ac8c94cf Remove sysctl debug.elf_trace and the trace field in auxargs. They go
nowhere.  It used to be the equivalent of $LD_DEBUG in rtld-elf.
Elf_Auxargs is an internal structure.
2008-12-17 16:54:29 +00:00
Warner Losh
35c2a5a852 Minor style(9) nit. 2008-12-17 16:25:20 +00:00
Konstantin Belousov
6f3475454e Remove two remnant uses of AT_DEBUG. 2008-12-17 13:13:35 +00:00
Konstantin Belousov
387ad99800 If the ABI-overriden interpreter was not loaded, do not set
have_interp to TRUE. This allows the code in image activator to try
/libexec/ld-elf.so.1 as interpreter when newinterp is not found to
execute.

Reviewed by:	peter
MFC after:	2 weeks (together with r175105)
2008-10-08 11:11:36 +00:00
John Baldwin
ccd3953e5f Go back to using the process command name (p_comm) for the file name and
command line arguments stored in the note at the beginning of a core dump
instead of the current thread name.

Reviewed by:	julian
2008-05-15 03:07:34 +00:00
Jeff Roberson
6617724c5f Remove kernel support for M:N threading.
While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential.  Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries and static binaries will
be broken.
2008-03-12 10:12:01 +00:00
Attilio Rao
22db15c06f VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.

KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
2008-01-13 14:44:15 +00:00
Attilio Rao
cb05b60a89 vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by:	Diego Sardina <siarodx at gmail dot com>,
		Andrea Di Pasquale <whyx dot it at gmail dot com>
2008-01-10 01:10:58 +00:00
Peter Wemm
4113f8d741 Fall back to the binary-specified interpreter (ld-elf.so.1) if the
ABI override binary isn't found.  This could probably be smoother, but
it is what I did in p4 change #126891 on 2007/09/27.  It should solve
the "ld-elf32.so.1"-in-chroot problem.
2008-01-05 08:35:56 +00:00
Konstantin Belousov
f231de478e Implement fetching of the __FreeBSD_version from the ELF ABI-tag note.
The value is read into the p_osrel member of the struct proc. p_osrel
is set to 0 for the binaries without the note.

MFC after:	3 days
2007-12-04 12:28:07 +00:00
Konstantin Belousov
93d1c72883 Check for the program headers alignment of the ELF images before
dereferencing. Unaligned access could cause panic on strict alignment
architectures.

Reviewed by:	marcel, marius (also tested on sparc64, thanks !)
MFC after:	3 days
2007-12-04 12:21:27 +00:00
Julian Elischer
e01eafef2a A bunch more files that should probably print out a thread name
instead of a process name.
2007-11-14 06:51:33 +00:00
Konstantin Belousov
89b57fcf01 Fix for the panic("vm_thread_new: kstack allocation failed") and
silent NULL pointer dereference in the i386 and sparc64 pmap_pinit()
when the kmem_alloc_nofault() failed to allocate address space. Both
functions now return error instead of panicing or dereferencing NULL.

As consequence, vmspace_exec() and vmspace_unshare() returns the errno
int. struct vmspace arg was added to vm_forkproc() to avoid dealing
with failed allocation when most of the fork1() job is already done.

The kernel stack for the thread is now set up in the thread_alloc(),
that itself may return NULL. Also, allocation of the first process
thread is performed in the fork1() to properly deal with stack
allocation failure. proc_linkup() is separated into proc_linkup()
called from fork1(), and proc_linkup0(), that is used to set up the
kernel process (was known as swapper).

In collaboration with:	Peter Holm
Reviewed by:	jhb
2007-11-05 11:36:16 +00:00
John Baldwin
19059a13ed Rework the support for ABIs to override resource limits (used by 32-bit
processes under 64-bit kernels).  Previously, each 32-bit process overwrote
its resource limits at exec() time.  The problem with this approach is that
the new limits affect all child processes of the 32-bit process, including
if the child process forks and execs a 64-bit process.  To fix this, don't
ovewrite the resource limits during exec().  Instead, sv_fixlimits() is
now replaced with a different function sv_fixlimit() which asks the ABI to
sanitize a single resource limit.  We then use this when querying and
setting resource limits.  Thus, if a 32-bit process sets a limit, then
that new limit will be inherited by future children.  However, if the
32-bit process doesn't change a limit, then a future 64-bit child will
see the "full" 64-bit limit rather than the 32-bit limit.

MFC is tentative since it will break the ABI of old linux.ko modules (no
other modules are affected).

MFC after:	1 week
2007-05-14 22:40:04 +00:00
Xin LI
4f506694bb Use FOREACH_PROC_IN_SYSTEM instead of using its unrolled form. 2007-01-17 14:58:53 +00:00
Alan Cox
976a87a284 Add vm map and object locking to each_writable_segment().
Noticed by: jhb@
MFC after: 3 weeks
2006-11-19 23:38:59 +00:00
Alan Cox
e5e6093ba9 Avoid a vm object reference leak in a rarely used code path.
An executable contains at most one PT_INTERP program header.  Therefore,
the loop that searches for it can terminate after it is found rather than
iterating over the entire set of program headers.

Eliminate an unneeded initialization.

Reviewed by: tegge
2006-01-21 20:11:49 +00:00
Maxim Sobolev
d49b21093c Fix breakage introduced in the previous commit. 2005-12-26 22:32:52 +00:00
Maxim Sobolev
900b28f9f6 Remove kern.elf32.can_exec_dyn sysctl. Instead extend Brandinfo structure
with flags bitfield and set BI_CAN_EXEC_DYN flag for all brands that usually
allow executing elf dynamic binaries (aka shared libraries). When it is
requested to execute ET_DYN elf image check if this flag is on after we
know the elf brand allowing execution if so.

PR:		kern/87615
Submitted by:	Marcin Koziej <creep@desk.pl>
2005-12-26 21:23:57 +00:00
Alan Cox
60bb39431a Maintain the lock on the vnode for most of exec_elfN_imgact().
Specifically, it is required for the I/O that may be performed by
elfN_load_section().

Avoid an obscure deadlock in the a.out, elf, and gzip image
activators.  Add a comment describing why the deadlock does not occur
in the common case and how it might occur in less usual circumstances.

Eliminate an unused variable from exec_aout_imgact().

In collaboration with: tegge
2005-12-24 04:57:50 +00:00
Alan Cox
373d1a3f8c Maintain the vnode lock throughout elfN_load_file() rather than releasing
it and reacquiring it in vrele().  Consequently, there is no reason to
increase the reference count on the vm object caching the file's pages.
Reviewed by: tegge

Eliminate unused parameters to elfN_load_file().
2005-12-21 18:58:40 +00:00
Alan Cox
ff6f03c7cd Eliminate an unneeded (vm_prot_t) parameter from two functions. Eliminate
unnecessary uses of a local variable.

Reviewed by: tegge
2005-12-20 23:42:18 +00:00
Alan Cox
044bbbb523 Correct a long-standing problem in elfN_map_insert(): In order to copy a
page to user space, the user space mapping must allow write access.

In collaboration with: tegge@
MFC after: 3 weeks
2005-12-17 19:40:47 +00:00
Alan Cox
584716b08a Style: The second argument to vm_map_find() should be NULL instead of 0. 2005-12-16 19:14:25 +00:00
Alan Cox
da61b9a69e Use sf_buf_alloc() instead of vm_map_find() on exec_map to create the
ephemeral mappings that are used as the source for three copy
operations from kernel space to user space.  There are two reasons for
making this change: (1) Under heavy load exec_map can fill up causing
vm_map_find() to fail.  When it fails, the nascent process is aborted
(SIGABRT).  Whereas, this reimplementation using sf_buf_alloc()
sleeps.  (2) Although it is possible to sleep on vm_map_find()'s
failure until address space becomes available (see kmem_alloc_wait()),
using sf_buf_alloc() is faster.  Furthermore, the reimplementation
uses a CPU private mapping, avoiding a TLB shootdown on
multiprocessors.

Problem uncovered by: kris@
Reviewed by: tegge@
MFC after: 3 weeks
2005-12-16 18:34:14 +00:00
Olivier Houchard
481a1fe19e Add a new sysctl, kern.elf[32|64].can_exec_dyn. When set to 1, one can
execute a ET_DYN binary (shared object).
This does not make much sense, but some linux scripts expect to be able to
execute /lib/ld-linux.so.2 (ldd comes to mind).
The sysctl defaults to 0.

MFC after:	3 days
2005-11-14 22:24:00 +00:00
Robert Watson
5f419982c2 Back out alpha/alpha/trap.c:1.124, osf1_ioctl.c:1.14, osf1_misc.c:1.57,
osf1_signal.c:1.41, amd64/amd64/trap.c:1.291, linux_socket.c:1.60,
svr4_fcntl.c:1.36, svr4_ioctl.c:1.23, svr4_ipc.c:1.18, svr4_misc.c:1.81,
svr4_signal.c:1.34, svr4_stat.c:1.21, svr4_stream.c:1.55,
svr4_termios.c:1.13, svr4_ttold.c:1.15, svr4_util.h:1.10,
ext2_alloc.c:1.43, i386/i386/trap.c:1.279, vm86.c:1.58,
unaligned.c:1.12, imgact_elf.c:1.164, ffs_alloc.c:1.133:

Now that Giant is acquired in uprintf() and tprintf(), the caller no
longer leads to acquire Giant unless it also holds another mutex that
would generate a lock order reversal when calling into these functions.
Specifically not backed out is the acquisition of Giant in nfs_socket.c
and rpcclnt.c, where local mutexes are held and would otherwise violate
the lock order with Giant.

This aligns this code more with the eventual locking of ttys.

Suggested by:	bde
2005-09-28 07:03:03 +00:00
Robert Watson
84d2b7df26 Add GIANT_REQUIRED and WITNESS sleep warnings to uprintf() and tprintf(),
as they both interact with the tty code (!MPSAFE) and may sleep if the
tty buffer is full (per comment).

Modify all consumers of uprintf() and tprintf() to hold Giant around
calls into these functions.  In most cases, this means adding an
acquisition of Giant immediately around the function.  In some cases
(nfs_timer()), it means acquiring Giant higher up in the callout.

With these changes, UFS no longer panics on SMP when either blocks are
exhausted or inodes are exhausted under load due to races in the tty
code when running without Giant.

NB: Some reduction in calls to uprintf() in the svr4 code is probably
desirable.

NB: In the case of nfs_timer(), calling uprintf() while holding a mutex,
or even in a callout at all, is a bad idea, and will generate warnings
and potential upset.  This needs to be fixed, but was a problem before
this change.

NB: uprintf()/tprintf() sleeping is generally a bad ideas, as is having
non-MPSAFE tty code.

MFC after:	1 week
2005-09-19 16:51:43 +00:00
Christian S.J. Peron
68ff2a4397 Improve the MP safeness associated with the creation of symbolic
links and the execution of ELF binaries. Two problems were found:

1) The link path wasn't tagged as being MP safe and thus was not properly
   protected.
2) The ELF interpreter vnode wasnt being locked in namei(9) and thus was
   insufficiently protected.

This commit makes the following changes:

-Sets the MPSAFE flag in NDINIT for symbolic link paths
-Sets the MPSAFE flag in NDINIT and introduce a vfslocked variable which
 will be used to instruct VFS_UNLOCK_GIANT to unlock Giant if it has been
 picked up.
-Drop in an assertion into vfs_lookup which ensures that if the MPSAFE
 flag is NOT set, that we have picked up giant. If not panic (if WITNESS
 compiled into the kernel). This should help us find conditions where vnode
 operations are in-sufficiently protected.

This is a RELENG_6 candidate.

Discussed with:	jeff
MFC after:	4 days
2005-09-15 15:03:48 +00:00
Peter Wemm
62919d788b Jumbo-commit to enhance 32 bit application support on 64 bit kernels.
This is good enough to be able to run a RELENG_4 gdb binary against
a RELENG_4 application, along with various other tools (eg: 4.x gcore).
We use this at work.

ia32_reg.[ch]: handle the 32 bit register file format, used by ptrace,
	procfs and core dumps.
procfs_*regs.c: vary the format of proc/XXX/*regs depending on the client
	and target application.
procfs_map.c: Don't print a 64 bit value to 32 bit consumers, or their
	sscanf fails.  They expect an unsigned long.
imgact_elf.c: produce a valid 32 bit coredump for 32 bit apps.
sys_process.c: handle 32 bit consumers debugging 32 bit targets.  Note
	that 64 bit consumers can still debug 32 bit targets.

IA64 has got stubs for ia32_reg.c.

Known limitations: a 5.x/6.x gdb uses get/setcontext(), which isn't
implemented in the 32/64 wrapper yet.  We also make a tiny patch to
gdb pacify it over conflicting formats of ld-elf.so.1.

Approved by:	re
2005-06-30 07:49:22 +00:00
Olivier Houchard
77f30ffff8 Don't set the default of kern.fallback_elf_brand to FreeBSD for arm, as
binutils now do the job for us
2005-05-24 22:21:44 +00:00
Jeff Roberson
e11a45c9e5 - Neither of our image formats require Giant now that the vm and vfs have
been locked.
2005-05-03 10:51:38 +00:00
Alan Cox
9f65fb13aa Remove GIANT_REQUIRED from elfN_load_section(). 2005-04-03 07:57:47 +00:00
Maxim Sobolev
610ecfe035 o Split out kernel part of execve(2) syscall into two parts: one that
copies arguments into the kernel space and one that operates
  completely in the kernel space;

o use kernel-only version of execve(2) to kill another stackgap in
  linuxlator/i386.

Obtained from:  DragonFlyBSD (partially)
MFC after:      2 weeks
2005-01-29 23:12:00 +00:00
Poul-Henning Kamp
8516dd18e1 Don't use VOP_GETVOBJECT, use vp->v_object directly. 2005-01-25 00:40:01 +00:00
Olivier Houchard
e0370a187c On arm, set the default elf brand to FreeBSD, until the binutils do it for us. 2004-09-23 23:29:24 +00:00
Marcel Moolenaar
4da47b2fec Add __elfN(dump_thread). This function is called from __elfN(coredump)
to allow dumping per-thread machine specific notes. On ia64 we use this
function to flush the dirty registers onto the backingstore before we
write out the PRSTATUS notes.

Tested on: alpha, amd64, i386, ia64 & sparc64
Not tested on: arm, powerpc
2004-08-11 02:35:06 +00:00
Doug Rabson
cfaf7e60cc Make sure that AT_PHDR has a useful value even for static programs. 2004-08-08 09:48:10 +00:00
Marcel Moolenaar
1f7a1baa37 After maintaining previous behaviour in writing out the core notes, it's
time now to break with the past: do not write the PID in the first note.
Rationale:
1.  [impact of the breakage] Process IDs in core files serve no immediate
    purpose to the debugger itself. They are only useful to relate a core
    file to a process. This can provide context to the person looking at
    the core file, provided one keeps track of this. Overall, not having
    the PID in the core file is only in very rare occasions unfortunate.
2.  [reason of the breakage] Having one PRSTATUS note contain the PID,
    while all others contain the LWPID of the corresponding kernel thread
    creates an irregularity for the debugger that cannot easily be worked
    around. This is caused by libthread_db correlating user thread IDs to
    kernel thread (aka LWP) IDs and thus aware of the actual LWPIDs.

Update comments accordingly.
2004-07-18 20:28:07 +00:00
Marcel Moolenaar
247aba2474 Allocate TIDs in thread_init() and deallocate them in thread_fini().
The overhead of unconditionally allocating TIDs (and likewise,
unconditionally deallocating them), is amortized across multiple
thread creations by the way UMA makes it possible to have type-stable
storage.
Previously the cost was kept down by having threads created as part
of a fork operation use the process' PID as the TID. While this had
some nice properties, it also introduced complexity in the way TIDs
were allocated. Most importantly, by using the type-stable storage
that UMA gives us this was also unnecessary.

This change affects how core dumps are created and in particular how
the PRSTATUS notes are dumped. Since we don't have a thread with a
TID equalling the PID, we now need a different way to preserve the
old and previous behavior. We do this by having the given thread (i.e.
the thread passed to the core dump code in td) dump it's state first
and fill in pr_pid with the actual PID. All other threads will have
pr_pid contain their TIDs. The upshot of all this is that the debugger
will now likely select the right LWP (=TID) as the initial thread.

Credits to: julian@ for spotting how we can utilize UMA.
Thanks to: all who provided julian@ with test results.
2004-06-26 18:58:22 +00:00
Tim J. Robbins
f99619a0dc Change the types of vn_rdwr_inchunks()'s len and aresid arguments to
size_t and size_t *, respectively. Update callers for the new interface.
This is a better fix for overflows that occurred when dumping segments
larger than 2GB to core files.
2004-06-05 02:18:28 +00:00
Tim J. Robbins
2b471bc616 Back out workaround for vn_rdwr_inchunks()'s INT_MAX length limitation
after discussions with bde; vn_rdwr_inchunks() itself should be fixed.
2004-06-05 02:00:12 +00:00
Tim J. Robbins
16e6d16299 Write segments to core dump files in maximally-sized chunks that neither
exceed vn_rdwr_inchunks()'s INT_MAX length limitation nor span a block
boundary. This fixes dumping segments larger than 2GB.

PR:	67546
2004-06-04 06:30:16 +00:00
Alan Cox
59c8bc40ce Utilize sf_buf_alloc() rather than pmap_qenter() (and sometimes
kmem_alloc_wait()) for mapping the image header.  On all machines with a
direct virtual-to-physical mapping and SMP/HTT i386s, this is a clear win.
2004-04-23 03:01:40 +00:00
Marcel Moolenaar
ece267ba58 Do not assume that the initial thread (i.e. the thread with the ID
equal to the process ID) is still present when we dump a core. It
already may have been destroyed. In that case we would end up
dereferencing a NULL pointer, so specifically test for that as well.

Reported & tested by: Dan Nelson <dnelson@allantgroup.com>
2004-04-08 06:37:00 +00:00
Marcel Moolenaar
8c9b7b2c84 Create NT_PRSTATUS and NT_FPREGSET notes for each and every thread
in the process. This is required for proper debugging of corefiles
created by 1:1 or M:N threaded processes. Add an XXX comment where
we should actually call a function that dumps MD specific notes.
An example of a MD specific note is the NT_PRXFPREG note for SSE
registers.

Since BFD creates non-annotated pseudo-sections for the first PRSTATUS
and FPREGSET notes (non-annotated in the sense that the name of the
section does not contain the pid/tid), make sure those sections describe
the initial thread of the process (i.e. the thread which tid equals the
pid). This is not strictly necessary, but makes sure that tools that use
the non-annotated section names will not change behaviour due to this
change.

The practical upshot of this all is that one can see the threads in
the debugger when looking at a corefile. For 1:1 threading this means
that *all* threads are visible.
2004-04-03 20:25:41 +00:00
Jacques Vidrine
3dc19c4677 Verify more bits of the ELF header: the program header table
entry size and the ELF version.  Also, avoid a potential integer
overflow when determining whether the ELF header fits entirely
within the first page.

Reviewed by:	jdp

A panic when attempting to execute an ELF binary with a bogus program
header table entry size was

Reported by:	Christer Öberg <christer.oberg@texonet.com>
2004-03-18 16:33:05 +00:00
John Baldwin
91d5354a2c Locking for the per-process resource limits structure.
- struct plimit includes a mutex to protect a reference count.  The plimit
  structure is treated similarly to struct ucred in that is is always copy
  on write, so having a reference to a structure is sufficient to read from
  it without needing a further lock.
- The proc lock protects the p_limit pointer and must be held while reading
  limits from a process to keep the limit structure from changing out from
  under you while reading from it.
- Various global limits that are ints are not protected by a lock since
  int writes are atomic on all the archs we support and thus a lock
  wouldn't buy us anything.
- All accesses to individual resource limits from a process are abstracted
  behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return
  either an rlimit, or the current or max individual limit of the specified
  resource from a process.
- dosetrlimit() was renamed to kern_setrlimit() to match existing style of
  other similar syscall helper functions.
- The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit()
  (it didn't used the stackgap when it should have) but uses lim_rlimit()
  and kern_setrlimit() instead.
- The svr4 compat no longer uses the stackgap for resource limits calls,
  but uses lim_rlimit() and kern_setrlimit() instead.
- The ibcs2 compat no longer uses the stackgap for resource limits.  It
  also no longer uses the stackgap for accessing sysctl's for the
  ibcs2_sysconf() syscall but uses kernel_sysctl() instead.  As a result,
  ibcs2_sysconf() no longer needs Giant.
- The p_rlimit macro no longer exists.

Submitted by:	mtm (mostly, I only did a few cleanups and catchups)
Tested on:	i386
Compiled on:	alpha, amd64
2004-02-04 21:52:57 +00:00
Peter Wemm
9b68618df0 Add an additional field to the elf brandinfo structure to support
quicker exec-time replacement of the elf interpreter on an emulation
environment where an entire /compat/* tree isn't really warranted.
2003-12-23 02:42:39 +00:00
Peter Wemm
c460ac3a00 Add sysentvec->sv_fixlimits() hook so that we can catch cases on 64 bit
systems where the data/stack/etc limits are too big for a 32 bit process.

Move the 5 or so identical instances of ELF_RTLD_ADDR() into imgact_elf.c.

Supply an ia32_fixlimits function.  Export the clip/default values to
sysctl under the compat.ia32 heirarchy.

Have mmap(0, ...) respect the current p->p_limits[RLIMIT_DATA].rlim_max
value rather than the sysctl tweakable variable.  This allows mmap to
place mappings at sensible locations when limits have been reduced.

Have the imgact_elf.c ld-elf.so.1 placement algorithm use the same
method as mmap(0, ...) now does.

Note that we cannot remove all references to the sysctl tweakable
maxdsiz etc variables because /etc/login.conf specifies a datasize
of 'unlimited'.  And that causes exec etc to fail since it can no
longer find space to mmap things.
2003-09-25 01:10:26 +00:00
David E. O'Brien
677b542ea2 Use __FBSDID(). 2003-06-11 00:56:59 +00:00
Marcel Moolenaar
a063facbf6 Fix ia32 compat on ia64. Recent ia64 MD changes caused the garbage on
the stack to be changed in a way incompatible with elf32_map_insert()
where we used data_buf without initializing it for when the partial
mapping resulting in a misaligned image (typical when the page size
implied by the image is not the same as the page size in use by the
kernel). Since data_buf is passed by reference to vm_map_find(), the
compiler cannot warn about it.

While here, move all local variables to the top of the function.
2003-05-31 19:55:05 +00:00
Warner Losh
a163d034fa Back out M_* changes, per decision of the TRB.
Approved by: trb
2003-02-19 05:47:46 +00:00
Alfred Perlstein
44956c9863 Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
2003-01-21 08:56:16 +00:00
Jake Burkholder
e548a1d4c8 - Provide backwards compatibility for kern.fallback_elf_brand.
- Use the generic elf type macros in imgact_elf.h instead of ifdefing the
  entire contents of the header.
2003-01-05 03:48:14 +00:00
Jake Burkholder
a360a43dd5 Improve the way that an elf image activator for an alternate word size is
included in the kernel.  Include imgact_elf.c in conf/files,  instead of
both imgact_elf32.c and imgact_elf64.c, which will use the default word
size for an architecture as defined in machine/elf.h.  Architectures that
wish to build an additional image activator for an alternate word size can
include either imgact_elf32.c or imgact_elf64.c in files.${ARCH}, which
allows it to be dependent on MD options instead of solely on architecture.

Glanced at by:	peter
2003-01-04 22:07:48 +00:00
Marcel Moolenaar
551d79e177 Fix multiple registration of the elf_legacy_coredump sysctl variable.
The duplication is caused by the fact that imgact_elf.c is included
by both imgact_elf32.c and imgact_elf64.c and both are compiled by
default on ia64. Consequently, we have two seperate copies of the
elf_legacy_coredump variable due to them being declared static, and
two entries for the same sysctl in the linker set, both referencing
the unique copy of the elf_legacy_coredump variable. Since the second
sysctl cannot be registered, one of the elf_legacy_coredump variables
can not be tuned (if ordering still holds, it's the ELF64 related one).

The only solution is to create two different sysctl variables, just
like the elf<32|64>_trace sysctl variables. This unfortunately is an
(user) interface change, but unavoidable. Thus, on ELF32 platforms
the sysctl variable is called elf32_legacy_coredump and on ELF64
platforms it is called elf64_legacy_coredump. Platforms that have
both ELF formats have both sysctl variables.

These variables should probably be retired sooner rather than later.
2002-12-21 01:15:39 +00:00
Matthew Dillon
fa7dd9c5bc Change the way ELF coredumps are handled. Instead of unconditionally
skipping read-only pages, which can result in valuable non-text-related
data not getting dumped, the ELF loader and the dynamic loader now mark
read-only text pages NOCORE and the coredump code only checks (primarily) for
complete inaccessibility of the page or NOCORE being set.

Certain applications which map large amounts of read-only data will
produce much larger cores.  A new sysctl has been added,
debug.elf_legacy_coredump, which will revert to the old behavior.

This commit represents collaborative work by all parties involved.
The PR contains a program demonstrating the problem.

PR:		kern/45994
Submitted by:	"Peter Edwards" <pmedwards@eircom.net>, Archie Cobbs <archie@dellroad.org>
Reviewed by:	jdp, dillon
MFC after:	7 days
2002-12-16 19:24:43 +00:00
Robert Watson
6d7bdc8def Assign value of NULL to imgp->execlabel when imgp is initialized
in the ELF code.  Missed in earlier merge from the MAC tree.

Approved by:	re
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-11-08 20:49:50 +00:00
Robert Watson
450ffb4427 Remove reference to struct execve_args from struct imgact, which
describes an image activation instance.  Instead, make use of the
existing fname structure entry, and introduce two new entries,
userspace_argv, and userspace_envv.  With the addition of
mac_execve(), this divorces the image structure from the specifics
of the execve() system call, removes a redundant pointer, etc.
No semantic change from current behavior, but it means that the
structure doesn't depend on syscalls.master-generated includes.

There seems to be some redundant initialization of imgact entries,
which I have maintained, but which could probably use some cleaning
up at some point.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-11-05 01:59:56 +00:00
Alexander Kabaev
96725dd01a Handle binaries with arbitrary number PT_LOAD sections, not only
ones with one text and one data section.

The text and data rlimit checks still needs to be fixed to properly
accout for additional sections.

Reviewed by:	peter (slightly different patch version)
2002-10-23 01:57:39 +00:00
Robert Drehmel
e80fb43467 Use strlcpy() instead of strncpy() to copy NUL terminated strings
for safety and consistency.
2002-10-17 20:03:38 +00:00
Jake Burkholder
05ba50f522 Use the fields in the sysentvec and in the vm map header in place of the
constants VM_MIN_ADDRESS, VM_MAXUSER_ADDRESS, USRSTACK and PS_STRINGS.
This is mainly so that they can be variable even for the native abi, based
on different machine types.  Get stack protections from the sysentvec too.
This makes it trivial to map the stack non-executable for certain abis, on
machines that support it.
2002-09-21 22:07:17 +00:00
Peter Wemm
d0ca7c29dc Do not blow up when we walk off the end of the brands list.
Found by:	kris, jake
2002-09-08 02:17:44 +00:00
Matthew Dillon
21c2d0479c Alright, fix the problems with the elf loader for the Alpha. It turns
out that there is no easy way to discern the difference between a text
segment and a data segment through the read-only OR execute attribute
in the elf segment header, so revert the algorithm to what it was before.

Neither can we account for multiple data load segments in the vmspace
structure (at least not without more work), due to assumptions obreak()
makes in regards to the data start and data size fields.

Retain RLIMIT_VMEM checking by using a local variable to track the
total bytes of data being loaded.

Reviewed by:	peter
X-MFC after:	ASAP
2002-09-04 04:42:12 +00:00
Peter Wemm
9782ecbab0 Make the text segment locating heuristics from rev 1.121 more reliable
so that it works on the Alpha.  This defines the segment that the entry
point exists in as 'text' and any others (usually one) as data.

Submitted by: tmm
Tested on: i386, alpha
2002-09-03 21:18:17 +00:00
Matthew Dillon
05ef87980a Grammer cleanup 2002-09-02 17:27:30 +00:00
Jake Burkholder
5fe3ed629a Moved elf brand identification into a function. Fully identify the
brand early in the process of loading an elf file, so that we can
identify the sysentvec, and so that we do not continue if we do not
have a brand (and thus a sysentvec).  Use the values in the sysentvec
for the page size and vm ranges unconditionally, since they are all
filled in now.
2002-09-02 04:50:57 +00:00
Jake Burkholder
8cf034521b Fixed more indentation bugs. 2002-09-02 02:41:26 +00:00
Matthew Dillon
cac4515267 Implement data, text, and vmem limit checking in the elf loader and svr4
compat code.  Clean up accounting for multiple segments.  Part 1/2.

Submitted by:	Andrey Alekseyev <uitm@zenon.net> (with some modifications)
MFC after:	3 days
2002-08-30 18:09:46 +00:00
Jake Burkholder
81f223ca02 Fixed most indentation bugs. 2002-08-25 22:36:52 +00:00
Jake Burkholder
ca0387ef9f Fixed placement of operators. Wrapped long lines. 2002-08-25 20:48:45 +00:00
Jake Burkholder
fd559a8a39 Fixed white space around operators, casts and reserved words.
Reviewed by:	md5
2002-08-24 22:55:16 +00:00
Jake Burkholder
a7cddfed7f return x; -> return (x);
return(x); -> return (x);

Reviewed by:	md5
2002-08-24 22:01:40 +00:00
Robert Watson
9ca435893b In order to better support flexible and extensible access control,
make a series of modifications to the credential arguments relating
to file read and write operations to cliarfy which credential is
used for what:

- Change fo_read() and fo_write() to accept "active_cred" instead of
  "cred", and change the semantics of consumers of fo_read() and
  fo_write() to pass the active credential of the thread requesting
  an operation rather than the cached file cred.  The cached file
  cred is still available in fo_read() and fo_write() consumers
  via fp->f_cred.  These changes largely in sys_generic.c.

For each implementation of fo_read() and fo_write(), update cred
usage to reflect this change and maintain current semantics:

- badfo_readwrite() unchanged
- kqueue_read/write() unchanged
  pipe_read/write() now authorize MAC using active_cred rather
  than td->td_ucred
- soo_read/write() unchanged
- vn_read/write() now authorize MAC using active_cred but
  VOP_READ/WRITE() with fp->f_cred

Modify vn_rdwr() to accept two credential arguments instead of a
single credential: active_cred and file_cred.  Use active_cred
for MAC authorization, and select a credential for use in
VOP_READ/WRITE() based on whether file_cred is NULL or not.  If
file_cred is provided, authorize the VOP using that cred,
otherwise the active credential, matching current semantics.

Modify current vn_rdwr() consumers to pass a file_cred if used
in the context of a struct file, and to always pass active_cred.
When vn_rdwr() is used without a file_cred, pass NOCRED.

These changes should maintain current semantics for read/write,
but avoid a redundant passing of fp->f_cred, as well as making
it more clear what the origin of each credential is in file
descriptor read/write operations.

Follow-up commits will make similar changes to other file descriptor
operations, and modify the MAC framework to pass both credentials
to MAC policy modules so they can implement either semantic for
revocation.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-15 20:55:08 +00:00
Jeff Roberson
619eb6e579 - Hold the vnode lock throughout execve.
- Set VV_TEXT in the top level execve code.
 - Fixup the image activators to deal with the newly locked vnode.
2002-08-13 06:55:28 +00:00
Jeff Roberson
e6e370a7fe - Replace v_flag with v_iflag and v_vflag
- v_vflag is protected by the vnode lock and is used when synchronization
   with VOP calls is needed.
 - v_iflag is protected by interlock and is used for dealing with vnode
   management issues.  These flags include X/O LOCK, FREE, DOOMED, etc.
 - All accesses to v_iflag and v_vflag have either been locked or marked with
   mp_fixme's.
 - Many ASSERT_VOP_LOCKED calls have been added where the locking was not
   clear.
 - Many functions in vfs_subr.c were restructured to provide for stronger
   locking.

Idea stolen from:	BSD/OS
2002-08-04 10:29:36 +00:00
Peter Wemm
3ebc124838 Infrastructure tweaks to allow having both an Elf32 and an Elf64 executable
handler in the kernel at the same time.  Also, allow for the
exec_new_vmspace() code to build a different sized vmspace depending on
the executable environment.  This is a big help for execing i386 binaries
on ia64.   The ELF exec code grows the ability to map partial pages when
there is a page size difference, eg: emulating 4K pages on 8K or 16K
hardware pages.

Flesh out the i386 emulation support for ia64.  At this point, the only
binary that I know of that fails is cvsup, because the cvsup runtime
tries to execute code in pages not marked executable.

Obtained from:  dfr (mostly, many tweaks from me).
2002-07-20 02:56:12 +00:00
Jeff Roberson
0b2ed1aef7 Clean up execve locking:
- Grab the vnode object early in exec when we still have the vnode lock.
 - Cache the object in the image_params.
 - Make use of the cached object in imgact_*.c
2002-07-06 07:00:01 +00:00
Jens Schweikhardt
21dc7d4f57 Fix typo in the BSD copyright: s/withough/without/
Spotted and suggested by:	des
MFC after:	3 weeks
2002-06-02 20:05:59 +00:00
Alfred Perlstein
4d77a549fe Remove __P. 2002-03-19 21:25:46 +00:00
John Baldwin
a854ed9893 Simple p_ucred -> td_ucred changes to start using the per-thread ucred
reference.
2002-02-27 18:32:23 +00:00
Mark Peek
bf43c504c9 Remove whitespace at end of line. 2001-12-16 17:21:16 +00:00
Paul Saab
cbc89bfbfe Make MAXTSIZ, DFLDSIZ, MAXDSIZ, DFLSSIZ, MAXSSIZ, SGROWSIZ loader
tunable.

Reviewed by:	peter
MFC after:	2 weeks
2001-10-10 23:06:54 +00:00
Matthew Dillon
3418ebebfe Make uio_yield() a global. Call uio_yield() between chunks
in vn_rdwr_inchunks(), allowing other processes to gain an exclusive
lock on the vnode.  Specifically: directory scanning, to avoid a race to the
root directory, and multiple child processes coring simultaniously so they
can figure out that some other core'ing child has an exclusive adv lock and
just exit instead.

This completely fixes performance problems when large programs core.  You
can have hundreds of copies (forked children) of the same binary core all
at once and not notice.

MFC after:	3 days
2001-09-26 06:54:32 +00:00
Julian Elischer
b40ce4165d KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after:    ha ha ha ha
2001-09-12 08:38:13 +00:00
Matthew Dillon
06ae1e91c4 This brings in a Yahoo coredump patch from Paul, with additional mods by
me (addition of vn_rdwr_inchunks).  The problem Yahoo is solving is that
if you have large process images core dumping, or you have a large number of
forked processes all core dumping at the same time, the original coredump code
would leave the vnode locked throughout.  This can cause the directory vnode
to get locked up, which can cause the parent directory vnode to get locked
up, and so on all the way to the root node, locking the entire machine up
for extremely long periods of time.

This patch solves the problem in two ways.  First it uses an advisory
non-blocking lock to abort multiple processes trying to core to the same
file.  Second (my contribution) it chunks up the writes and uses bwillwrite()
to avoid holding the vnode locked while blocking in the buffer cache.

Submitted by:	ps
Reviewed by:	dillon
MFC after:	2 weeks
2001-09-08 20:02:33 +00:00
Peter Wemm
ef4181d98e For ia64, set the default elf brand to be FreeBSD. This is temporarily
necessary only for as long as we're using a linux toolchain.
2001-09-02 12:23:08 +00:00
Brian Somers
546a92c4d4 OR M_WAITOK with M_ZERO in malloc()s args for clarity. 2001-08-28 23:58:32 +00:00