Commit Graph

215 Commits

Author SHA1 Message Date
Edward Tomasz Napierala
afcc55f318 All the racct_*() calls need to happen with the proc locked. Fixing this
won't happen before 9.0.  This commit adds "#ifdef RACCT" around all the
"PROC_LOCK(p); racct_whatever(p, ...); PROC_UNLOCK(p)" instances, in order
to avoid useless locking/unlocking in kernels built without "options RACCT".
2011-07-06 20:06:44 +00:00
Jonathan Anderson
12bc222e57 Add some checks to ensure that Capsicum is behaving correctly, and add some
more explicit comments about what's going on and what future maintainers
need to do when e.g. adding a new operation to a sys_machdep.c.

Approved by: mentor(rwatson), re(bz)
2011-06-30 10:56:02 +00:00
Edward Tomasz Napierala
1ba5ad4210 Add accounting for most of the memory-related resources.
Sponsored by:	The FreeBSD Foundation
Reviewed by:	kib (earlier version)
2011-04-05 20:23:59 +00:00
Matthew D Fleming
08b163fa51 Put the general logic for being a CPU hog into a new function
should_yield().  Use this in various places.  Encapsulate the common
case of check-and-yield into a new function maybe_yield().

Change several checks for a magic number of iterations to use
should_yield() instead.

MFC after:	1 week
2011-02-02 16:35:10 +00:00
Konstantin Belousov
26d8f3e11d Use the same expression to report stack protection mode for AT_STACKEXEC
as the expression used by exec_new_vmspace().
2011-01-08 18:41:19 +00:00
Konstantin Belousov
291c06a127 In elf image activator, read and apply the stack protection mode from
PT_GNU_STACK program header, if present and enabled. Two new sysctls
are provided, kern.elf32.nxstack and kern.elf64.nxstack, that allow to
enable PT_GNU_STACK for ABIs of specified bitsize, if ABI decided to
support shared page.

Inform rtld about access mode of the stack initial mapping by
AT_STACKPROT aux vector.

At the moment, the default is disabled, waiting for the usermode
support bits.
2011-01-08 16:30:59 +00:00
Konstantin Belousov
ed167eaa80 Collect code to translate between vm_prot_t and p_flags into helper
functions.

MFC after:	1 week
2011-01-08 16:02:14 +00:00
Attilio Rao
7f08176ee8 Add the ability for GDB to printout the thread name along with other
thread specific informations.

In order to do that, and in order to avoid KBI breakage with existing
infrastructure the following semantic is implemented:
- For live programs, a new member to the PT_LWPINFO is added (pl_tdname)
- For cores, a new ELF note is added (NT_THRMISC) that can be used for
  storing thread specific, miscellaneous, informations. Right now it is
  just popluated with a thread name.

GDB, then, retrieves the correct informations from the corefile via the
BFD interface, as it groks the ELF notes and create appropriate
pseudo-sections.

Sponsored by:	Sandvine Incorporated
Tested by:	gianni
Discussed with:	dim, kan, kib
MFC after:	2 weeks
2010-11-22 14:42:13 +00:00
Konstantin Belousov
ee235befcb Supply some useful information to the started image using ELF aux vectors.
In particular, provide pagesize and pagesizes array, the canary value
for SSP use, number of host CPUs and osreldate.

Tested by:	marius (sparc64)
MFC after:	1 month
2010-08-17 08:55:45 +00:00
Alfred Perlstein
fba6b1af2e Don't leak core_buf or gzfile if doing a compressed core file and we
hit an error condition.

Obtained from: Juniper Networks
2010-04-30 03:13:24 +00:00
Nathan Whitehorn
a0ea661f5e Add the ELF relocation base to struct image_params. This will be
required to correctly relocate the executable entry point's function
descriptor on powerpc64.
2010-03-25 14:31:26 +00:00
Nathan Whitehorn
920acedb80 Change the way text_addr and data_addr are computed to use the
executable status of segments instead of detecting the main text segment
by which segment contains the program entry point. This affects
obreak() and is required for correct operation of that function
on 64-bit PowerPC systems. The previous behavior was apparently
required only for the Alpha, which is no longer supported.

Reviewed by:	jhb
Tested on:	amd64, sparc64, powerpc
2010-03-25 14:21:22 +00:00
Nathan Whitehorn
841c0c7ec7 Provide groundwork for 32-bit binary compatibility on non-x86 platforms,
for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32
option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts
of the kernel and enhances the freebsd32 compatibility code to support
big-endian platforms.

Reviewed by:	kib, jhb
2010-03-11 14:49:06 +00:00
Alfred Perlstein
8b325009a3 put calls to gzclose() under ifdef COMPRESS_USER_CORES to prevent
undefined symbols on kernels without this option.

Reported by: Alexander Best
2010-03-04 21:53:45 +00:00
Alfred Perlstein
e722820434 Merge projects/enhanced_coredumps (r204346) into HEAD:
Enhanced process coredump routines.

  This brings in the following features:
  1) Limit number of cores per process via the %I coredump formatter.
  Example:
    if corefilename is set to %N.%I.core AND num_cores = 3, then
    if a process "rpd" cores, then the corefile will be named
    "rpd.0.core", however if it cores again, then the kernel will
    generate "rpd.1.core" until we hit the limit of "num_cores".

    this is useful to get several corefiles, but also prevent filling
    the machine with corefiles.

  2) Encode machine hostname in core dump name via %H.

  3) Compress coredumps, useful for embedded platforms with limited space.
    A sysctl kern.compress_user_cores is made available if turned on.

    To enable compressed coredumps, the following config options need to be set:
    options COMPRESS_USER_CORES
    device zlib   # brings in the zlib requirements.
    device gzio   # brings in the kernel vnode gzip output module.

  4) Eventhandlers are fired to indicate coredumps in progress.

  5) The imgact sv_coredump routine has grown a flag to pass in more
  state, currently this is used only for passing a flag down to compress
  the coredump or not.

  Note that the gzio facility can be used for generic output of gzip'd
  streams via vnodes.

Obtained from: Juniper Networks
Reviewed by: kan
2010-03-02 06:58:58 +00:00
Konstantin Belousov
7564c4ad9a If ET_DYN binary has non-zero base address for some reason, honour it
and do not relocate the binary to ET_DYN_LOAD_ADDR. This allows for the
binary author to influence address map of the process. In particular,
when the binary is actually an interpeter, this allows to have almost
usual process address map.

Communicate the relocation bias of the mapping for interpeter-less
ET_DYN binary, that is interperter itself, in AT_BASE aux entry. This
way, rtld is able to find its dynamic structure and relocate itself.
Note that mapbase in the rtld is still wrong and requires further
fixing.

Reported and tested by:	rwatson
Discussed with:	kan
MFC after:	3 days
2009-10-18 12:57:48 +00:00
Konstantin Belousov
ab02d85f0d Map PIE binaries at non-zero base address.
Discussed with:	bz
Reviewed by:	kan
Tested by:	bz (i386, amd64), bsam (linux)
MFC after:	some time
2009-10-10 15:33:01 +00:00
Konstantin Belousov
5b33842a9b Do not map segments of zero length.
Discussed with:	bz
Reviewed by:	kan
Tested by:	bz (i386, amd64), bsam (linux)
MFC after:	some time
2009-10-10 15:28:52 +00:00
Bjoern A. Zeeb
925c8b5b04 Print a warning in case we cannot add more brandinfo because
we would overflow the MAX_BRANDS sized array.

Reviewed by:	kib
MFC After:	1 month
2009-10-03 10:50:00 +00:00
Bjoern A. Zeeb
ecc2fda872 Make sure FreeBSD binaries without .note.ABI-tag section work
correctly and do not match a colliding Debian GNU/kFreeBSD
brandinfo statements.
For this mark the Debian GNU/kFreeBSD brandinfo that it must have
an .note.ABI-tag section and ignore the old EI_OSABI brandinfo
when comparing a possibly colliding set of options.

Due to SYSINIT we add the brandinfo in a non-deterministic order,
so native FreeBSD is not always first. We may want to consider
to force native FreeBSD to come first as well.

The only way a problem could currently be noticed is when running an
i386 binary without the .note.ABI-tag on amd64 and the Debian GNU/kFreeBSD
brandinfo  was matched first,  as the fallback to ld-elf32.so.1 does
not exist in that case.

Reported and tested by:	ticso
In collaboration with:	kib
MFC after:		3 days
2009-08-30 14:38:17 +00:00
Bjoern A. Zeeb
89ffc202d6 Fix handling of .note.ABI-tag section for GNU systems [1].
Handle GNU/Linux according to LSB Core Specification 4.0,
Chapter 11. Object Format, 11.8. ABI note tag.

Also check the first word of desc, not only name, according to
glibc abi-tags specification to distinguish between Linux and
kFreeBSD.

Add explicit handling for Debian GNU/kFreeBSD, which runs
on our kernels as well [2].

In {amd64,i386}/trap.c, when checking osrel of the current process,
also check the ABI to not change the signal behaviour for Linux
binary processes, now that we save an osrel version for all three
from the lists above in struct proc [2].

These changes make it possible to run FreeBSD, Debian GNU/kFreeBSD
and Linux binaries on the same machine again for at least i386 and
amd64, and no longer break kFreeBSD which was detected as GNU(/Linux).

PR:		kern/135468
Submitted by:	dchagin [1] (initial patch)
Suggested by:	kib [2]
Tested by:	Petr Salinger (Petr.Salinger seznam.cz) for kFreeBSD
Reviewed by:	kib
MFC after:	3 days
2009-08-24 16:19:47 +00:00
Dmitry Chagin
cd899aad76 Fix KBI breakage by r190520 which affects older linux.ko binaries:
1) Move the new field (brand_note) to the end of the Brandinfo structure.
2) Add a new flag BI_BRAND_NOTE that indicates that the brand_note pointer
   is valid.
3) Use the brand_note field if the flag BI_BRAND_NOTE is set and as old
   modules won't have the flag set, so the new field brand_note would be
   ignored.

Suggested by:	jhb
Reviewed by:	jhb
Approved by:	kib (mentor)
MFC after:	6 days
2009-04-05 09:27:19 +00:00
Konstantin Belousov
267c52fc98 Fix several issues with parsing the notes for ELF objects.
Badly formed ELF note may cause the caclulated pointer to the next note
to point both after the note region, that was checked in the code, but
also to point before the region, that was not checked [1]. Remember the
first note location in note0 and leap out if the note is not between
note0 and note_end.

In the similar way, badly formed note may cause infinite loop by
pointing next note into the same or previous note. Guard against this by
limiting amount of loop iterations by arbitrary choosen big number.

For clarity, check the calculated note alignment in each iteration.

Reported by:	Chris Palmer <chris noncombatant org> [1]
PR:	kern/132886
Reviewed and tested by:	dchagin
MFC after:	3 days
2009-03-22 13:42:41 +00:00
Konstantin Belousov
3ff063577b Supply AT_EXECPATH auxinfo entry to the interpreter, both for native and
compat32 binaries.

Tested by:	pho
Reviewed by:	kan
2009-03-17 12:53:28 +00:00
Konstantin Belousov
429f5a589b Use the properly sized types for ELF object header and program headers.
This fixes osrel fetching from the FreeBSD branding note for the 64bit
platforms.

Reported by:	swell.k gmail com
Reviewed by:	dchagin
Tested by:	dchagin, swell.k gmail com
2009-03-17 09:50:40 +00:00
Dmitry Chagin
32c01de21c Implement new way of branding ELF binaries by looking to a
".note.ABI-tag" section.

The search order of a brand is changed, now first of all the
".note.ABI-tag" is looked through.

Move code which fetch osreldate for ELF binary to check_note() handler.

PR:		118473
Approved by:	kib (mentor)
2009-03-13 16:40:51 +00:00
Robert Watson
95c807cf5e When a statically linked binary is executed (or at least, one without
an interpreter definition in its program header), set the auxiliary
ELF argument AT_BASE to 0 rather than to the address that we would
have mapped the interpreter at if there had been one.

The ELF ABI specifications appear to be ambiguous as to the desired
behavior in this situation, as they define AT_BASE as the base address
of the interpreter, but do not mention what to do if there is none.
On Solaris, AT_BASE will be set to the base address of the static
binary if there is no interpreter, and on Linux, AT_BASE is set to 0.
We go with the Linux semantics as they are of more immediate utility
and allow the early runtime environment to know that the kernel has
not mapped an interpreter, but because AT_PHDR points at the ELF
header for the running binary, it is still possible to retrieve all
required mapping information when the process starts should it be
required.  Either approach would be preferable to our current behavior
of passing a pointer to an unmapped region of user memory as AT_BASE.

MFC after:	3 weeks
2009-01-25 12:07:43 +00:00
Peter Wemm
a3ac8c94cf Remove sysctl debug.elf_trace and the trace field in auxargs. They go
nowhere.  It used to be the equivalent of $LD_DEBUG in rtld-elf.
Elf_Auxargs is an internal structure.
2008-12-17 16:54:29 +00:00
Warner Losh
35c2a5a852 Minor style(9) nit. 2008-12-17 16:25:20 +00:00
Konstantin Belousov
6f3475454e Remove two remnant uses of AT_DEBUG. 2008-12-17 13:13:35 +00:00
Konstantin Belousov
387ad99800 If the ABI-overriden interpreter was not loaded, do not set
have_interp to TRUE. This allows the code in image activator to try
/libexec/ld-elf.so.1 as interpreter when newinterp is not found to
execute.

Reviewed by:	peter
MFC after:	2 weeks (together with r175105)
2008-10-08 11:11:36 +00:00
John Baldwin
ccd3953e5f Go back to using the process command name (p_comm) for the file name and
command line arguments stored in the note at the beginning of a core dump
instead of the current thread name.

Reviewed by:	julian
2008-05-15 03:07:34 +00:00
Jeff Roberson
6617724c5f Remove kernel support for M:N threading.
While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential.  Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries and static binaries will
be broken.
2008-03-12 10:12:01 +00:00
Attilio Rao
22db15c06f VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.

KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
2008-01-13 14:44:15 +00:00
Attilio Rao
cb05b60a89 vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by:	Diego Sardina <siarodx at gmail dot com>,
		Andrea Di Pasquale <whyx dot it at gmail dot com>
2008-01-10 01:10:58 +00:00
Peter Wemm
4113f8d741 Fall back to the binary-specified interpreter (ld-elf.so.1) if the
ABI override binary isn't found.  This could probably be smoother, but
it is what I did in p4 change #126891 on 2007/09/27.  It should solve
the "ld-elf32.so.1"-in-chroot problem.
2008-01-05 08:35:56 +00:00
Konstantin Belousov
f231de478e Implement fetching of the __FreeBSD_version from the ELF ABI-tag note.
The value is read into the p_osrel member of the struct proc. p_osrel
is set to 0 for the binaries without the note.

MFC after:	3 days
2007-12-04 12:28:07 +00:00
Konstantin Belousov
93d1c72883 Check for the program headers alignment of the ELF images before
dereferencing. Unaligned access could cause panic on strict alignment
architectures.

Reviewed by:	marcel, marius (also tested on sparc64, thanks !)
MFC after:	3 days
2007-12-04 12:21:27 +00:00
Julian Elischer
e01eafef2a A bunch more files that should probably print out a thread name
instead of a process name.
2007-11-14 06:51:33 +00:00
Konstantin Belousov
89b57fcf01 Fix for the panic("vm_thread_new: kstack allocation failed") and
silent NULL pointer dereference in the i386 and sparc64 pmap_pinit()
when the kmem_alloc_nofault() failed to allocate address space. Both
functions now return error instead of panicing or dereferencing NULL.

As consequence, vmspace_exec() and vmspace_unshare() returns the errno
int. struct vmspace arg was added to vm_forkproc() to avoid dealing
with failed allocation when most of the fork1() job is already done.

The kernel stack for the thread is now set up in the thread_alloc(),
that itself may return NULL. Also, allocation of the first process
thread is performed in the fork1() to properly deal with stack
allocation failure. proc_linkup() is separated into proc_linkup()
called from fork1(), and proc_linkup0(), that is used to set up the
kernel process (was known as swapper).

In collaboration with:	Peter Holm
Reviewed by:	jhb
2007-11-05 11:36:16 +00:00
John Baldwin
19059a13ed Rework the support for ABIs to override resource limits (used by 32-bit
processes under 64-bit kernels).  Previously, each 32-bit process overwrote
its resource limits at exec() time.  The problem with this approach is that
the new limits affect all child processes of the 32-bit process, including
if the child process forks and execs a 64-bit process.  To fix this, don't
ovewrite the resource limits during exec().  Instead, sv_fixlimits() is
now replaced with a different function sv_fixlimit() which asks the ABI to
sanitize a single resource limit.  We then use this when querying and
setting resource limits.  Thus, if a 32-bit process sets a limit, then
that new limit will be inherited by future children.  However, if the
32-bit process doesn't change a limit, then a future 64-bit child will
see the "full" 64-bit limit rather than the 32-bit limit.

MFC is tentative since it will break the ABI of old linux.ko modules (no
other modules are affected).

MFC after:	1 week
2007-05-14 22:40:04 +00:00
Xin LI
4f506694bb Use FOREACH_PROC_IN_SYSTEM instead of using its unrolled form. 2007-01-17 14:58:53 +00:00
Alan Cox
976a87a284 Add vm map and object locking to each_writable_segment().
Noticed by: jhb@
MFC after: 3 weeks
2006-11-19 23:38:59 +00:00
Alan Cox
e5e6093ba9 Avoid a vm object reference leak in a rarely used code path.
An executable contains at most one PT_INTERP program header.  Therefore,
the loop that searches for it can terminate after it is found rather than
iterating over the entire set of program headers.

Eliminate an unneeded initialization.

Reviewed by: tegge
2006-01-21 20:11:49 +00:00
Maxim Sobolev
d49b21093c Fix breakage introduced in the previous commit. 2005-12-26 22:32:52 +00:00
Maxim Sobolev
900b28f9f6 Remove kern.elf32.can_exec_dyn sysctl. Instead extend Brandinfo structure
with flags bitfield and set BI_CAN_EXEC_DYN flag for all brands that usually
allow executing elf dynamic binaries (aka shared libraries). When it is
requested to execute ET_DYN elf image check if this flag is on after we
know the elf brand allowing execution if so.

PR:		kern/87615
Submitted by:	Marcin Koziej <creep@desk.pl>
2005-12-26 21:23:57 +00:00
Alan Cox
60bb39431a Maintain the lock on the vnode for most of exec_elfN_imgact().
Specifically, it is required for the I/O that may be performed by
elfN_load_section().

Avoid an obscure deadlock in the a.out, elf, and gzip image
activators.  Add a comment describing why the deadlock does not occur
in the common case and how it might occur in less usual circumstances.

Eliminate an unused variable from exec_aout_imgact().

In collaboration with: tegge
2005-12-24 04:57:50 +00:00
Alan Cox
373d1a3f8c Maintain the vnode lock throughout elfN_load_file() rather than releasing
it and reacquiring it in vrele().  Consequently, there is no reason to
increase the reference count on the vm object caching the file's pages.
Reviewed by: tegge

Eliminate unused parameters to elfN_load_file().
2005-12-21 18:58:40 +00:00
Alan Cox
ff6f03c7cd Eliminate an unneeded (vm_prot_t) parameter from two functions. Eliminate
unnecessary uses of a local variable.

Reviewed by: tegge
2005-12-20 23:42:18 +00:00
Alan Cox
044bbbb523 Correct a long-standing problem in elfN_map_insert(): In order to copy a
page to user space, the user space mapping must allow write access.

In collaboration with: tegge@
MFC after: 3 weeks
2005-12-17 19:40:47 +00:00