Commit Graph

8616 Commits

Author SHA1 Message Date
Jeff Roberson
7499fd8de9 - Fix a problem that slipped through review; the stack member of the lockmgr
structure should have the lk_ prefix.
 - Add stack_print(lkp->lk_stack) to the information printed with
   lockmgr_printinfo().
2005-08-03 04:59:07 +00:00
Jeff Roberson
e8ddb61d38 - Replace the series of DEBUG_LOCKS hacks which tried to save the vn_lock
caller by saving the stack of the last locker/unlocker in lockmgr.  We
   also put the stack in KTR at the moment.

Contributed by:		Antoine Brodin <antoine.brodin@laposte.net>
2005-08-03 04:48:22 +00:00
Jeff Roberson
8d511e2a05 - Add support for saving stack traces and displaying them via printf(9)
and KTR.

Contributed by:		Antoine Brodin <antoine.brodin@laposte.net>
Concept code from:	Neal Fachan <neal@isilon.com>
2005-08-03 04:27:40 +00:00
David Xu
3c424d1447 In adjustrunqueue(), add code to handle thread migrating case for
ULE scheduler. In original code, local run queue of threaded ksegrp
is corrupted if adjustrunqueue() is called while thread is migrating.
2005-08-03 01:23:45 +00:00
Ruslan Ermilov
2319835713 Long overdue, keep up with mbuf.h,v 1.148. 2005-08-02 20:03:23 +00:00
Kelly Yancey
dcb5fef5db Make getsockopt(..., SOL_SOCKET, SO_ACCEPTCONN, ...) work per IEEE Std
1003.1 (POSIX).
2005-08-01 21:15:09 +00:00
David Xu
3d16f519b6 If a thread was removed from system run queue, kse_assign shouldn't
add it again.
2005-07-31 15:11:21 +00:00
Alexander Leidinger
32069af652 The resource_xxx routines in subr_hints.c are called before and after the
kenv environment in kern_environment.c switches to dynamic kenv. The prior
call sets the static variable hintp to the static hints in subr_hints.c
(hintmode==0).

However, changes to the environment are not detected by the resource_xxx
lookups after the change to dynamic kernel environment, so the lookup
routines only report the old stuff of hintmode==0, even after the change to
the dynamic kenv. This causes kenv users to see a different environment than
the kernel routines.

This is a problem in the mixer.c code that looks up initial mixer volume
settings from the hints: If the hints are dynamic and not from the
device.hints file, mixer.c doesn't see them, but kenv does.

The patch from the PR (modified to comply to the style of the function)
solves this.

PR:		83686
Submitted by:	Harry Coin <harrycoin@qconline.com>
2005-07-31 10:46:55 +00:00
Alexander Leidinger
3904769ba8 Add bounds checking to the setenv part of the kernel environment.
This has no security implications since only root is allowed to use
kenv(1) (and corrupt the kernel memory after adding too much variables
previous to this commit).

This is based upon the PR [1] mentioned below, but extended to check both
bounds (in case of an overflow of the counting variable) and to comply
to the style of the function. An overflow of the counting variable
shouldn't happen after adding the check for the upper bound, but better
safe than sorry (in case some other function in the kernel overwrites
random memory).

An interested soul may want to add a printf to notify root in case the
bounds are hit.

Also allocate KENV_SIZE+1 entries (the array is NULL-terminated), since
the comment for KENV_SIZE says it's the maximum number of environment
strings. [2]

PR:		83687 [1]
Submitted by:	Harry Coin <harrycoin@qconline.com> [1]
Submitted by:	Ariff Abdullah <skywizard@MyBSD.org.my> [2]
2005-07-31 10:28:35 +00:00
Joseph Koshy
fadcc6e201 Fail the module loading process if the currently executing kernel
was not compiled with 'options HWPMC_HOOKS' or if the compiled-in
version numbers of the kernel and module are out of sync.

Reported by:	cracauer
MFC after:	3 days
2005-07-30 09:02:42 +00:00
Paul Saab
1126349ae7 Ignore mutex asserts when we're dumping as well. This allows me
to panic a system from DDB when INVARIANTS is compiled into the
kernel on a scsi system.
2005-07-30 05:54:30 +00:00
Sam Leffler
ab8ab90c5b add m_align, a function to align any type of mbuf (i.e. it
is a superset of M_ALIGN and MH_ALIGN)

Reviewed by:	several
2005-07-30 01:32:16 +00:00
R. Imura
080e3a63b3 Change API of mb_copy_t in libmchain so that netsmb can handle
multibyte character share name correctly.

Reviewed by:	bp
2005-07-29 13:22:37 +00:00
George V. Neville-Neil
0d52d7b01a Fix for PR 83885.
Make sure that there actually is a next packet before setting
nextrecord to that field.

PR: 83885
Submitted by: hirose@comm.yamaha.co.jp
Obtained from:	Patch suggested in the PR
MFC after: 1 week
2005-07-28 10:10:01 +00:00
Pawel Jakub Dawidek
73864adbd4 Fix the way how "InUse" column in 'vmstat -m' output works:
- increase number of allocations count only on successfull malloc(9),
  so it doesn't confuse people;
- because we need to check if 'size > 0', hide 'mtsp->mts_memalloced += size;'
  under the check as well, as for size=0 it is of course a no-op;
- avoid critical_enter()/critical_exit() in case of failure in
  malloc_type_allocated() as there will be nothing to do.

OK'ed by:	rwatson
MFC after:	2 days
2005-07-27 23:17:31 +00:00
Xin LI
05a6b7ad62 Cast to uintptr_t when the compiler complains. This unbreaks ULE
scheduler breakage accompanied by the recent atomic_ptr() change.
2005-07-25 10:21:49 +00:00
Alan Cox
ec9c9e7363 Eliminate inconsistency in the setting of the B_DONE flag. Specifically,
make the b_iodone callback responsible for setting it if it is needed.
Previously, it was set unconditionally by bufdone() without holding
whichever lock is shared by the b_iodone callback and the corresponding
top-half function.  Consequently, in a race, the top-half function could
conclude that operation was done before the b_iodone callback finished.
See, for example, aio_physwakeup() and aio_fphysio().

Note: I don't believe that the other, more widely-used b_iodone callbacks
are affected.

Discussed with: jeff
Reviewed by: phk
MFC after: 2 weeks
2005-07-20 19:06:06 +00:00
Jeff Roberson
39b2406838 - Allow vnlru to drop giant if the filesystem does not require it. The
vnlru proc is extremely inefficient, potentially iteration over tens of
   thousands of vnodes without blocking.  Droping Giant allows other threads
   to preempt us although we should revisit the algorithm to fix the runtime
   problems especially since this may hold up all vnode allocations.
 - Remove the LK_NOWAIT from the VOP_LOCK in vlrureclaim.  This provides
   a natural blocking point to help alleviate the situation described above
   although it may not technically be desirable.
 - yield after we make a pass on all mount points to prevent us from
   blocking other threads which require Giant.

MFC after:	2 weeks
2005-07-20 01:43:27 +00:00
John Baldwin
ddf9c4f771 - Slightly reorder the events around the setting of PRS_ZOMBIE to be less
hokie and much more readable and expand the comment to explain why it is
  the way that it is.
- Close a race where one CPU could free the process belonging to a thread
  on another CPU that hasn't quite finished exiting yet but is beyond the
  point of setting the process state as PRS_ZOMBIE.

Reported and tested by:	ps (2)
MFC after:	3 days
2005-07-18 20:08:14 +00:00
Robert Watson
68352adfe7 Define four constants, MBUF_{,MEM,CLUSTER,PACKET,TAG}_MEM_NAME, which
are string names for their respective UMA zones and malloc types, and
are passed into uma_zcreate() and MALLOC_DEFINE().  Export them
outside of _KERNEL in mbuf.h so that netstat can reference them.

Change the names to improve consistency, with each zone/type
associated with the mbuf allocator being prefixed mbuf_.

MFC after:	1 week
2005-07-17 14:04:03 +00:00
John Baldwin
122eceef61 Convert the atomic_ptr() operations over to operating on uintptr_t
variables rather than void * variables.  This makes it easier and simpler
to get asm constraints and volatile keywords correct.

MFC after:	3 days
Tested on:	i386, alpha, sparc64
Compiled on:	ia64, powerpc, amd64
Kernel toolchain busted on:	arm
2005-07-15 18:17:59 +00:00
Robert Watson
4f8721d2a9 Correct build on 64-bit: cast u_int64_t to (unsigned long long) before
printfing as (unsigned long long).  32-bit build on i386 didn't notice
this.  Whoops.

Reported by:	arved
Tested by:	sledge
2005-07-14 15:21:18 +00:00
Robert Watson
cd814b2692 Introduce a new sysctl, kern.malloc_stats, which exports kernel malloc
statistics via a binary structure stream:

- Add structure 'malloc_type_stream_header', which defines a stream
  version, definition of MAXCPUS used in the stream, and a number of
  malloc_type records in the stream.

- Add structure 'malloc_type_header', which defines the name of the
  malloc type being reported on.

- When the sysctl is queried, return a stream header, followed by a
  series of type descriptions, each consisting of a type header
  followed by a series of MAXCPUS malloc_type_stats structures holding
  per-CPU allocation information.  Typical values of MAXCPUS will be 1
  (UP compiled kernel) and 16 (SMP compiled kernel).

This query mechanism allows user space monitoring tools to extract
memory allocation statistics in a machine-readable form, and to do so
at a per-CPU granularity, allowing monitoring of allocation patterns
across CPUs in order to better understand the distribution of work and
memory flow over multiple CPUs.

While here:

- Bump statistics width to uint64_t, and hard code using fixed-width
  type in order to be more sure about structure layout in the stream.
  We allocate and free a lot of memory.

- Add kmemcount, a counter of the number of registered malloc types,
  in order to avoid excessive manual counting of types.  Export via a
  new sysctl to allow user-space code to better size buffers.

- De-XXX comment on no longer maintaining the high watermark in old
  sysctl monitoring code.

A follow-up commit of libmemstat(3), a library to monitor kernel memory
allocation, will occur in the next few days.  Likewise, similar changes
to UMA.
2005-07-14 11:52:06 +00:00
Robert Watson
49bb6870cc Bump the module versions of the MAC Framework and MAC policy modules
from 2 (6.x) to 3 (7.x) to allow for future changes in the MAC policy
module ABI in 7.x.

Obtained from:	TrustedBSD Project
2005-07-14 10:46:03 +00:00
Robert Watson
d26dd2d99e When devfs cloning takes place, provide access to the credential of the
process that caused the clone event to take place for the device driver
creating the device.  This allows cloned device drivers to adapt the
device node based on security aspects of the process, such as the uid,
gid, and MAC label.

- Add a cred reference to struct cdev, so that when a device node is
  instantiated as a vnode, the cloning credential can be exposed to
  MAC.

- Add make_dev_cred(), a version of make_dev() that additionally
  accepts the credential to stick in the struct cdev.  Implement it and
  make_dev() in terms of a back-end make_dev_credv().

- Add a new event handler, dev_clone_cred, which can be registered to
  receive the credential instead of dev_clone, if desired.

- Modify the MAC entry point mac_create_devfs_device() to accept an
  optional credential pointer (may be NULL), so that MAC policies can
  inspect and act on the label or other elements of the credential
  when initializing the skeleton device protections.

- Modify tty_pty.c to register clone_dev_cred and invoke make_dev_cred(),
  so that the pty clone credential is exposed to the MAC Framework.

While currently primarily focussed on MAC policies, this change is also
a prerequisite for changes to allow ptys to be instantiated with the UID
of the process looking up the pty.  This requires further changes to the
pty driver -- in particular, to immediately recycle pty nodes on last
close so that the credential-related state can be recreated on next
lookup.

Submitted by:	Andrew Reisse <andrew.reisse@sparta.com>
Obtained from:	TrustedBSD Project
Sponsored by:	SPAWAR, SPARTA
MFC after:	1 week
MFC note:	Merge to 6.x, but not 5.x for ABI reasons
2005-07-14 10:22:09 +00:00
John Baldwin
2c65cb82ad Add a 'sysent' target that depends on the various files built from
syscalls.master for the master list and the Alpha/OSF1 compat ABI to be
consistent with all the other compat ABIs where 'make sysent' already
works.

MFC after:	3 days
2005-07-13 20:50:17 +00:00
David Xu
740fd64d65 Validate if the value written into {FS,GS}.base is a canonical
address, writting non-canonical address can cause kernel a panic,
by restricting base values to 0..VM_MAXUSER_ADDRESS, ensuring
only canonical values get written to the registers.

Reviewed by: peter, Josepha Koshy < joseph.koshy at gmail dot com >
Approved by: re (scottl)
2005-07-10 23:31:11 +00:00
John Baldwin
522ccb2381 Regen.
Approved by:	re (scottl)
2005-07-08 15:06:58 +00:00
John Baldwin
4acd2e73e5 Mark second instance of lchown() MP safe just like the first.
Approved by:	re (scottl)
2005-07-08 15:01:13 +00:00
John Baldwin
9f3157a254 Regenerate.
Approved by:	re (scottl)
2005-07-07 18:20:38 +00:00
John Baldwin
bcd9e0dd20 - Add two new system calls: preadv() and pwritev() which are like readv()
and writev() except that they take an additional offset argument and do
  not change the current file position.  In SAT speak:
  preadv:readv::pread:read and pwritev:writev::pwrite:write.
- Try to reduce code duplication some by merging most of the old
  kern_foov() and dofilefoo() functions into new dofilefoo() functions
  that are called by kern_foov() and kern_pfoov().  The non-v functions
  now all generate a simple uio on the stack from the passed in arguments
  and then call kern_foov().  For example, read() now just builds a uio and
  calls kern_readv() and pwrite() just builds a uio and calls kern_pwritev().

PR:		kern/80362
Submitted by:	Marc Olzheim marcolz at stack dot nl (1)
Approved by:	re (scottl)
MFC after:	1 week
2005-07-07 18:17:55 +00:00
Robert Watson
6758f88ea4 Add MAC Framework and MAC policy entry point mac_check_socket_create(),
which is invoked from socket() and socketpair(), permitting MAC
policy modules to control the creation of sockets by domain, type, and
protocol.

Obtained from:	TrustedBSD Project
Sponsored by:	SPARTA, SPAWAR
Approved by:	re (scottl)
Requested by:	SCC
2005-07-05 22:49:10 +00:00
Pawel Jakub Dawidek
c23c87bd93 Fix one "wrong b_bufobj" panic in reassignbuf() by moving VI_UNLOCK(vp)
below KASSERT()s, which means there was no real problem here, we just
needed better locking for assertions.

OK'ed by:	jeff
Approved by:	re (scottl)
2005-07-05 15:57:55 +00:00
Suleiman Souhlal
571dcd15e2 Fix the recent panics/LORs/hangs created by my kqueue commit by:
- Introducing the possibility of using locks different than mutexes
for the knlist locking. In order to do this, we add three arguments to
knlist_init() to specify the functions to use to lock, unlock and
check if the lock is owned. If these arguments are NULL, we assume
mtx_lock, mtx_unlock and mtx_owned, respectively.

- Using the vnode lock for the knlist locking, when doing kqueue operations
on a vnode. This way, we don't have to lock the vnode while holding a
mutex, in filt_vfsread.

Reviewed by:	jmg
Approved by:	re (scottl), scottl (mentor override)
Pointyhat to:	ssouhlal
Will be happy:	everyone
2005-07-01 16:28:32 +00:00
Joseph Koshy
151392465f MFP4:
- pmcstat(8) gprof output mode fixes:

  lib/libpmc/pmclog.{c,h}, sys/sys/pmclog.h:
  + Add a 'is_usermode' field to the PMCLOG_PCSAMPLE event
  + Add an 'entryaddr' field to the PMCLOG_PROCEXEC event,
    so that pmcstat(8) can determine where the runtime loader
    /libexec/ld-elf.so.1 is getting loaded.

  sys/kern/kern_exec.c:
  + Use a local struct to group the entry address of the image being
    exec()'ed and the process credential changed flag to the exec
    handling hook inside hwpmc(4).

  usr.sbin/pmcstat/*:
  + Support "-k kernelpath", "-D sampledir".
  + Implement the ELF bits of 'gmon.out' profile generation in a new
    file "pmcstat_log.c".  Move all log related functions to this
    file.
  + Move local definitions and prototypes to "pmcstat.h"

- Other bug fixes:
  + lib/libpmc/pmclog.c: correctly handle EOF in pmclog_read().
  + sys/dev/hwpmc_mod.c: unconditionally log a PROCEXIT event to all
    attached PMCs when a process exits.
  + sys/sys/pmc.h: correct a function prototype.
  + Improve usage checks in pmcstat(8).

Approved by:	re (blanket hwpmc)
2005-06-30 19:01:26 +00:00
Paul Saab
cff2e749e2 Use SCTL_MASK32 to determine that the sysctl call is from a 32bit
binary for kern.cp_time.

Approved by:	re
2005-06-30 17:17:29 +00:00
Peter Wemm
62919d788b Jumbo-commit to enhance 32 bit application support on 64 bit kernels.
This is good enough to be able to run a RELENG_4 gdb binary against
a RELENG_4 application, along with various other tools (eg: 4.x gcore).
We use this at work.

ia32_reg.[ch]: handle the 32 bit register file format, used by ptrace,
	procfs and core dumps.
procfs_*regs.c: vary the format of proc/XXX/*regs depending on the client
	and target application.
procfs_map.c: Don't print a 64 bit value to 32 bit consumers, or their
	sscanf fails.  They expect an unsigned long.
imgact_elf.c: produce a valid 32 bit coredump for 32 bit apps.
sys_process.c: handle 32 bit consumers debugging 32 bit targets.  Note
	that 64 bit consumers can still debug 32 bit targets.

IA64 has got stubs for ia32_reg.c.

Known limitations: a 5.x/6.x gdb uses get/setcontext(), which isn't
implemented in the 32/64 wrapper yet.  We also make a tiny patch to
gdb pacify it over conflicting formats of ld-elf.so.1.

Approved by:	re
2005-06-30 07:49:22 +00:00
Peter Wemm
48033188a6 Second part of commit for moving KDB_STOP_NMI from opt_global.h to
opt_kdb.h.

Found by:     kris
Approved by:  re
2005-06-30 03:38:10 +00:00
Peter Wemm
2de92a386e Conditionally weaken sys_generic.c rev 1.136 to allow certain dubious
ioctl numbers in backwards compatability mode.  eg: an IOC_IN ioctl with
a size of zero.  Traditionally this was what you did before IOC_VOID
existed, and we had some established users of this in the tree, namely
procfs.  Certain 3rd party drivers with binary userland components also
have this too.

This is necessary to have 4.x and 5.x binaries use these ioctl's.  We
found this at work when trying to run 4.x binaries.

Approved by:	re
2005-06-30 00:19:08 +00:00
Peter Wemm
f0c6706de9 Move the KDB_STOP_NMI option from opt_global.h to opt_kdb.h
Approved by:	re
2005-06-29 23:23:16 +00:00
Mike Silbersack
a7b844d2be Fix the false memory modified after free messages some users have been
reporting - in my previous change, I missed the case where a mbuf
from the packet zone was freed back to the mbuf/packet keg, where
it was subsequently put into the mbuf zone and found not to contain
the expected trash.  This change adds the necessary trash_dtor call inside
mb_fini_pack so that everything is correct.

Thanks for Bosko for finding the bug and showing me how secondary zones
work.

Approved by:	re (dwhite)
2005-06-29 08:18:26 +00:00
Dima Dorfman
1ee6b74603 Fix fdcheckstd to pass the file descriptor along through vn_open. When
opening a device, devfs_open needs the file descriptor to install its
own fileops. Failing to pass the file descriptor causes the vnode to
be returned with the regular vnops, which will cause a panic on the
first read or write because devfs_specops is not meant to support
those operations.

This bug caused a panic after exec'ing any set[ug]id program with
fds 0..2 closed (i.e., if any action had to be taken by fdcheckstd, we
would panic if the exec'd program ever tried to use any of those
descriptors).

Reviewed by:	phk
Approved by:	re (scottl)
2005-06-25 03:34:49 +00:00
Pawel Jakub Dawidek
400a74bff8 Close another information leak in ktrace(2): one was able to find active
process groups outside a jail, etc. by using ktrace(2).

OK'ed by:	rwatson
Approved by:	re (scottl)
MFC after:	1 week
2005-06-24 12:05:24 +00:00
Peter Wemm
4da0d332f4 Move HWPMC_HOOKS into its own opt_hwpmc_hooks.h file. It doesn't merit
being in opt_global.h and forcing a global recompile when only a few files
reference it.

Approved by:  re
2005-06-24 00:16:57 +00:00
Pawel Jakub Dawidek
06a137780b Actually only protect mount-point if security.jail.enforce_statfs is set to 2.
If we don't return statistics about requested file systems, system tools
may not work correctly or at all.

Approved by:	re (scottl)
2005-06-23 22:13:29 +00:00
John Baldwin
57dbcb11db Fix a typo in a comment.
Approved by:	re (scottl)
2005-06-23 21:55:43 +00:00
Mike Silbersack
121f050976 Change the mbuf, mbuf cluster, and mbuf packet allocation routines so that
the UMA "trash" allocator is used - this ensures that any writes to a freed
mbuf should provoke a panic.

Only enabled under INVARIANTS, of course.

Approved by:	re (scottl)
2005-06-23 04:33:39 +00:00
Pawel Jakub Dawidek
b0d9aedd28 Add missing unlock.
Pointy hat to:	pjd
Approved by:	re (dwhite)
2005-06-21 21:17:02 +00:00
John Baldwin
943928c905 Simplify the storming logic and remove a variable as a result.
Approved by:	re (dwhite)
2005-06-20 19:32:23 +00:00
Garance A Drosehn
bd3aace7e4 Fix a panic which could occur parsing #!-lines in a shell-script. If the
#!-line had multiple whitespace characters after the interpreter name, and
it did not have any options, then the code would do nasty things trying to
process a (non-existent) option-string which "ended before it began"...

Submitted by:	Morten Johansen
Approved by:	re (dwhite)
2005-06-19 02:21:03 +00:00