Commit Graph

8632 Commits

Author SHA1 Message Date
kan
9590889861 Do not drop the vnode interlock if vdropl is called on already doomed vnode.
vdropl callers expect it to return with interlock still being held.

MFC after:	2 days
2005-08-10 11:46:03 +00:00
rwatson
9025d9a2b8 Add an order between UDP inpcb locks and the IPv4 multicast address
list lock, as there has been a report that an alternative lock order
is getting introduced.  This should help ferret it out.

Reported by:	Ed Maste <emaste at phaedrus dot sandvine dot ca>
2005-08-09 13:27:50 +00:00
rwatson
5d770a09e8 Propagate rename of IFF_OACTIVE and IFF_RUNNING to IFF_DRV_OACTIVE and
IFF_DRV_RUNNING, as well as the move from ifnet.if_flags to
ifnet.if_drv_flags.  Device drivers are now responsible for
synchronizing access to these flags, as they are in if_drv_flags.  This
helps prevent races between the network stack and device driver in
maintaining the interface flags field.

Many __FreeBSD__ and __FreeBSD_version checks maintained and continued;
some less so.

Reviewed by:	pjd, bz
MFC after:	7 days
2005-08-09 10:20:02 +00:00
csjp
cbebbd7704 Drop in a WITNESS_WARN into SYSCTL_IN to make sure that we are
not holding any non-sleep-able-locks locks when copyin is called.
This gets executed un-conditionally since we have no function
to wire the buffer in this direction.

Pointed out by:	truckman
MFC after:	1 week
2005-08-08 21:06:42 +00:00
rwatson
daa1c89f45 Merge the dev_clone and dev_clone_cred event handlers into a single
event handler, dev_clone, which accepts a credential argument.
Implementors of the event can ignore it if they're not interested,
and most do.  This avoids having multiple event handler types and
fall-back/precedence logic in devfs.

This changes the kernel API for /dev cloning, and may affect third
party packages containg cloning kernel modules.

Requested by:	phk
MFC after:	3 days
2005-08-08 19:55:32 +00:00
csjp
b42a694c8b Check to see if we wired the user-supplied buffers in SYSCTL_OUT, if
the buffer has not been wired and we are holding any non-sleep-able locks,
drop a witness warning. If the buffer has not been wired, it is possible
that the writing of the data can sleep, especially if the page is not in
memory. This can result in a number of different locking issues, including
dead locks.

MFC after:	1 week
Discussed with:	rwatson
Reviewed by:	jhb
2005-08-08 18:54:35 +00:00
davidxu
fee0e762f4 Try best to keep a preempted thread at front of run queue, this seems
improved performance a bit for some workloads, but still seeing interactive
lagging unless cpu idling race is fixed.
2005-08-08 14:20:10 +00:00
grehan
449d4474b4 Export a routine, kobj_machdep_init(), that allows platforms
to use the kobj subsystem as soon at mutex_init() has been called
instead of having to wait for the SI_SUB_LOCK sysinit.

Reviewed by:	dfr
2005-08-07 02:20:35 +00:00
csjp
8d2abe7f00 Change the data type of the upper shared memory limits from a signed
integer to an unsigned long. This lifts variables like the maximum
number of pages available for shared memory from 2^31 to 2^32 on 32
bit architectures, and from 2^31 to 2^64 on 64 bit architectures.

It should be noted that this changes breaks ABI on 64 bit architectures
because the size of the shmmax, shmmin, shmmni, shmseg and shmall members
of the shminfo structure has changed.

Silence on:	current@
2005-08-06 07:20:18 +00:00
ssouhlal
1f4d3e95ef Holding a vnode doesn't prevent v_mount from disappearing (when the
vnode is inactivated), possibly leading to a NULL dereference when
checking if the mount wants knotes to be activated in the VOP hooks.
So, we add a new vnode flag VV_NOKNOTE that is only set in getnewvnode(),
if necessary, and check it when activating knotes.
Since the flags are not erased when a vnode is being held, we can safely
read them.

Reviewed by:	kris@
MFC after:	3 days
2005-08-06 01:42:04 +00:00
rwatson
7504160c1e Introduce in_multi_mtx, which will protect IPv4-layer multicast address
lists, as well as accessor macros.  For now, this is a recursive mutex
due code sequences where IPv4 multicast calls into IGMP calls into
ip_output(), which then tests for a multicast forwarding case.

For support macros in in_var.h to check multicast address lists, assert
that in_multi_mtx is held.

Acquire in_multi_mtx around iteration over the IPv4 multicast address
lists, such as in ip_input() and ip_output().

Acquire in_multi_mtx when manipulating the IPv4 layer multicast addresses,
as well as over the manipulation of ifnet multicast address lists in order
to keep the two layers in sync.

Lock down accesses to IPv4 multicast addresses in IGMP, or assert the
lock when performing IGMP join/leave events.

Eliminate spl's associated with IPv4 multicast addresses, portions of
IGMP that weren't previously expunged by IGMP locking.

Add in_multi_mtx, igmp_mtx, and if_addr_mtx lock order to hard-coded
lock order in WITNESS, in that order.

Problem reported by:	Ed Maste <emaste at phaedrus dot sandvine dot ca>
MFC after:		10 days
2005-08-03 19:29:47 +00:00
jeff
df3babd63b - Unlock before we call mac_destroy_vnode to prevent a lock order reversal.
Found by:	trhodes
2005-08-03 05:36:50 +00:00
jeff
8076188a59 - Use lockmgr_printinfo rather than rolling our own. This introduces a
slight problem by using printf instead of db_printf however
   'show lockedvnods' does the same so I believe it is ok for now.
2005-08-03 05:02:08 +00:00
jeff
a2347b7b51 - Fix a problem that slipped through review; the stack member of the lockmgr
structure should have the lk_ prefix.
 - Add stack_print(lkp->lk_stack) to the information printed with
   lockmgr_printinfo().
2005-08-03 04:59:07 +00:00
jeff
08fb635b55 - Replace the series of DEBUG_LOCKS hacks which tried to save the vn_lock
caller by saving the stack of the last locker/unlocker in lockmgr.  We
   also put the stack in KTR at the moment.

Contributed by:		Antoine Brodin <antoine.brodin@laposte.net>
2005-08-03 04:48:22 +00:00
jeff
4a761caec7 - Add support for saving stack traces and displaying them via printf(9)
and KTR.

Contributed by:		Antoine Brodin <antoine.brodin@laposte.net>
Concept code from:	Neal Fachan <neal@isilon.com>
2005-08-03 04:27:40 +00:00
davidxu
69017b27e9 In adjustrunqueue(), add code to handle thread migrating case for
ULE scheduler. In original code, local run queue of threaded ksegrp
is corrupted if adjustrunqueue() is called while thread is migrating.
2005-08-03 01:23:45 +00:00
ru
c7ff05fb06 Long overdue, keep up with mbuf.h,v 1.148. 2005-08-02 20:03:23 +00:00
kbyanc
7cdc25bbb9 Make getsockopt(..., SOL_SOCKET, SO_ACCEPTCONN, ...) work per IEEE Std
1003.1 (POSIX).
2005-08-01 21:15:09 +00:00
davidxu
811c17378d If a thread was removed from system run queue, kse_assign shouldn't
add it again.
2005-07-31 15:11:21 +00:00
netchild
63a34efe1a The resource_xxx routines in subr_hints.c are called before and after the
kenv environment in kern_environment.c switches to dynamic kenv. The prior
call sets the static variable hintp to the static hints in subr_hints.c
(hintmode==0).

However, changes to the environment are not detected by the resource_xxx
lookups after the change to dynamic kernel environment, so the lookup
routines only report the old stuff of hintmode==0, even after the change to
the dynamic kenv. This causes kenv users to see a different environment than
the kernel routines.

This is a problem in the mixer.c code that looks up initial mixer volume
settings from the hints: If the hints are dynamic and not from the
device.hints file, mixer.c doesn't see them, but kenv does.

The patch from the PR (modified to comply to the style of the function)
solves this.

PR:		83686
Submitted by:	Harry Coin <harrycoin@qconline.com>
2005-07-31 10:46:55 +00:00
netchild
5d0bbfe29c Add bounds checking to the setenv part of the kernel environment.
This has no security implications since only root is allowed to use
kenv(1) (and corrupt the kernel memory after adding too much variables
previous to this commit).

This is based upon the PR [1] mentioned below, but extended to check both
bounds (in case of an overflow of the counting variable) and to comply
to the style of the function. An overflow of the counting variable
shouldn't happen after adding the check for the upper bound, but better
safe than sorry (in case some other function in the kernel overwrites
random memory).

An interested soul may want to add a printf to notify root in case the
bounds are hit.

Also allocate KENV_SIZE+1 entries (the array is NULL-terminated), since
the comment for KENV_SIZE says it's the maximum number of environment
strings. [2]

PR:		83687 [1]
Submitted by:	Harry Coin <harrycoin@qconline.com> [1]
Submitted by:	Ariff Abdullah <skywizard@MyBSD.org.my> [2]
2005-07-31 10:28:35 +00:00
jkoshy
05d42812ac Fail the module loading process if the currently executing kernel
was not compiled with 'options HWPMC_HOOKS' or if the compiled-in
version numbers of the kernel and module are out of sync.

Reported by:	cracauer
MFC after:	3 days
2005-07-30 09:02:42 +00:00
ps
b8edc2b301 Ignore mutex asserts when we're dumping as well. This allows me
to panic a system from DDB when INVARIANTS is compiled into the
kernel on a scsi system.
2005-07-30 05:54:30 +00:00
sam
9e1f21a348 add m_align, a function to align any type of mbuf (i.e. it
is a superset of M_ALIGN and MH_ALIGN)

Reviewed by:	several
2005-07-30 01:32:16 +00:00
imura
3c148b71eb Change API of mb_copy_t in libmchain so that netsmb can handle
multibyte character share name correctly.

Reviewed by:	bp
2005-07-29 13:22:37 +00:00
gnn
4504989c4b Fix for PR 83885.
Make sure that there actually is a next packet before setting
nextrecord to that field.

PR: 83885
Submitted by: hirose@comm.yamaha.co.jp
Obtained from:	Patch suggested in the PR
MFC after: 1 week
2005-07-28 10:10:01 +00:00
pjd
9c0918e899 Fix the way how "InUse" column in 'vmstat -m' output works:
- increase number of allocations count only on successfull malloc(9),
  so it doesn't confuse people;
- because we need to check if 'size > 0', hide 'mtsp->mts_memalloced += size;'
  under the check as well, as for size=0 it is of course a no-op;
- avoid critical_enter()/critical_exit() in case of failure in
  malloc_type_allocated() as there will be nothing to do.

OK'ed by:	rwatson
MFC after:	2 days
2005-07-27 23:17:31 +00:00
delphij
049e7675aa Cast to uintptr_t when the compiler complains. This unbreaks ULE
scheduler breakage accompanied by the recent atomic_ptr() change.
2005-07-25 10:21:49 +00:00
alc
38bf328ab8 Eliminate inconsistency in the setting of the B_DONE flag. Specifically,
make the b_iodone callback responsible for setting it if it is needed.
Previously, it was set unconditionally by bufdone() without holding
whichever lock is shared by the b_iodone callback and the corresponding
top-half function.  Consequently, in a race, the top-half function could
conclude that operation was done before the b_iodone callback finished.
See, for example, aio_physwakeup() and aio_fphysio().

Note: I don't believe that the other, more widely-used b_iodone callbacks
are affected.

Discussed with: jeff
Reviewed by: phk
MFC after: 2 weeks
2005-07-20 19:06:06 +00:00
jeff
1b2743636c - Allow vnlru to drop giant if the filesystem does not require it. The
vnlru proc is extremely inefficient, potentially iteration over tens of
   thousands of vnodes without blocking.  Droping Giant allows other threads
   to preempt us although we should revisit the algorithm to fix the runtime
   problems especially since this may hold up all vnode allocations.
 - Remove the LK_NOWAIT from the VOP_LOCK in vlrureclaim.  This provides
   a natural blocking point to help alleviate the situation described above
   although it may not technically be desirable.
 - yield after we make a pass on all mount points to prevent us from
   blocking other threads which require Giant.

MFC after:	2 weeks
2005-07-20 01:43:27 +00:00
jhb
77e96aefca - Slightly reorder the events around the setting of PRS_ZOMBIE to be less
hokie and much more readable and expand the comment to explain why it is
  the way that it is.
- Close a race where one CPU could free the process belonging to a thread
  on another CPU that hasn't quite finished exiting yet but is beyond the
  point of setting the process state as PRS_ZOMBIE.

Reported and tested by:	ps (2)
MFC after:	3 days
2005-07-18 20:08:14 +00:00
rwatson
38dffb25ed Define four constants, MBUF_{,MEM,CLUSTER,PACKET,TAG}_MEM_NAME, which
are string names for their respective UMA zones and malloc types, and
are passed into uma_zcreate() and MALLOC_DEFINE().  Export them
outside of _KERNEL in mbuf.h so that netstat can reference them.

Change the names to improve consistency, with each zone/type
associated with the mbuf allocator being prefixed mbuf_.

MFC after:	1 week
2005-07-17 14:04:03 +00:00
jhb
c7383aebd6 Convert the atomic_ptr() operations over to operating on uintptr_t
variables rather than void * variables.  This makes it easier and simpler
to get asm constraints and volatile keywords correct.

MFC after:	3 days
Tested on:	i386, alpha, sparc64
Compiled on:	ia64, powerpc, amd64
Kernel toolchain busted on:	arm
2005-07-15 18:17:59 +00:00
rwatson
83059d9b07 Correct build on 64-bit: cast u_int64_t to (unsigned long long) before
printfing as (unsigned long long).  32-bit build on i386 didn't notice
this.  Whoops.

Reported by:	arved
Tested by:	sledge
2005-07-14 15:21:18 +00:00
rwatson
7a35a61825 Introduce a new sysctl, kern.malloc_stats, which exports kernel malloc
statistics via a binary structure stream:

- Add structure 'malloc_type_stream_header', which defines a stream
  version, definition of MAXCPUS used in the stream, and a number of
  malloc_type records in the stream.

- Add structure 'malloc_type_header', which defines the name of the
  malloc type being reported on.

- When the sysctl is queried, return a stream header, followed by a
  series of type descriptions, each consisting of a type header
  followed by a series of MAXCPUS malloc_type_stats structures holding
  per-CPU allocation information.  Typical values of MAXCPUS will be 1
  (UP compiled kernel) and 16 (SMP compiled kernel).

This query mechanism allows user space monitoring tools to extract
memory allocation statistics in a machine-readable form, and to do so
at a per-CPU granularity, allowing monitoring of allocation patterns
across CPUs in order to better understand the distribution of work and
memory flow over multiple CPUs.

While here:

- Bump statistics width to uint64_t, and hard code using fixed-width
  type in order to be more sure about structure layout in the stream.
  We allocate and free a lot of memory.

- Add kmemcount, a counter of the number of registered malloc types,
  in order to avoid excessive manual counting of types.  Export via a
  new sysctl to allow user-space code to better size buffers.

- De-XXX comment on no longer maintaining the high watermark in old
  sysctl monitoring code.

A follow-up commit of libmemstat(3), a library to monitor kernel memory
allocation, will occur in the next few days.  Likewise, similar changes
to UMA.
2005-07-14 11:52:06 +00:00
rwatson
0889b69634 Bump the module versions of the MAC Framework and MAC policy modules
from 2 (6.x) to 3 (7.x) to allow for future changes in the MAC policy
module ABI in 7.x.

Obtained from:	TrustedBSD Project
2005-07-14 10:46:03 +00:00
rwatson
79690d711b When devfs cloning takes place, provide access to the credential of the
process that caused the clone event to take place for the device driver
creating the device.  This allows cloned device drivers to adapt the
device node based on security aspects of the process, such as the uid,
gid, and MAC label.

- Add a cred reference to struct cdev, so that when a device node is
  instantiated as a vnode, the cloning credential can be exposed to
  MAC.

- Add make_dev_cred(), a version of make_dev() that additionally
  accepts the credential to stick in the struct cdev.  Implement it and
  make_dev() in terms of a back-end make_dev_credv().

- Add a new event handler, dev_clone_cred, which can be registered to
  receive the credential instead of dev_clone, if desired.

- Modify the MAC entry point mac_create_devfs_device() to accept an
  optional credential pointer (may be NULL), so that MAC policies can
  inspect and act on the label or other elements of the credential
  when initializing the skeleton device protections.

- Modify tty_pty.c to register clone_dev_cred and invoke make_dev_cred(),
  so that the pty clone credential is exposed to the MAC Framework.

While currently primarily focussed on MAC policies, this change is also
a prerequisite for changes to allow ptys to be instantiated with the UID
of the process looking up the pty.  This requires further changes to the
pty driver -- in particular, to immediately recycle pty nodes on last
close so that the credential-related state can be recreated on next
lookup.

Submitted by:	Andrew Reisse <andrew.reisse@sparta.com>
Obtained from:	TrustedBSD Project
Sponsored by:	SPAWAR, SPARTA
MFC after:	1 week
MFC note:	Merge to 6.x, but not 5.x for ABI reasons
2005-07-14 10:22:09 +00:00
jhb
33483dce73 Add a 'sysent' target that depends on the various files built from
syscalls.master for the master list and the Alpha/OSF1 compat ABI to be
consistent with all the other compat ABIs where 'make sysent' already
works.

MFC after:	3 days
2005-07-13 20:50:17 +00:00
davidxu
bc8b519d0f Validate if the value written into {FS,GS}.base is a canonical
address, writting non-canonical address can cause kernel a panic,
by restricting base values to 0..VM_MAXUSER_ADDRESS, ensuring
only canonical values get written to the registers.

Reviewed by: peter, Josepha Koshy < joseph.koshy at gmail dot com >
Approved by: re (scottl)
2005-07-10 23:31:11 +00:00
jhb
5cc6248fab Regen.
Approved by:	re (scottl)
2005-07-08 15:06:58 +00:00
jhb
877171db0e Mark second instance of lchown() MP safe just like the first.
Approved by:	re (scottl)
2005-07-08 15:01:13 +00:00
jhb
f962c71215 Regenerate.
Approved by:	re (scottl)
2005-07-07 18:20:38 +00:00
jhb
cf15cbb1b6 - Add two new system calls: preadv() and pwritev() which are like readv()
and writev() except that they take an additional offset argument and do
  not change the current file position.  In SAT speak:
  preadv:readv::pread:read and pwritev:writev::pwrite:write.
- Try to reduce code duplication some by merging most of the old
  kern_foov() and dofilefoo() functions into new dofilefoo() functions
  that are called by kern_foov() and kern_pfoov().  The non-v functions
  now all generate a simple uio on the stack from the passed in arguments
  and then call kern_foov().  For example, read() now just builds a uio and
  calls kern_readv() and pwrite() just builds a uio and calls kern_pwritev().

PR:		kern/80362
Submitted by:	Marc Olzheim marcolz at stack dot nl (1)
Approved by:	re (scottl)
MFC after:	1 week
2005-07-07 18:17:55 +00:00
rwatson
efcac3d02e Add MAC Framework and MAC policy entry point mac_check_socket_create(),
which is invoked from socket() and socketpair(), permitting MAC
policy modules to control the creation of sockets by domain, type, and
protocol.

Obtained from:	TrustedBSD Project
Sponsored by:	SPARTA, SPAWAR
Approved by:	re (scottl)
Requested by:	SCC
2005-07-05 22:49:10 +00:00
pjd
38bf7eadf9 Fix one "wrong b_bufobj" panic in reassignbuf() by moving VI_UNLOCK(vp)
below KASSERT()s, which means there was no real problem here, we just
needed better locking for assertions.

OK'ed by:	jeff
Approved by:	re (scottl)
2005-07-05 15:57:55 +00:00
ssouhlal
efe31cd3da Fix the recent panics/LORs/hangs created by my kqueue commit by:
- Introducing the possibility of using locks different than mutexes
for the knlist locking. In order to do this, we add three arguments to
knlist_init() to specify the functions to use to lock, unlock and
check if the lock is owned. If these arguments are NULL, we assume
mtx_lock, mtx_unlock and mtx_owned, respectively.

- Using the vnode lock for the knlist locking, when doing kqueue operations
on a vnode. This way, we don't have to lock the vnode while holding a
mutex, in filt_vfsread.

Reviewed by:	jmg
Approved by:	re (scottl), scottl (mentor override)
Pointyhat to:	ssouhlal
Will be happy:	everyone
2005-07-01 16:28:32 +00:00
jkoshy
3cade8d074 MFP4:
- pmcstat(8) gprof output mode fixes:

  lib/libpmc/pmclog.{c,h}, sys/sys/pmclog.h:
  + Add a 'is_usermode' field to the PMCLOG_PCSAMPLE event
  + Add an 'entryaddr' field to the PMCLOG_PROCEXEC event,
    so that pmcstat(8) can determine where the runtime loader
    /libexec/ld-elf.so.1 is getting loaded.

  sys/kern/kern_exec.c:
  + Use a local struct to group the entry address of the image being
    exec()'ed and the process credential changed flag to the exec
    handling hook inside hwpmc(4).

  usr.sbin/pmcstat/*:
  + Support "-k kernelpath", "-D sampledir".
  + Implement the ELF bits of 'gmon.out' profile generation in a new
    file "pmcstat_log.c".  Move all log related functions to this
    file.
  + Move local definitions and prototypes to "pmcstat.h"

- Other bug fixes:
  + lib/libpmc/pmclog.c: correctly handle EOF in pmclog_read().
  + sys/dev/hwpmc_mod.c: unconditionally log a PROCEXIT event to all
    attached PMCs when a process exits.
  + sys/sys/pmc.h: correct a function prototype.
  + Improve usage checks in pmcstat(8).

Approved by:	re (blanket hwpmc)
2005-06-30 19:01:26 +00:00
ps
22fde798f0 Use SCTL_MASK32 to determine that the sysctl call is from a 32bit
binary for kern.cp_time.

Approved by:	re
2005-06-30 17:17:29 +00:00
peter
921b3c5ee4 Jumbo-commit to enhance 32 bit application support on 64 bit kernels.
This is good enough to be able to run a RELENG_4 gdb binary against
a RELENG_4 application, along with various other tools (eg: 4.x gcore).
We use this at work.

ia32_reg.[ch]: handle the 32 bit register file format, used by ptrace,
	procfs and core dumps.
procfs_*regs.c: vary the format of proc/XXX/*regs depending on the client
	and target application.
procfs_map.c: Don't print a 64 bit value to 32 bit consumers, or their
	sscanf fails.  They expect an unsigned long.
imgact_elf.c: produce a valid 32 bit coredump for 32 bit apps.
sys_process.c: handle 32 bit consumers debugging 32 bit targets.  Note
	that 64 bit consumers can still debug 32 bit targets.

IA64 has got stubs for ia32_reg.c.

Known limitations: a 5.x/6.x gdb uses get/setcontext(), which isn't
implemented in the 32/64 wrapper yet.  We also make a tiny patch to
gdb pacify it over conflicting formats of ld-elf.so.1.

Approved by:	re
2005-06-30 07:49:22 +00:00