Commit Graph

10374 Commits

Author SHA1 Message Date
attilio
acc2f89a7f Really, no explicit checks against against lock_class_* object should be
done in consumers code: using locks properties is much more appropriate.
Fix current code doing these bogus checks.

Note: Really, callout are not usable by all !(LC_SPINLOCK | LC_SLEEPABLE)
primitives like rmlocks doesn't implement the generic lock layer
functions, but they can be equipped for this, so the check is still
valid.

Tested by: matteo, kris (earlier version)
Reviewed by: jhb
2008-02-06 00:04:09 +00:00
rwatson
f23198af5c Further clean up sorflush:
- Expose sbrelease_internal(), a variant of sbrelease() with no
  expectations about the validity of locks in the socket buffer.
- Use sbrelease_internel() in sorflush(), and as a result avoid intializing
  and destroying a socket buffer lock for the temporary stack copy of the
  actual buffer, asb.
- Add a comment indicating why we do what we do, and remove an XXX since
  things have gotten less ugly in sorflush() lately.

This makes socket close cleaner, and possibly also marginally faster.

MFC after:	3 weeks
2008-02-04 12:25:13 +00:00
phk
13132840a1 Give sendfile(2) a SF_SYNC flag which makes it wait until all mbufs
referencing the files VM pages are returned from the network stack,
making changes to the file safe.

This flag does not guarantee that the data has been transmitted to the
other end.
2008-02-03 15:54:41 +00:00
phk
df9c99b9c2 Give MEXTADD() another argument to make both void pointers to the
free function controlable, instead of passing the KVA of the buffer
storage as the first argument.

Fix all conventional users of the API to pass the KVA of the buffer
as the first argument, to make this a no-op commit.

Likely break the only non-convetional user of the API, after informing
the relevant committer.

Update the mbuf(9) manual page, which was already out of sync on
this point.

Bump __FreeBSD_version to 800016 as there is no way to tell how
many arguments a CPP macro needs any other way.

This paves the way for giving sendfile(9) a way to wait for the
passed storage to have been accessed before returning.

This does not affect the memory layout or size of mbufs.

Parental oversight by:	sam and rwatson.

No MFC is anticipated.
2008-02-01 19:36:27 +00:00
rwatson
95bb3acd1f Use FEATURE() macro to advertise aio availability. 2008-02-01 11:59:14 +00:00
rwatson
c57fa54759 Correct two problems relating to sorflush(), which is called to flush
read socket buffers in shutdown() and close():

- Call socantrcvmore() before sblock() to dislodge any threads that
  might be sleeping (potentially indefinitely) while holding sblock(),
  such as a thread blocked in recv().

- Flag the sblock() call as non-interruptible so that a signal
  delivered to the thread calling sorflush() doesn't cause sblock() to
  fail.  The sblock() is required to ensure that all other socket
  consumer threads have, in fact, left, and do not enter, the socket
  buffer until we're done flushin it.

To implement the latter, change the 'flags' argument to sblock() to
accept two flags, SBL_WAIT and SBL_NOINTR, rather than one M_WAITOK
flag.  When SBL_NOINTR is set, it forces a non-interruptible sx
acquisition, regardless of the setting of the disposition of SB_NOINTR
on the socket buffer; without this change it would be possible for
another thread to clear SB_NOINTR between when the socket buffer mutex
is released and sblock() is invoked.

Reviewed by:	bz, kmacy
Reported by:	Jos Backus <jos at catnook dot com>
2008-01-31 08:22:24 +00:00
ru
910410640b Add a wrapper function that bound checks writes to the dump device. 2008-01-28 19:04:07 +00:00
iwasaki
a9f086bbd3 Add devctl_process_running() so that power management system driver
can check whether devd(8) is running.

MFC after:	1 week
2008-01-27 16:06:37 +00:00
kib
82cf20c0b8 In rev. 1.156, the convertion of the minor number to the unit number
resulted in the argument to the make_dev() to be a unit number.

Correct this by supplying a minor number to make_dev(), and using
the unit number for the calculation of the slave tty name.

Reported and tested by:	Peter Holm
Reviewed by:	jhb
Yet another pointy hat to:	kib
MFC after:	1 day
2008-01-26 06:09:23 +00:00
jhb
dd3b84ba3a Fix a bug where a thread that hit the race where the sleep timeout fires
while the thread does not hold the thread lock would stop blocking for
subsequent interruptible sleeps and would always immediately fail the
sleep with EWOULDBLOCK instead (even sleeps that didn't have a timeout).

Some background:
- KSE has a facility for allowing one thread to interrupt another thread.
  During this process, the target thread aborts any interruptible sleeps
  much as if the target thread had a pending signal.  Once the target
  thread acknowledges the interrupt, normal sleep handling resumes.  KSE
  manages this via the TDF_INTERRUPTED flag.  Specifically, it sets the
  flag when it sends an interrupt to another thread and clears it when
  the interrupt is acknowledged.  (Note that this is purely a software
  interrupt sort of thing and has no relation to hardware interrupts
  or kernel interrupt threads.)
- The old code for handling the sleep timeout race handled the race
  by setting the TDF_INTERRUPT flag and faking a KSE-style thread
  interrupt to the thread in the process of going to sleep.  It probably
  should have just checked the TDF_TIMEOUT flag in sleepq_catch_signals()
  instead.
- The bug was that the sleepq code would set TDF_INTERRUPT but it was
  never cleared.  The sleepq code couldn't safely clear it in case there
  actually was a real KSE thread interrupt pending for the target thread
  (in fact, the sleepq timeout actually stomped on said pending interrupt).
  Thus, any future interruptible sleeps (*sleep(.. PCATCH ..) or
  cv_*wait_sig()) would see the TDF_INTERRUPT flag set and immediately
  fail with EWOULDBLOCK.  The flag could be cleared if the thread belonged
  to a KSE process and another thread posted an interrupt to the original
  thread.  However, in the more common case of a non-KSE process, the
  thread would pretty much stop sleeping.
- Fix the bug by just setting TDF_TIMEOUT in the sleepq timeout code and
  not messing with TDF_INTERRUPT and td_intrval.  With yesterday's fix to
  fix sleepq_switch() to check TDF_TIMEOUT, this is now sufficient.

MFC after:	3 days
2008-01-25 19:44:46 +00:00
jhb
5d22bdedcf Fix a race in the sleepqueue timeout code that resulted in sleeps not
being properly cancelled by a timeout.  In general there is a race
between a the sleepq timeout handler firing while the thread is still
in the process of going to sleep.  In 6.x with sched_lock, the race was
largely protected by sched_lock.  The only place it was "exposed" and had
to be handled was while checking for any pending signals in
sleepq_catch_signals().

With the thread lock changes, the thread lock is dropped in between
sleepq_add() and sleepq_*wait*() opening up a new window for this race.
Thus, if the timeout fired while the sleeping thread was in between
sleepq_add() and sleepq_*wait*(), the thread would be marked as timed
out, but the thread would not be dequeued and sleepq_switch() would
still block the thread until it was awakened via some other means.  In
the case of pause(9) where there is no other wakeup, the thread would
never be awakened.

Fix this by teaching sleepq_switch() to check if the thread has had its
sleep canceled before blocking by checking the TDF_TIMEOUT flag and
aborting the sleep and dequeueing the thread if it is set.

MFC after:	3 days
Reported by:	dwhite, peter
2008-01-25 02:09:38 +00:00
dumbbell
ba3df23cb8 When asked to use kqueue, AIO stores its internal state in the
`kn_sdata' member of the newly registered knote. The problem is that
this member is overwritten by a call to kevent(2) with the EV_ADD flag,
targetted at the same kevent/knote. For instance, a userland application
may set the pointer to NULL, leading to a panic.

A testcase was provided by the submitter.

PR:	kern/118911
Submitted by:	MOROHOSHI Akihiko <moro@remus.dti.ne.jp>
MFC after:	1 day
2008-01-24 17:10:19 +00:00
attilio
7213f4c32b Cleanup lockmgr interface and exported KPI:
- Remove the "thread" argument from the lockmgr() function as it is
  always curthread now
- Axe lockcount() function as it is no longer used
- Axe LOCKMGR_ASSERT() as it is bogus really and no currently used.
  Hopefully this will be soonly replaced by something suitable for it.
- Remove the prototype for dumplockinfo() as the function is no longer
  present

Addictionally:
- Introduce a KASSERT() in lockstatus() in order to let it accept only
  curthread or NULL as they should only be passed
- Do a little bit of style(9) cleanup on lockmgr.h

KPI results heavilly broken by this change, so manpages and
FreeBSD_version will be modified accordingly by further commits.

Tested by: matteo
2008-01-24 12:34:30 +00:00
bz
1c376286e0 Replace the last susers calls in netinet6/ with privilege checks.
Introduce a new privilege allowing to set certain IP header options
(hop-by-hop, routing headers).

Leave a few comments to be addressed later.

Reviewed by:	rwatson (older version, before addressing his comments)
2008-01-24 08:25:59 +00:00
jeff
be58be75dd - sched_prio() should only adjust tdq_lowpri if the thread is running or on
a run-queue.  If the priority is numerically raised only change lowpri
   if we're certain it will be correct.  Some slop is allowed however
   previously we could erroneously raise lowpri for an idle cpu that a
   thread had recently run on which lead to errors in load balancing
   decisions.
2008-01-23 03:10:18 +00:00
rwatson
0e6bbfc8e3 Regenerate. 2008-01-20 23:44:24 +00:00
rwatson
ff05f9dd9d Use audit events AUE_SHMOPEN and AUE_SHMUNLINK with new system calls
shm_open() and shm_unlink().  More auditing will need to be done for
these calls to capture arguments properly.
2008-01-20 23:43:06 +00:00
rwatson
ff397597d9 Export a type for POSIX SHM file descriptors via kern.proc.filedesc as
used by procstat, or SHM descriptors will show up as type unknown in
userspace.
2008-01-20 19:55:52 +00:00
attilio
caa2ca048b - Introduce the function lockmgr_recursed() which returns true if the
lockmgr lkp, when held in exclusive mode, is recursed
- Introduce the function BUF_RECURSED() which does the same for bufobj
  locks based on the top of lockmgr_recursed()
- Introduce the function BUF_ISLOCKED() which works like the counterpart
  VOP_ISLOCKED(9), showing the state of lockmgr linked with the bufobj

BUF_RECURSED() and BUF_ISLOCKED() entirely replace the usage of bogus
BUF_REFCNT() in a more explicative and SMP-compliant way.
This allows us to axe out BUF_REFCNT() and leaving the function
lockcount() totally unused in our stock kernel. Further commits will
axe lockcount() as well as part of lockmgr() cleanup.

KPI results, obviously, broken so further commits will update manpages
and freebsd version.

Tested by: kris (on UFS and NFS)
2008-01-19 17:36:23 +00:00
rwatson
dccd51b54f Move unlock of global UNIX domain socket lock slightly lower in
unp_connect(): it is expected to return with the lock held, and two
possible error paths otherwise returned with it unlocked.

The fix committed here is slightly different from the patch in the
PR, but along an alternative line suggested in the PR.

PR:		119778
MFC after:	3 days
Submitted by:	James Juran <james dot juran at baesystems dot com>
2008-01-18 19:16:03 +00:00
kib
3628ae460c In the rev. 1.153, the one place for converting minor number to unit
was missed. As result, pty_create_slave() may index out of the names[]
bounds, creating wrong slave tty names.

Tested by:	kensmith
Reviewed by:	jhb
MFC after:	3 days
2008-01-18 18:07:04 +00:00
julian
d6aa139aef refactor code so it can run in a chroot without having to have /dev/mounted
MFC After: 1 week
2008-01-18 17:02:14 +00:00
davidxu
ae86af8218 Make sure reading td_runtime in critical section since thread may be
preempted and td_runtime will be modified.
2008-01-18 13:00:28 +00:00
davidxu
80ec49a2cf Add POSIX clock id CLOCK_THREAD_CPUTIME_ID, this can be used to measure
per-thread runtime in user code.
2008-01-18 07:04:42 +00:00
jhb
7e32513a5b Add 'compat_freebsd[4567]' features corresponding to the kernel options
COMPAT_FREEBSD[4567].

MFC after:	1 week
Requested by:	kris
2008-01-17 22:46:32 +00:00
sam
e443f3b38c promote ath_defrag to m_collapse (and retire private+unused
m_collapse from cxgb)

Reviewed by:	pyun, jhb, kmacy
MFC after:	2 weeks
2008-01-17 21:25:09 +00:00
jhb
e3c7bebe5f Remove a conditional that is always true.
MFC after:	2 weeks
2008-01-17 20:15:15 +00:00
jhb
52da96d26e Add a set of regression tests for the POSIX shm API (shm_open(2) and
shm_unlink(2)).
2008-01-16 15:51:24 +00:00
njl
27739d97dc Remove duplicate cpufreq levels, i.e. ones that are within 25 Mhz of each
other.  The first one survives, the rest are removed.  So far, it appears
only some acpi_perf(4) BIOS tables have these invalid states, but address
this in the core to be sure to handle other potential driver data.

PR:		kern/114722
Tested by:	stefan.lambrev / moneybookers.com
MFC after:	3 days
2008-01-16 01:05:21 +00:00
jeff
7a1cea460a - When executing the 'tryself' branch in sched_pickcpu() look at the
lowest priority on the queue for the current cpu vs curthread's
   priority.  In the case that curthread is waking up many threads of a
   lower priority as would happen with a turnstile_broadcast() or wakeup()
   of many threads this prevents them from all ending up on the current cpu.
 - In sched_add() make the relationship between a scheduled ithread and
   the current cpu advisory rather than strict.  Only give the ithread
   affinity for the current cpu if it's actually being scheduled from
   a hardware interrupt.  This prevents it from migrating when it simply
   blocks on a lock.

Sponsored by:	Nokia
2008-01-15 09:03:09 +00:00
attilio
71b7824213 VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.

KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
2008-01-13 14:44:15 +00:00
attilio
8032f44d5f lockmgr() function will return successfully when trying to work under
panic but it won't actually lock anything.
This can lead some paths to reach lockmgr_disown() with inconsistent
lock which will let trigger the relative assertions.

Fix those in order to recognize panic situation and to not trigger.

Reported by: pho
Submitted by: kib
2008-01-11 16:38:12 +00:00
rwatson
f261f9865b Don't zero td_runtime when billing thread CPU usage to the process;
maintain a separate td_incruntime to hold unbilled CPU usage for
the thread that has the previous properties of td_runtime.

When thread information is requested using the thread monitoring
sysctls, export thread td_runtime instead of process rusage runtime
in kinfo_proc.

This restores the display of individual ithread and other kernel
thread CPU usage since inception in ps -H and top -SH, as well for
libthr user threads, valuable debugging information lost with the
move to try kthreads since they are no longer independent processes.

There is universal agreement that we should rewrite the process and
thread export sysctls, but this commit gets things going a bit
better in the mean time.  Likewise, there are resevations about the
continued validity of statclock given the speed of modern processors.

Reviewed by:		attilio, emaste, jhb, julian
2008-01-10 22:11:20 +00:00
rwatson
5e4b4e0b32 Remove "lock pushdown" todo item in comment -- I did that for 7.0.
MFC after:	3 weeks
2008-01-10 12:38:17 +00:00
rwatson
ffaa005137 Correct typos in comments.
MFC after:	3 weeks
2008-01-10 12:29:12 +00:00
attilio
18d0a0dd51 vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by:	Diego Sardina <siarodx at gmail dot com>,
		Andrea Di Pasquale <whyx dot it at gmail dot com>
2008-01-10 01:10:58 +00:00
attilio
c3923db2c0 Fix a last second typo about recent lockmgr_disown() introduction. 2008-01-09 00:02:43 +00:00
attilio
9e182da191 Remove explicit calling of lockmgr() with the NULL argument.
Now, lockmgr() function can only be called passing curthread and the
KASSERT() is upgraded according with this.

In order to support on-the-fly owner switching, the new function
lockmgr_disown() has been introduced and gets used in BUF_KERNPROC().
KPI, so, results changed and FreeBSD version will be bumped soon.
Differently from previous code, we assume idle thread cannot try to
acquire the lockmgr as it cannot sleep, so loose the relative check[1]
in BUF_KERNPROC().

Tested by: kris

[1] kib asked for a KASSERT in the lockmgr_disown() about this
condition, but after thinking at it, as this is a well known general
rule, I found it not really necessary.
2008-01-08 23:48:31 +00:00
jhb
1975c09543 Regen for shm_open(2) and shm_unlink(2). 2008-01-08 22:01:26 +00:00
jhb
8cd9437636 Add a new file descriptor type for IPC shared memory objects and use it to
implement shm_open(2) and shm_unlink(2) in the kernel:
- Each shared memory file descriptor is associated with a swap-backed vm
  object which provides the backing store.  Each descriptor starts off with
  a size of zero, but the size can be altered via ftruncate(2).  The shared
  memory file descriptors also support fstat(2).  read(2), write(2),
  ioctl(2), select(2), poll(2), and kevent(2) are not supported on shared
  memory file descriptors.
- shm_open(2) and shm_unlink(2) are now implemented as system calls that
  manage shared memory file descriptors.  The virtual namespace that maps
  pathnames to shared memory file descriptors is implemented as a hash
  table where the hash key is generated via the 32-bit Fowler/Noll/Vo hash
  of the pathname.
- As an extension, the constant 'SHM_ANON' may be specified in place of the
  path argument to shm_open(2).  In this case, an unnamed shared memory
  file descriptor will be created similar to the IPC_PRIVATE key for
  shmget(2).  Note that the shared memory object can still be shared among
  processes by sharing the file descriptor via fork(2) or sendmsg(2), but
  it is unnamed.  This effectively serves to implement the getmemfd() idea
  bandied about the lists several times over the years.
- The backing store for shared memory file descriptors are garbage
  collected when they are not referenced by any open file descriptors or
  the shm_open(2) virtual namespace.

Submitted by:	dillon, peter (previous versions)
Submitted by:	rwatson (I based this on his version)
Reviewed by:	alc (suggested converting getmemfd() to shm_open())
2008-01-08 21:58:16 +00:00
jhb
c9a8a65b69 Close a race in the kern.ttys sysctl handler that resulted in panics in
dev2udev() when a tty was being detached concurrently with the sysctl
handler:
- Hold the 'tty_list_mutex' lock while we read all the fields out of the
  struct tty for copying out later.  Previously the pty(4) and pts(4)
  destroy routines could set t_dev to NULL, drop their reference on the
  tty and destroy the cdev while the sysctl handler was attempting to
  invoke dev2udev() on the cdev being destroyed.  This happened when the
  sysctl handler read the value of t_dev prior to it being set to NULL
  either due to it being stale or due to timing races.  By holding the
  list lock we guarantee that the destroy routines will block in ttyrel()
  in that case and not destroy the cdev until after we've copied all of our
  data.  We may see a NULL cdev pointer or we may see the previous value,
  but the previous value will no longer point to a destroyed cdev if we
  see it.
- Fix the ttyfree() routine used by tty device drivers in their detach
  methods to use ttyrel() on the tty so we don't leak them.  Also, fix it
  to use the same order of operations as pty/pts destruction (set t_dev
  NULL, ttyrel(), destroy_dev()) so it cooperates with the sysctl handler.

MFC after:	3 days
Tested by:	avatar
2008-01-08 04:53:28 +00:00
kris
0b7e3f244a Fix logic in skipcount handling (used to sample every 1/N lock operations
to reduce profiling overhead)
2008-01-08 01:11:40 +00:00
rwatson
d08bce9f30 Free MAC label on a POSIX semaphore when the semaphore is freed.
MFC after:	3 days
Submitted by:	jhb
2008-01-07 22:03:19 +00:00
jhb
f8a246b979 Make ftruncate a 'struct file' operation rather than a vnode operation.
This makes it possible to support ftruncate() on non-vnode file types in
the future.
- 'struct fileops' grows a 'fo_truncate' method to handle an ftruncate() on
  a given file descriptor.
- ftruncate() moves to kern/sys_generic.c and now just fetches a file
  object and invokes fo_truncate().
- The vnode-specific portions of ftruncate() move to vn_truncate() in
  vfs_vnops.c which implements fo_truncate() for vnode file types.
- Non-vnode file types return EINVAL in their fo_truncate() method.

Submitted by:	rwatson
2008-01-07 20:05:19 +00:00
bde
cf4bbeccc5 In sequential_heuristic():
- spell 16384 as 16384 and not as BKVASIZE.  16384 is (not quite) just a
  magic size that works well in practice.  BKVASIZE should be MAXBSIZE
  (65536), but is 16384 because i386's don't have enough kva for it to
  be MAXBSIZE; 16384 works (not so well) for it for much the same reasons
  that it works well in the heuristic.
- expand and/or add comments about this and other details.
- don't explicitly inline this function.
- fix some other style bugs.
2008-01-05 08:54:51 +00:00
peter
1e0f13faf7 Fall back to the binary-specified interpreter (ld-elf.so.1) if the
ABI override binary isn't found.  This could probably be smoother, but
it is what I did in p4 change #126891 on 2007/09/27.  It should solve
the "ld-elf32.so.1"-in-chroot problem.
2008-01-05 08:35:56 +00:00
jeff
ebfbe60329 - Restore timeslicing code for all bit SCHED_FIFO priority classes.
Reported by:	Peter Jeremy <peterjeremy@optushome.com.au>
2008-01-05 04:47:31 +00:00
bz
69daf9ac7f Add missing sb_sndptr* fields to db_print_sockbuf().
While here change %d to %u for u_ints.

Discussed with:	rwatson, kmacy
2008-01-03 15:19:31 +00:00
jeff
811f1d35ef - In sysctl_kern_file skip fdps with negative lastfiles. This can
happen if there are no files open.  Accounting for these can
   eventually return a negative value for olenp causing sysctl to
   crash with a bad malloc.

Reported by:	Pawel Worach <pawel.worach@gmail.com>
2008-01-03 01:26:59 +00:00
obrien
97f31c31f6 Note what is too {short,long}. 2008-01-02 18:48:27 +00:00
jhb
311f35fbe0 A few whitespace fixes. 2008-01-02 17:09:15 +00:00
jeff
d473542286 - Place the fhold() in unp_internalize_fp to be more consistent with refs.
- Clear all of the gc flags before doing a run.  Stale flags were causing
   us to skip some descriptors.
 - If a unp socket has been marked REF in a gc pass it can't be dead.

Found by:	rwatson's test tool.
2008-01-01 01:46:42 +00:00
rodrigc
9cea85864d In vfs_scanopt(), make sure that the mount option value is not NULL
before calling vsscanf().

PR:		118531
Submitted by:	Jaakko Heinonen <jh saunalahti fi>
MFC after:	3 days
2007-12-31 23:44:53 +00:00
jhb
f9f7dedb21 Actually declare the kern.features sysctl node.
Pointy hat to:	jhb
2007-12-31 22:03:57 +00:00
jeff
c8db393cdc - Pause a while after disabling lock profiling and before resetting it
to be sure that all participating CPUs have stopped updating it.
 - Restore the behavior of printing the name of the lock type in the output.
2007-12-31 03:45:51 +00:00
jeff
319a3672d9 - Check the correct variable against NULL in two places.
- If the unp_file is NULL that means it has never been internalized and it
   must be reachable.
2007-12-31 03:44:54 +00:00
imp
95c2febd5a Rather than not redirting the bp when we get ENXIO, only redirty it
when the error is EIO.  This catches a much larger class of errors
that are unlikely to succeed if retried.

Submitted by: bde
2007-12-30 05:53:45 +00:00
jeff
ce18638805 Remove explicit locking of struct file.
- Introduce a finit() which is used to initailize the fields of struct file
   in such a way that the ops vector is only valid after the data, type,
   and flags are valid.
 - Protect f_flag and f_count with atomic operations.
 - Remove the global list of all files and associated accounting.
 - Rewrite the unp garbage collection such that it no longer requires
   the global list of all files and instead uses a list of all unp sockets.
 - Mark sockets in the accept queue so we don't incorrectly gc them.

Tested by:	kris, pho
2007-12-30 01:42:15 +00:00
alc
4565fa1697 Add the superpage reservation system. This is "part 2 of 2" of the
machine-independent support for superpages.  (The earlier part was
the rewrite of the physical memory allocator.)  The remainder of the
code required for superpages support is machine-dependent and will
be added to the various pmap implementations at a later date.

Initially, I am only supporting one large page size per architecture.
Moreover, I am only enabling the reservation system on amd64.  (In
an emergency, it can be disabled by setting VM_NRESERVLEVELS to 0
in amd64/include/vmparam.h or your kernel configuration file.)
2007-12-29 19:53:04 +00:00
rwatson
166f16a0c6 In "show lockedvnods" DDB command, use db_printf() rather than printf()
so that the results end up in the DDB output stream rather than the
console output stream.

This should likely also be done for the vprint() function it calls.

MFC after:	3 months
2007-12-28 00:47:31 +00:00
attilio
c720b77ca9 Trimm out now unused option LK_EXCLUPGRADE from the lockmgr namespace.
This option just adds complexity and the new implementation no longer
will support it, so axing it now that it is unused is probabilly the
better idea.

FreeBSD version is bumped in order to reflect the KPI breakage introduced
by this patch.

In the ports tree, kris found that only old OSKit code uses it, but as
it is thought to work only on 2.x kernels serie, version bumping will
solve any problem.
2007-12-28 00:38:13 +00:00
attilio
bd42361fd1 In order to avoid a huge class of deadlocks (in particular in interactions
with the interlock), owner of the lock should be only curthread or at
least, for its limited usage, NULL which identifies LK_KERNPROC.

The thread "extra argument" for the lockmgr interface is going to be
removed in the near future, but for the moment, just let kernel run for
some days with this check on in order to find potential deadlocking
places around the kernel and fix them.
2007-12-27 22:56:57 +00:00
rwatson
010adfaab6 Return ESRCH when a kernel stack is queried on a process in execve() --
p_candebug() will return EAGAIN which, if the other process never
leaves execve(), will result in the sysctl spinning and never returning
to userspace.  Processes should always eventually leave execve(), but
spinning in kernel while we wait is bad for countless reasons, and
particularly harmful if execve() itself is deadlocked.

Possibly we should return another error, or return a marker indicating
the thread is in execve() so it can be reported that way in userspace.

Reported by:	kris
2007-12-27 22:44:01 +00:00
attilio
d9b244638e As LK_EXCLUPGRADE is used in conjuction with LK_NOWAIT, LK_UPGRADE becames
equivalent with this and so operate the switch.

That call is the only one remaining LK_EXCLUPGRADE consumer and removing
it will prepare the ground for LK_EXCLUPGRADE axing and further
lockmgr improvements.

Discussed with: jeff, ups
2007-12-27 20:52:05 +00:00
imp
430e46303c A partial solution to some of the 'pull the umass device with a
mounted FS' problems.  These are more along the lines of 'avoiding an
avoidable panic' than a complete solution to removable devices.  We
now close the barn door after the horse has gotten lose and has been
hit by a truck, as it were.  The barn no longer catches fire in this
case, but the horse is still dead :-).

The vfs_bio.c fix causes us not to put a failed write back into the
dirty pool if the error returned was ENXIO.  In that case, the buffer
is treated like any other clean buffer that's being retured.  ENXIO
means the device isn't there anymore and will never be there again in
the future, so retrying is futile.

The vfs_mount.c fix treats 'ENXIO' as success for unmounting a file
system.  If the device is gone, retrying later won't help and we'll
never be able to unmount the device.

These two are part of a larger patch set submitted by the author.  The
other patches will be forth coming.  I added comments to these two
patches.

Submitted by: Henrik Gulbrandsen
Reviewed by: phk@
PR: usb/46176 (partial)
2007-12-27 16:38:28 +00:00
rwatson
956e2983ba Add textdump(4) facility, which provides an alternative form of kernel
dump using mechanically generated/extracted debugging output rather than
a simple memory dump.  Current sources of debugging output are:

- DDB output capture buffer, if there is captured output to save
- Kernel message buffer
- Kernel configuration, if included in kernel
- Kernel version string
- Panic message

Textdumps are stored in swap/dump partitions as with regular dumps, but
are laid out as ustar files in order to allow multiple parts to be stored
as a stream of sequentially written blocks.  Blocks are written out in
reverse order, as the size of a textdump isn't known a priori.  As with
regular dumps, they will be extracted using savecore(8).

One new DDB(4) command is added, "textdump", which accepts "set",
"unset", and "status" arguments.  By default, normal kernel dumps are
generated unless "textdump set" is run in order to schedule a textdump.
It can be canceled using "textdump unset" to restore generation of a
normal kernel dump.

Several sysctls exist to configure aspects of textdumps;
debug.ddb.textdump.pending can be set to check whether a textdump is
pending, or set/unset in order to control whether the next kernel dump
will be a textdump from userspace.

While textdumps don't have to be generated as a result of a DDB script
run automatically as part of a kernel panic, this is a particular useful
way to use them, as instead of generating a complete memory dump, a
simple transcript of an automated DDB session can be captured using the
DDB output capture and textdump facilities.  This can be used to
generate quite brief kernel bug reports rich in debugging information
but not dependent on kernel symbol tables or precisely synchronized
source code.  Most textdumps I generate are less than 100k including
the full message buffer.  Using textdumps with an interactive debugging
session is also useful, with capture being enabled/disabled in order to
record some but not all of the DDB session.

MFC after:	3 months
2007-12-26 11:32:33 +00:00
wkoszek
2179e058f4 Rewrite kern.console handling in sbuf(9). My intention is to leave
kern.console format as is. Thus, no difference in output format should
appear after this commit.

Reviewed by:	cognet@ (mentor)
Approved by:	cognet@ (mentor)
2007-12-25 21:17:34 +00:00
rwatson
bdee30611d Add a new 'why' argument to kdb_enter(), and a set of constants to use
for that argument.  This will allow DDB to detect the broad category of
reason why the debugger has been entered, which it can use for the
purposes of deciding which DDB script to run.

Assign approximate why values to all current consumers of the
kdb_enter() interface.
2007-12-25 17:52:02 +00:00
julian
265714a11e give thread0 the tid 100000 and bumpt the others to start at 100001
MFC after:	1 week
2007-12-22 04:56:48 +00:00
wkoszek
7d4fbe3a46 Make SCHED_ULE buildable with gcc3.
Reviewed by:	cognet (mentor), jeffr
Approved by:	cognet (mentor), jeffr
2007-12-21 23:30:18 +00:00
imp
46b79678a8 When devclass_get_maxunit is passed a NULL, return -1 to indicate that
there's nothing allocated at all yet.
2007-12-19 22:05:07 +00:00
obrien
ee337f2c34 Be more exact with sigaction SA_SIGINFO handling.
Reviewed by:	marcel
2007-12-18 20:39:13 +00:00
kmacy
41d59439f8 Add SB_NOCOALESCE flag to disable socket buffer update in place 2007-12-17 10:02:01 +00:00
davidxu
496a3a2c52 Check NULL pointer. 2007-12-17 08:09:37 +00:00
davidxu
747bbe486b Add missing changes for fixing LOR of umtx lock and thread lock, follow
the committing of files:
	kern_resource.c revision 1.181
	sched_4bsd.c	revision 1.111
	sched_ule.c	revision 1.218
2007-12-17 05:55:07 +00:00
jeff
4ec9caf00c Refactor select to reduce contention and hide internal implementation
details from consumers.

 - Track individual selecters on a per-descriptor basis such that there
   are no longer collisions and after sleeping for events only those
   descriptors which triggered events must be rescaned.
 - Protect the selinfo (per descriptor) structure with a mtx pool mutex.
   mtx pool mutexes were chosen to preserve api compatibility with
   existing code which does nothing but bzero() to setup selinfo
   structures.
 - Use a per-thread wait channel rather than a global wait channel.
 - Hide select implementation details in a seltd structure which is
   opaque to the rest of the kernel.
 - Provide a 'selsocket' interface for those kernel consumers who wish to
   select on a socket when they have no fd so they no longer have to
   be aware of select implementation details.

Tested by:	kris
Reviewed on:	arch
2007-12-16 06:21:20 +00:00
rrs
edce837b8c - fix tab to space issue, hmm maybe I should use vi. 2007-12-15 23:14:53 +00:00
jeff
12adc443d6 - Re-implement lock profiling in such a way that it no longer breaks
the ABI when enabled.  There is no longer an embedded lock_profile_object
   in each lock.  Instead a list of lock_profile_objects is kept per-thread
   for each lock it may own.  The cnt_hold statistic is now always 0 to
   facilitate this.
 - Support shared locking by tracking individual lock instances and
   statistics in the per-thread per-instance lock_profile_object.
 - Make the lock profiling hash table a per-cpu singly linked list with a
   per-cpu static lock_prof allocator.  This removes the need for an array
   of spinlocks and reduces cache contention between cores.
 - Use a seperate hash for spinlocks and other locks so that only a
   critical_enter() is required and not a spinlock_enter() to modify the
   per-cpu tables.
 - Count time spent spinning in the lock statistics.
 - Remove the LOCK_PROFILE_SHARED option as it is always supported now.
 - Specifically drop and release the scheduler locks in both schedulers
   since we track owners now.

In collaboration with:	Kip Macy
Sponsored by:	Nokia
2007-12-15 23:13:31 +00:00
obrien
56e498fb3b style.Makefile(5) 2007-12-14 21:30:51 +00:00
davidxu
3695a78889 Fix LOR of thread lock and umtx's priority propagation mutex due
to the reworking of scheduler lock.

MFC: after 3 days
2007-12-11 08:25:36 +00:00
rwatson
cce7cfdaf5 Check for P_WEXIT before PHOLD() on a process in kstack and vm query
sysctls, as PHOLD() asserts !P_WEXIT.

Reported by:	Michael Plass <mfp49_freebsd at plass-family dot net>
2007-12-09 17:22:27 +00:00
jkoshy
72c27d71d8 Kernel and hwpmc(4) support for callchain capture.
Sponsored by:	FreeBSD Foundation and Google Inc.
2007-12-07 08:20:17 +00:00
jhb
0373a29045 Move several data structure definitions out of freebsd32_misc.c and into
freebsd32.h instead.

MFC after:	1 week
2007-12-06 23:11:27 +00:00
rrs
28889e11f4 - Puts default limits on 4k/9k and 16k zones for mbufs all based
on 1/2 of each of the successive limits tied to the limit for
  2k clusters.
- Adds real functionality in so that doing a sysctl to change these
  actually changes them :-)

MFC after:	1 week
2007-12-05 15:29:44 +00:00
kib
53229c8ee9 Use curthread instead of the FIRST_THREAD_IN_PROC for vnlru and syncer,
when applicable.

Aquire Giant slightly later for vnlru.

In the syncer, aquire the Giant only when a vnode belongs to the
non-MPsafe fs.

In both speedup_syncer() and syncer_shutdown(), remove the syncer thread from
the lbolt sleep queue after the syncer state is modified, not before.

Herded by:	attilio
Tested by:	Peter Holm
Reviewed by:	ups
MFC after:	1 week
2007-12-05 09:34:04 +00:00
rodrigc
ca18adc0b5 In nmount(), internally convert the mount option: "rdonly" to "ro".
This makes updates mounts such as:
 "mount -u -o rdonly" work more like, "mount -u -o ro".

References to "-o rdonly" were changed to "-o ro" in revision 1.60 of
the mount(8) man page,
but some people still like to use "-o rdonly" since it was documented
in earlier versions of FreeBSD.

Requested by:	rwatson
MFC after:	1 week
2007-12-05 03:26:14 +00:00
thompsa
36f039142a Apply a workaround for the unkillable jail problem where some devices created
within the jail are never freed. si_cred is only used by the MAC framework so
make the cred reference conditional on it being compiled in, this is not a fix
and will need to be reviewed for any new consumers of si_cred.

This will quell some user complaint when using jails with a default kernel.

Reviewed by:	rwatson
MFC after:	3 days
2007-12-05 01:22:03 +00:00
kib
feb2aba5b6 Implement fetching of the __FreeBSD_version from the ELF ABI-tag note.
The value is read into the p_osrel member of the struct proc. p_osrel
is set to 0 for the binaries without the note.

MFC after:	3 days
2007-12-04 12:28:07 +00:00
kib
dbef1afd93 Check for the program headers alignment of the ELF images before
dereferencing. Unaligned access could cause panic on strict alignment
architectures.

Reviewed by:	marcel, marius (also tested on sparc64, thanks !)
MFC after:	3 days
2007-12-04 12:21:27 +00:00
alc
200640eacf Introduce an UMA backend page allocator for the jumbo frame zones that
allocates physically contiguous memory.

MFC after: 3 months
Requested and reviewed by: Kip Macy
Tested by: Andrew Gallatin and Pyun YongHyeon
2007-12-04 07:06:08 +00:00
rwatson
0c4e2d79d0 When a symbol name can't be resolved, return "??" as the name, rather
than "Unknown func", in order to avoid putting spaces in what ideally
is a string separated by white space.
2007-12-03 14:44:35 +00:00
rwatson
e891688481 Add another new sysctl in support of the forthcoming procstat(1) to
support its -k argument:

kern.proc.kstack - dump the kernel stack of a process, if debugging
  is permitted.

This sysctl is present if either "options DDB" or "options STACK" is
compiled into the kernel.  Having support for tracing the kernel
stacks of processes from user space makes it much easier to debug
(or understand) specific wmesg's while avoiding the need to enter
DDB in order to determine the path by which a process came to be
blocked on a particular wait channel or lock.
2007-12-02 21:52:18 +00:00
rwatson
99285f7544 Break out stack(9) from ddb(4):
- Introduce per-architecture stack_machdep.c to hold stack_save(9).
- Introduce per-architecture machine/stack.h to capture any common
  definitions required between db_trace.c and stack_machdep.c.
- Add new kernel option "options STACK"; we will build in stack(9) if it is
  defined, or also if "options DDB" is defined to provide compatibility
  with existing users of stack(9).

Add new stack_save_td(9) function, which allows the capture of a stacktrace
of another thread rather than the current thread, which the existing
stack_save(9) was limited to.  It requires that the thread be neither
swapped out nor running, which is the responsibility of the consumer to
enforce.

Update stack(9) man page.

Build tested:	amd64, arm, i386, ia64, powerpc, sparc64, sun4v
Runtime tested:	amd64 (rwatson), arm (cognet), i386 (rwatson)
2007-12-02 20:40:35 +00:00
rwatson
c25458da37 Add two new sysctls in support of the forthcoming procstat(1) to support
its -f and -v arguments:

kern.proc.filedesc - dump file descriptor information for a process, if
  debugging is permitted, including socket addresses, open flags, file
  offsets, file paths, etc.

kern.proc.vmmap - dump virtual memory mapping information for a process,
  if debugging is permitted, including layout and information on
  underlying objects, such as the type of object and path.

These provide a superset of the information historically available
through the now-deprecated procfs(4), and are intended to be exported
in an ABI-robust form.
2007-12-02 10:10:27 +00:00
alc
ed1f1ac93c Eliminate vfs_page_set_valid()'s unused argument. 2007-12-02 01:28:35 +00:00
rwatson
090235e567 Modify stack(9) stack_print() and stack_sbuf_print() routines to use new
linker interfaces for looking up function names and offsets from
instruction pointers.  Create two variants of each call: one that is
"DDB-safe" and avoids locking in the linker, and one that is safe for
use in live kernels, by virtue of observing locking, and in particular
safe when kernel modules are being loaded and unloaded simultaneous to
their use.  This will allow them to be used outside of debugging
contexts.

Modify two of three current stack(9) consumers to use the DDB-safe
interfaces, as they run in low-level debugging contexts, such as inside
lockmgr(9) and the kernel memory allocator.

Update man page.
2007-12-01 22:04:16 +00:00
rwatson
0ab794a171 The kernel linker includes a number of utility functions to look up symbol
information in support of DDB(4); these functions bypass normal linker
locking as they may run in contexts where locking is unsafe (such as the
kernel debugger).

Add a new interface linker_ddb_search_symbol_name(), which looks up a
symbol name and offset given an address, and also
linker_search_symbol_name() which does the same but *does* follow the
locking conventions of the linker.

Unlike existing functions, these functions place the name in a
caller-provided buffer, which is stable even after linker locks have been
released.  These functions will be used in upcoming revisions to stack(9)
to support kernel stack trace generation in contexts as part of a live,
rather than suspended, kernel.
2007-12-01 19:24:28 +00:00
peter
e287ae6b7a Deal with the possibility of device_set_unit() being called when attaching
the associated devinfo sysctl tree.
2007-11-30 21:30:14 +00:00
peter
8959a77f75 Add sysctl_rename_oid() to support device_set_unit() usage. Otherwise,
when unit numbers are changed, the sysctl devinfo tree gets out of sync
and duplicate trees are attempted to be attached with the original name.
2007-11-30 21:29:08 +00:00
rwatson
df297799bd Move use of 'i' in cp_time sysctl under SCTL_MASK32 so that it compiles
without warnings on systems that don't define it.
2007-11-29 08:38:22 +00:00
peter
8e9baed553 Move the shared cp_time array (counts %sys, %user, %idle etc) to the
per-cpu area.  cp_time[] goes away and a new function creates a merged
cp_time-like array for things like linprocfs, sysctl etc.  The
atomic ops for updating cp_time[] in statclock go away, and the scope
of the thread lock is reduced.

sysctl kern.cp_time returns a backwards compatible cp_time[] array.
A new kern.cp_times sysctl returns the individual per-cpu stats.

I have pending changes to make top and vmstat optionally show per-cpu
stats.

I'm very aware that there are something like 5 or 6 other versions "out
there" for doing this - but none were handy when I needed them.

I did merge my changes with John Baldwin's, and ended up replacing a
few chunks of my stuff with his, and stealing some other code.

Reviewed by:  jhb
Partly obtained from:  jhb
2007-11-29 06:34:30 +00:00
attilio
2562874cb6 Make ADAPTIVE_GIANT as the default in the kernel and remove the option.
Currently, Giant is not too much contented so that it is ok to treact it
like any other mutexes.

Please don't forget to update your own custom config kernel files.

Approved by:	cognet, marcel (maintainers of arches where option is
		not enabled at the moment)
2007-11-28 05:50:45 +00:00
attilio
6fcea2407c Simplify the adaptive spinning algorithm in rwlock and mutex:
currently, before to spin the turnstile spinlock is acquired and the
waiters flag is set.
This is not strictly necessary, so just spin before to acquire the
spinlock and to set the flags.
This will simplify a lot other functions too, as now we have the waiters
flag set only if there are actually waiters.
This should make wakeup/sleeping couplet faster under intensive mutex
workload.
This also fixes a bug in rw_try_upgrade() in the adaptive case, where
turnstile_lookup() will recurse on the ts_lock lock that will never be
really released [1].

[1] Reported by: jeff with Nokia help
Tested by: pho, kris (earlier, bugged version of rwlock part)
Discussed with: jhb [2], jeff
MFC after: 1 week

[2] John had a similar patch about 6.x and/or 7.x about mutexes probabilly
2007-11-26 22:37:35 +00:00
attilio
7d4ec70696 Fix the spinlock static table adding missing spinlocks.
- rm_spinlock has turnstile chain as child
- srclock has callout and clk as child, found by witness "emulation".
  Just move it very high in our ranking
2007-11-24 04:32:32 +00:00
attilio
c61d48e731 transferlockers() is a very dangerous and hack-ish function as waiters
should never be moved by one lock to another.
As, luckily, nothing in our tree is using it, axe the function.

This breaks lockmgr KPI, so interested, third-party modules should update
their source code with appropriate replacement.

Ok'ed by: ups, rwatson
MFC after: 3 days
2007-11-24 04:22:28 +00:00
kris
e9fa0b52d6 Remove remaining Giant acquisition around vn_fullpath1. This was missed
in r1.106 and has not been required for some years now.

Reviewed by:  jeff
MFC After:    1 week
2007-11-22 21:26:25 +00:00
attilio
5ed5cf01be Cache the value of c_lock as it can change, in the struct,
while the global callout spinlock is not held, and can lead to PF#.

Reported by: dougb, Mark Atkinson <atkin901 at yahoo dot com>
Tested by: dougb
Diagnosed by: jhb
2007-11-22 12:15:54 +00:00
davidxu
648833c953 Add function UMTX_OP_WAIT_UINT, the function causes thread to wait for
an integer to be changed.
2007-11-21 04:21:02 +00:00
rwatson
6651bdd106 Test that p_textvp is non-NULL be dereferencing, as no executable vnode is
set for kernel processes.

Reported by:	Skip Ford <skip at menantico dot com>
MFC after:	3 days
2007-11-20 18:03:09 +00:00
attilio
6c4cd05be5 Add the function callout_init_rw() to callout facility in order to use
rwlocks in conjuction with callouts.  The function does basically what
callout_init_mtx() alredy does with the difference of using a rwlock
as extra argument.
CALLOUT_SHAREDLOCK flag can be used, now, in order to acquire the lock only
in read mode when running the callout handler.  It has no effects when used
in conjuction with mtx.

In order to implement this, underlying callout functions have been made
completely lock type-unaware, so accordingly with this, sysctl
debug.to_avg_mtxcalls is now changed in the generic
debug.to_avg_lockcalls.

Note: currently the allowed lock classes are mutexes and rwlocks because
callout handlers run in softclock swi, so they cannot sleep and they
cannot acquire sleepable locks like sx or lockmgr.

Requested by: kmacy, pjd, rwatson
Reviewed by: jhb
2007-11-20 00:37:45 +00:00
jhb
3c82b6a5c7 Bump up the number of ttys supported by pty(4) to 512 by making use of
[pt]ty[lmnoLMNO][0-9a-v].

MFC after:	3 days
Reviewed by:	rwatson
2007-11-19 20:49:42 +00:00
dumbbell
fff090c011 The kernel uses two ways to write data on a pipe:
o  buffered write, for chunks smaller than PIPE_MINDIRECT bytes
    o  direct write, for everything else

A call to writev(2) may receive struct iov of various size and the
kernel may have to switch from one solution to the other. Before doing
this, it must wake reader processes and any select/poll/kqueue up.

This commit fixes a bug where select/poll/kqueue are not triggered
when switching from buffered write to direct write. It adds calls to
pipeselwakeup().

I give more details on freebsd-arch@:
http://lists.freebsd.org/pipermail/freebsd-arch/2007-September/006790.html

This should fix issues with Erlang (lang/erlang) and kqueue.

Reported by:	Rickard Green (Erlang)
2007-11-19 15:05:20 +00:00
attilio
bfc761fdba Expand lock class with the "virtual" function lc_assert which will offer
an unified way for all the lock primitives to express lock assertions.
Currenty, lockmgrs and rmlocks don't have assertions, so just panic in
that case.
This will be a base for more callout improvements.

Ok'ed by: jhb, jeff
2007-11-18 14:43:53 +00:00
rrs
9dbec8e7df - Add in missing event handler invokes for initial proc and thread. 2007-11-18 13:56:51 +00:00
jb
9bd9c03e92 Add a function to list symbols in a file and their values at the
same time rather than having to list the symbols and then go back
and look each one up by name.
2007-11-18 00:23:31 +00:00
jhb
b113160d81 Acquire the process mutex and spin locks before calling thread_exit() in
kthread_exit() to fix panics when using INVARIANTS.
2007-11-15 21:45:17 +00:00
rrs
303afeb279 - Adds event handlers for process_ctor,process_dtor, process_init,
process_fini, thread_ctor, thread_dtor, thread_init, thread_fini. This
  will allow us to extend dynamically areas in proc/thread for dtrace ;-)
Reviewed by:    rwatson
2007-11-15 14:20:07 +00:00
glebius
93fa6a684a Fix build. 2007-11-15 14:16:20 +00:00
rrs
3925f424bc Adds an event handler for:
- process_ctor,dtor, init and fini
  - thread_ctor,dtor, init and fini
This allows the ability to add on additional things
during construction/destruction of threads and processes.

Reviewed by:	rwatson
2007-11-15 13:28:54 +00:00
julian
303014d009 This time REALLY copy the name from the proc to the thread as a default. 2007-11-15 06:35:26 +00:00
julian
bfb9581757 When forking, the new thread deserves a name too. Don't just use the
td_startcopy section as it is not the right thing to do
in other cases (e.g. if starting a new thread from one that is already named).
2007-11-15 02:13:44 +00:00
attilio
4b290fe097 Remove a bogus KASSERT which will prevent rwlock to be acquired
recursively in exclusive mode with debugging kernels.

Submitted by: kmacy
Approved by: jeff
2007-11-14 21:21:48 +00:00
marcel
1e7c4f0a3f o Rename cpu_thread_setup() to cpu_thread_alloc() to better
communicate that it relates to (is called by) thread_alloc()
o  Add cpu_thread_free() which is called from thread_free()
   to counter-act cpu_thread_alloc().

i386:	Have cpu_thread_free() call cpu_thread_clean() to
	preserve behaviour.
ia64:	Have cpu_thread_free() call mtx_destroy() for the
	mutex initialized in cpu_thread_alloc().

PR: ia64/118024
2007-11-14 20:21:54 +00:00
julian
7ee6259be7 A bunch more files that should probably print out a thread name
instead of a process name.
2007-11-14 06:51:33 +00:00
julian
b2732e0c22 generally we are interested in what thread did something as
opposed to what process. Since threads by default have teh name of the
process unless over-written with more useful information, just print the
thread name instead.
2007-11-14 06:21:24 +00:00
julian
b248158d8d Make sure there is a good default thread name for all threads. 2007-11-14 06:04:57 +00:00
rwatson
9162bf224c Add rm_wowned(9) function to test whether the current thread owns an
exclusive lock on the passed rmlock.

Reviewed by:	ups
2007-11-10 15:06:30 +00:00
jhb
a97196ea0d A couple of optimizations to the last commit.
Submitted by:	Christoph Mallon christoph mallon of gmx de
2007-11-08 21:45:56 +00:00
ups
c6c6cc7efa Use VM_FAULT_DIRTY to fault in pages for write access in
proc_rwmen.
Otherwise copy on write may create an anonymous page that is
not marked as dirty. Since  writing data to these pages
in this function also does not dirty these pages they may be
later discarded by the pagedaemon.
2007-11-08 19:35:36 +00:00
jhb
3a40a5ed4b Make it easier to add more ptys to the pty(4) driver:
- Use unit2minor() and minor2unit() to generate minor numbers to support
  unit numbers higher than 255.
- Use simple string operations on the 'names' array rather than hard-coded
  constants and switch statements so that more ptys can be added by simply
  expanding the 'names' array.

MFC after:	1 week
2007-11-08 15:51:52 +00:00
ups
9c75ede409 Initial checkin for rmlock (read mostly lock) a multi reader single writer
lock optimized for almost exclusive reader access. (see also rmlock.9)

TODO:
    Convert to per cpu variables linkerset as soon as it is available.
    Optimize UP (single processor)  case.
2007-11-08 14:47:55 +00:00
rwatson
70292a328f Remove unused variable td from sched_idletd().
MFC after:	3 days
Found with:	Coverity Prevent(tm)
CID:		3561
2007-11-05 12:01:12 +00:00
kib
9ae733819b Fix for the panic("vm_thread_new: kstack allocation failed") and
silent NULL pointer dereference in the i386 and sparc64 pmap_pinit()
when the kmem_alloc_nofault() failed to allocate address space. Both
functions now return error instead of panicing or dereferencing NULL.

As consequence, vmspace_exec() and vmspace_unshare() returns the errno
int. struct vmspace arg was added to vm_forkproc() to avoid dealing
with failed allocation when most of the fork1() job is already done.

The kernel stack for the thread is now set up in the thread_alloc(),
that itself may return NULL. Also, allocation of the first process
thread is performed in the fork1() to properly deal with stack
allocation failure. proc_linkup() is separated into proc_linkup()
called from fork1(), and proc_linkup0(), that is used to set up the
kernel process (was known as swapper).

In collaboration with:	Peter Holm
Reviewed by:	jhb
2007-11-05 11:36:16 +00:00
julian
0bf0ae3b24 Completely remove the code for single threading the mainline fork code.
Put in a little comment explaining why it went away.
Re-enable it in the case there an exisiting process is just splitting
off its address space and file descriptors.
(I donpt think anything uses that code but it needs some sort of locking
and this does the job.

Reviewed by:	Davidxu, alc, others
MFC after:	3 days
2007-11-02 19:40:36 +00:00
njl
53b7cf834c If we're on an SMP kernel and there is more than 1 CPU, reject any attempts
to change the freq before the other CPUs are active.  The current code
always attempts to change all CPUs to match each other, and the requisite
sched_bind() call won't work before APs are launched.
2007-10-30 22:18:08 +00:00
julian
03284e6e08 fix typo in code normally not compiled in. 2007-10-29 20:45:31 +00:00
julian
5d7e7a3ea7 Fix typo in code obviously not being compiled on any of my machines.
found by: rdivacky@
2007-10-28 23:11:57 +00:00
jhb
3ee3b45ab8 Change the roundrobin implementation in the 4BSD scheduler to trigger a
userland preemption directly from hardclock() via sched_clock() when a
thread uses up a full quantum instead of using a periodic timeout to cause
a userland preemption every so often.  This fixes a potential deadlock
when IPI_PREEMPTION isn't enabled where softclock blocks on a lock held
by a thread pinned or bound to another CPU.  The current thread on that
CPU will never be preempted while softclock is blocked.

Note that ULE already drives its round-robin userland preemption from
sched_clock() as well and always enables IPI_PREEMPT.

MFC after:	1 week
2007-10-27 22:07:40 +00:00
rodrigc
0b0826633e In nmount(), if MNT_ROOT is in the mount flags, filter it
out instead of returning an error.
(1)  This makes the behavior consistent with mount(2).
(2)  This makes update mounts on the root file system work properly.
(3)  The explicit checks for MNT_ROOTFS in src/sbin/fsck_ffs/main.c
     and src/usr.sbin/mountd/mountd.c which were put in to
     eliminate errors during update mounts on the root file system
     can be removed.

The only place were MNT_ROOTFS can be validly set
is inside the kernel, i.e. with vfs_mountroot_try().

Reviewed by:	phk
MFC after:	3 days
2007-10-27 15:59:18 +00:00
julian
93719028ad Add support for the pre-exisiting module shutdoen handshake.
Fix some comments.
2007-10-27 00:54:16 +00:00
julian
bc607610fb rename the process to 'idle' and 'intr' as per jhb. 2007-10-27 00:52:26 +00:00
julian
83e9ad5680 Initialise the initial process pointer to NULL so that we know we don't
have an idle process yet.
I'm guessing that on my system this was always 0 already.

found by: Ed Schouten
2007-10-27 00:42:40 +00:00
julian
4eb58f88d1 If kthread_exit() is called on the last kthread in a kproc, then
all the work in kproc_exit must be done.
We don't actually have a user of this yet but why leave it to chance.
2007-10-26 22:18:20 +00:00
julian
9f54ea2c1e if one changes a function's arguments, one must also change the callers. 2007-10-26 22:03:19 +00:00
julian
c21b38f409 oops, over optimised and broke non-SMP builds 2007-10-26 20:32:33 +00:00
julian
b6e413a633 kthread_exit needs no stinkin argument. 2007-10-26 17:03:22 +00:00
obrien
2fc050a9c2 style(9) 2007-10-26 16:33:47 +00:00
julian
11e1aa0d18 Introduce a way to make pure kernal threads.
kthread_add() takes the same parameters as the old kthread_create()
plus a pointer to a process structure, and adds a kernel thread
to that process.

kproc_kthread_add() takes the parameters for kthread_add,
plus a process name and a pointer to a pointer to a process instead of just
a pointer, and if the proc * is NULL, it creates the process to the
specifications required, before adding the thread to it.

All other old kthread_xxx() calls return, but act on (struct thread *)
instead of (struct proc *). One reason to change the name is so that
any old kernel modules that are lying around and expect kthread_create()
to make a process will not just accidentally link.

fix top to show  kernel threads by their thread name in -SH mode
add a tdnam formatting option to ps to show thread names.

make all idle threads actual kthreads and put them into their own idled process.
make all interrupt threads kthreads and put them in an interd process
(mainly for aesthetic and accounting reasons)
rename proc 0 to be 'kernel' and it's swapper thread is now 'swapper'

man page fixes to follow.
2007-10-26 08:00:41 +00:00
csjp
a16bb8381d Implement AUE_CORE, which adds process core dump support into the kernel.
This change introduces audit_proc_coredump() which is called by coredump(9)
to create an audit record for the coredump event.  When a process
dumps a core, it could be security relevant.  It could be an indicator that
a stack within the process has been overflowed with an incorrectly constructed
malicious payload or a number of other events.

The record that is generated looks like this:

header,111,10,process dumped core,0,Thu Oct 25 19:36:29 2007, + 179 msec
argument,0,0xb,signal
path,/usr/home/csjp/test.core
subject,csjp,csjp,staff,csjp,staff,1101,1095,50457,10.37.129.2
return,success,1
trailer,111

- We allocate a completely new record to make sure we arent clobbering
  the audit data associated with the syscall that produced the core
  (assuming the core is being generated in response to SIGABRT  and not
  an invalid memory access).
- Shuffle around expand_name() so we can use the coredump name at the very
  beginning of the coredump call.  Make sure we free the storage referenced
  by "name" if we need to bail out early.
- Audit both successful and failed coredump creation efforts

Obtained from:	TrustedBSD Project
Reviewed by:	rwatson
MFC after:	1 month
2007-10-26 01:23:07 +00:00
rwatson
60570a92bf Merge first in a series of TrustedBSD MAC Framework KPI changes
from Mac OS X Leopard--rationalize naming for entry points to
the following general forms:

  mac_<object>_<method/action>
  mac_<object>_check_<method/action>

The previous naming scheme was inconsistent and mostly
reversed from the new scheme.  Also, make object types more
consistent and remove spaces from object types that contain
multiple parts ("posix_sem" -> "posixsem") to make mechanical
parsing easier.  Introduce a new "netinet" object type for
certain IPv4/IPv6-related methods.  Also simplify, slightly,
some entry point names.

All MAC policy modules will need to be recompiled, and modules
not updates as part of this commit will need to be modified to
conform to the new KPI.

Sponsored by:	SPARTA (original patches against Mac OS X)
Obtained from:	TrustedBSD Project, Apple Computer
2007-10-24 19:04:04 +00:00