10575 Commits

Author SHA1 Message Date
Ed Schouten
e7ea30e404 Remove the use of lbolt from the VFS syncer.
It seems we only use `lbolt' inside the VFS syncer and the TTY layer
now.  Because I'm planning to replace the TTY layer next month, there's
no reason to keep `lbolt' if it's only used in a single thread inside
the kernel.

Because the syncer code wanted to wake up the syncer thread before the
timeout, it called sleepq_remove(). Because we now just use a condvar(9)
with a timeout value of `hz', we can wake it up using cv_broadcast()
without waking up any unrelated threads.

Reviewed by:	phk
2008-07-30 12:39:18 +00:00
Ed Schouten
911d490140 Don't make subr_clist.c depend on the TTY layer.
After the import of the new TTY layer, the TTY_QUOTE definition will not
be present anymore. To make sure clists will still work as expected,
introduce an internal definition called QUOTEMASK.

Maybe we can decide to remove the quote bits entirely, but we still have
to look into this. There may be drivers that still use the quote bits.

Obtained from:	//depot/projects/mpsafetty
2008-07-30 12:32:42 +00:00
John Baldwin
c3ea337801 When choosing a CPU for a thread in a cpuset, prefer the last CPU that the
thread ran on if there are no other CPUs in the set with a shorter per-CPU
runqueue.
2008-07-28 20:39:21 +00:00
John Baldwin
f7f1cc1518 Really fix this. 2008-07-28 18:33:43 +00:00
Pawel Jakub Dawidek
7224dd4dad Properly check if td_name is empty and if it is, print process name,
instead of empty thread name.

Reviewed by:	jhb
2008-07-28 18:10:26 +00:00
John Baldwin
f200843b72 Implement support for cpusets in the 4BSD scheduler.
- When a cpuset is applied to a thread, walk the cpuset to see if it is a
  "full" cpuset (includes all available CPUs).  If not, set a new
  TDS_AFFINITY flag to indicate that this thread can't run on all CPUs.
  When inheriting a cpuset from another thread during thread creation, the
  new thread also inherits this flag.  It is in a new ts_flags field in
  td_sched rather than using one of the TDF_SCHEDx flags because fork()
  clears td_flags after invoking sched_fork().
- When placing a thread on a runqueue via sched_add(), if the thread is not
  pinned or bound but has the TDS_AFFINITY flag set, then invoke a new
  routine (sched_pickcpu()) to pick a CPU for the thread to run on next.
  sched_pickcpu() walks the cpuset and picks the CPU with the shortest
  per-CPU runqueue length.  Note that the reason for the TDS_AFFINITY flag
  is to avoid having to walk the cpuset and examine runq lengths in the
  common case.
- To avoid walking the per-CPU runqueues in sched_pickcpu(), add an array
  of counters to hold the length of the per-CPU runqueues and update them
  when adding and removing threads to per-CPU runqueues.

MFC after:	2 weeks
2008-07-28 17:25:24 +00:00
John Baldwin
8aa3d7ffc0 Various and sundry style and whitespace fixes. 2008-07-28 15:52:02 +00:00
Kip Macy
947265b6bd - track maximum wait time
- resize columns based on actual observed numerical values

MFC after:	3 days
2008-07-27 21:45:20 +00:00
Pawel Jakub Dawidek
5573021d78 Assert for exclusive vnode lock in vinactive(), vrecycle() and vgonel()
functions.

Reviewed by:	kib
2008-07-27 11:48:15 +00:00
Pawel Jakub Dawidek
610507ae00 - Move vp test for beeing NULL under IGNORE_LOCK().
- Check if panicstr isn't set, if it is ignore the lock. This helps to avoid
  confusion, because lockmgr is a no-op when panicstr isn't NULL, so
  asserting anything at this point doesn't make sense and can just race with
  other panic.

Discussed with:	kib
2008-07-27 11:46:42 +00:00
Tom Rhodes
be6b130476 Fill in a few sysctl descriptions.
Approved by:	rwatson
2008-07-26 00:55:35 +00:00
Ed Schouten
bea45cdda3 Move ttyinfo() into its own C file.
The ttyinfo() routine generates the fancy output when pressing ^T. Right
now it is stored in tty.c. In the MPSAFE TTY code it is already stored
in tty_info.c. To make integration of the MPSAFE TTY code a little
easier, take the same approach.

This makes the TTY code a little bit more readable, because having the
proc_*/thread_* routines in tty.c is very distractful.

Approved by:	philip (mentor)
2008-07-25 14:31:00 +00:00
Konstantin Belousov
58e8af1bf5 Call pargs_drop() unconditionally in do_execve(), the function correctly
handles the NULL argument.
Make pargs_free() static.

MFC after:	1 week
2008-07-25 11:55:32 +00:00
Konstantin Belousov
96f1567fa7 s/alredy/already/ in the comments and the log message. 2008-07-25 11:22:25 +00:00
Konstantin Belousov
8b4a2800de Do the pargs_hold() on the copy of the pointer to the p_args of the
child process immediately after bulk bcopy() without dropping the
process lock.

Since process is not single-threaded when forking, dropping and
reacquiring the lock allows an other thread to change the process title
of the parent in between, and results in hold being done on the invalid
pointer. The problem manifested itself as the double free of the old
p_args.

Reported by:	kris
Reviewed by:	jhb
MFC after:	1 week
2008-07-23 08:45:25 +00:00
Attilio Rao
09400d5abe - Disallow XFS mounting in write mode. The write support never worked really
and there is no need to maintain it.
- Fix vn_get() in order to let it call vget(9) with a valid locking
  request.  vget(9) returns the vnode locked in order to prevent recycling,
  but in this case internal XFS locks alredy prevent it from happening, so
  it is safe to drop the vnode lock before to return by vn_get().
- Add a VNASSERT() in vget(9) in order to catch malformed locking requests.

Discussed with:	kan, kib
Tested by:	Lothar Braun <lothar at lobraun dot de>
2008-07-21 23:01:09 +00:00
Robert Watson
828e07694c If run_interrupt_driven_config_hooks() waits 360 seconds and INVARIANTS
is compiled into the kernel, then panic.

MFC after:	3 days
Discussed with:	scottl
2008-07-21 20:50:49 +00:00
Pawel Jakub Dawidek
7f41115ef6 Implement the following macros for completeness:
SYSCTL_QUAD()
	SYSCTL_ADD_QUAD()
	TUNABLE_QUAD()
	TUNABLE_QUAD_FETCH()

Now we can use 64bit tunables on 32bit systems.
2008-07-21 15:05:25 +00:00
Kip Macy
dd0e6c383a Add accessor functions for socket fields.
MFC after:	1 week
2008-07-21 00:49:34 +00:00
Alan Cox
14e69e48b8 Eliminate dead code. (The commit message for revision 1.287 explains why
this code is dead.)
2008-07-20 04:13:51 +00:00
Robert Watson
1cc2bd820b Rather than simply waiting silently and indefinitely for all
interrupt-driven configuration handlers to complete, print out a
diagnostic message every 60 second indicating which handlers are
still running.  Do this at most 5 times per run so as to avoid
scrolling out any useful information from the kernel message
buffer.

The interval of 60 seconds was selected based on a best guess as
to the nature of "long enough" and may want to be tuned higher
or lower depending on real-world tolerances.

MFC after:	3 days
Discussed with:	scottl
2008-07-19 19:08:35 +00:00
Robert Watson
1a4b919f8e witness_addgraph() is required even if DDB isn't compiled into the kernel,
so exclude it from #ifdef DDB.

Submitted by:	attilio
2008-07-19 17:47:23 +00:00
Robert Watson
51c0f94ed7 Add DDB "show conifhk" command, which lists hooks currently waiting
for completion in run_interrupt_driven_config_hooks().  This is
helpful when trying to figure out which device drivers have gone
into la-la land during boot-time autoconfiguration.

MFC after:	3 days
2008-07-19 12:12:54 +00:00
Jeff Roberson
9fc51b0bf4 Fix a race which could result in some timeout buckets being skipped.
- When a tick occurs on a cpu, iterate from cs_softticks until ticks.
   The per-cpu tick processing happens asynchronously with the actual
   adjustment of the 'ticks' variable.  Sometimes the results may
   be visible before the local call and sometimes after.  Previously this
   could cause a one tick window where we didn't evaluate the bucket.
 - In softclock fetch curticks before incrementing cc_softticks so we
   don't skip insertions which were made for the current time.

Sponsored by:	Nokia
2008-07-19 05:18:29 +00:00
Jeff Roberson
e980fff622 - Check whether we've recorded this tick in ts_ticks on another cpu in
sched_tick() to prevent multiple increments for one tick.  This pushes
   the value out of range and breaks priority calculation.

Reviewed by:	kib
Found by:	pho/nokia
Sponsored by:	Nokia
MFC after:	3 days
2008-07-19 05:13:47 +00:00
Kip Macy
694382c8eb revert local change 2008-07-18 07:10:33 +00:00
Kip Macy
2976b312a1 revert change from local tree 2008-07-18 07:07:57 +00:00
Kip Macy
4af83c8cff import vendor fixes to cxgb 2008-07-18 06:12:31 +00:00
Konstantin Belousov
9a75ea2333 Pair the VOP_OPEN call from do_execve() with the reciprocal VOP_CLOSE.
This was unnoticed because local filesystems usually do nothing
non-trivial in the close vop.

Reported and tested by:	Rick Macklem
MFC after:	2 weeks
2008-07-17 16:44:07 +00:00
Antoine Brodin
23d5e112eb Staticize M_STACK.
Approved by:	rwatson (mentor)
MFC after:	1 month
2008-07-13 17:15:05 +00:00
Craig Rodrigues
1aad294b4e In nmount(), if we see "update" in the mount options,
set MNT_UPDATE in fsflags, and delete the
"update" option from the global mount options.

MNT_UPDATE is a command, and not a property of a mount
that should persist after the command is executed.

We need to do similar things for MNT_FORCE and MNT_RELOAD.

All mount flags are prefixed by MNT_..... it would
be nice if flags which were commands were named differently
from flags which are persistent properties of a mount.
This was not such a big deal in the pre-nmount() days,
but with nmount() it is more important.

Requested by:	yar
MFC after:	2 weeks
2008-07-12 20:12:40 +00:00
David E. O'Brien
b474c780b5 Improve readability and cscope searches a little bit by not using the
same variable name in closely related (but not conflicting) contexts.
2008-07-11 14:48:28 +00:00
Konstantin Belousov
ae95dc623a Make it atomic for the devfs_populate_loop() to see the setting of
SI_ALIAS flag and initialization of the si_parent when alias is created.
Assert that supplied parent device is not NULL.

Both situations could cause NULL dereference in the
devfs_populate_loop() when creating a symlink for SI_ALIAS'ed device.
Namely, cdp->cdp_c.si_parent may be NULL.

Reported by:	mav
MFC after:	2 weeks
2008-07-11 11:22:19 +00:00
David E. O'Brien
4f2945f832 Revert r180431.
r180431 broke the AMD64 build (the only arch using kern/link_elf_obj.c)
2008-07-11 01:10:40 +00:00
David E. O'Brien
f55ffb3990 Allow 'elf_file_t' to be used in a wider scope. 2008-07-10 16:35:57 +00:00
Edwin Groothuis
552f9f63c1 Improve the output of kldload(8) to show which module can't be loaded.
Was:		kldload: Unsupported file type
Is now:		kldload: /boot/modules/test.ko: Unsupported file type

PR:		kern/121276
Submitted by:	Edwin Groothuis <edwin@mavetju.org>
Approved by:	bde (mentor)
MFC after:	1 week
2008-07-08 23:51:38 +00:00
Bjoern A. Zeeb
dea0ed6690 Add a `show cpusets' DDB command to print numbered root and
assigned CPU affinity sets.

Reviewed by:	brooks
2008-07-07 21:32:02 +00:00
Bjoern A. Zeeb
45e48455cd MFp4 144659:
Plug a memory leak with jail services.

PR:		125257
Submitted by:	Mateusz Guzik <mjguzik gmail.com>
MFC after:	6 days
2008-07-07 20:53:49 +00:00
Bjoern A. Zeeb
7a8f695a21 Move cpuset_refroot and cpuset_refbase functions up, grouping the
cpuset_ref* functions together. Will make it easier to read and
add code without forward declarations.
No functional changes.
2008-07-07 20:45:55 +00:00
Konstantin Belousov
7054ee4e38 The kqueue_register() function assumes that it is called from the top of
the syscall code and acquires various event subsystem locks as needed.
The handling of the NOTE_TRACK for EVFILT_PROC is currently done by
calling the kqueue_register() from filt_proc() filter, causing recursive
entrance of the kqueue code. This results in the LORs and recursive
acquisition of the locks.

Implement the variant of the knote() function designed to only handle
the fork() event. It mostly copies the knote() body, but also handles
the NOTE_TRACK, removing the handling from the filt_proc(), where it
causes problems described above. The function is called from the fork1()
instead of knote().

When encountering NOTE_TRACK knote, it marks the knote as influx
and drops the knlist and kqueue lock. In this context call to
kqueue_register is safe from the problems.

An error from the kqueue_register() is reported to the observer as
NOTE_TRACKERR fflag.

PR:	108201
Reviewed by:	jhb, Pramod Srinivasan <pramod juniper net> (previous version)
Discussed with:	jmg
Tested by:	pho
MFC after:	2 weeks
2008-07-07 09:30:11 +00:00
Konstantin Belousov
e1a32fd42b The r178914 I erronously put the setting of the KQ_FLUXWAIT flag before
KQ_FLUX_WAKEUP(). Since the later macro clears the KQ_FLUXWAIT, the
kqueue_scan() thread may be not woken up.

Move the setting of KQ_FLUXWAIT after wakeup to correct the issue.

Reported and tested by:	pho
MFC after:	3 days
2008-07-07 09:15:29 +00:00
Alan Cox
b89eaf4e9f Enable the creation of a kmem map larger than 4GB.
Submitted by: Tz-Huan Huang

Make several variables related to kmem map auto-sizing static.
Found by: CScout
2008-07-05 19:34:33 +00:00
Robert Watson
4f7d1876d5 Introduce a new lock, hostname_mtx, and use it to synchronize access
to global hostname and domainname variables.  Where necessary, copy
to or from a stack-local buffer before performing copyin() or
copyout().  A few uses, such as in cd9660 and daemon_saver, remain
under-synchronized and will require further updates.

Correct a bug in which a failed copyin() of domainname would leave
domainname potentially corrupted.

MFC after:	3 weeks
2008-07-05 13:10:10 +00:00
Alan Cox
6819e13eeb Correct an error in the comments for init_param3().
Discussed with: silby
2008-07-04 19:36:58 +00:00
Robert Watson
59dd72d040 Remove NETISR_MPSAFE, which allows specific netisr handlers to be directly
dispatched without Giant, and add NETISR_FORCEQUEUE, which allows specific
netisr handlers to always be dispatched via a queue (deferred).  Mark the
usb and if_ppp netisr handlers as NETISR_FORCEQUEUE, and explicitly
acquire Giant in those handlers.

Previously, any netisr handler not marked NETISR_MPSAFE would necessarily
run deferred and with Giant acquired.  This change removes Giant
scaffolding from the netisr infrastructure, but NETISR_FORCEQUEUE allows
non-MPSAFE handlers to continue to force deferred dispatch so as to avoid
lock order reversals between their acqusition of Giant and any calling
context.

It is likely we will be able to remove NETISR_FORCEQUEUE once
IFF_NEEDSGIANT is removed, as non-MPSAFE usb and if_ppp drivers will no
longer be supported.

Reviewed by:	bz
MFC after:	1 month
X-MFC note:	We can't remove NETISR_MPSAFE from stable/7 for KPI reasons,
		but the rest can go back.
2008-07-04 00:21:38 +00:00
Ed Maste
7928893d83 Use bcopy instead of strlcpy in uipc_bind and unp_connect, since
soun->sun_path isn't a null-terminated string.  As UNIX(4) states, "the
terminating NUL is not part of the address."  Since strlcpy has to return
"the total length of the string [it] tried to create," it walks off the end
of soun->sun_path looking for a \0.

This reverts r105332.

Reported by:    Ryan Stone
2008-07-03 23:26:10 +00:00
Julian Elischer
f44e6e2ecc Change a variable name to not shadow a global
Obtained from:	vimage
2008-07-03 08:35:59 +00:00
Robert Watson
6992381eca Update copyright date in light of soreceive_dgram(9). 2008-07-03 06:47:45 +00:00
Robert Watson
5df3e83946 Add soreceive_dgram(9), an optimized socket receive function for use by
datagram-only protocols, such as UDP.  This version removes use of
sblock(), which is not required due to an inability to interlace data
improperly with datagrams, as well as avoiding some of the larger loops
and state management that don't apply on datagram sockets.

This is experimental code, so hook it up only for UDPv4 for testing; if
there are problems we may need to revise it or turn it off by default,
but it offers *significant* performance improvements for threaded UDP
applications such as BIND9, nsd, and memcached using UDP.

Tested by:	kris, ps
2008-07-02 23:23:27 +00:00
Roman Divacky
bff2d4d5ff Use msleep_spin() instead of unlock/tsleep/lock. This was
already commited but with a wrong msleep variant and then
backed out. Note that this changes the semantic a little
as msleep_spin does not let us to specify priority after
wakeup.

Approved by:	wkoszek, cognet
Approved by:	kib (mentor)
2008-07-02 20:44:33 +00:00