Commit Graph

7246 Commits

Author SHA1 Message Date
julian
7835fe4007 Move the KSE ABI specific code here and separate it from code that
is generic to any threading system. This commit does not link this
file to the build yet, nor does it remove these functions from their
current location in kern_thread.c. (that commit coming up after further review)
2004-06-07 07:25:03 +00:00
phk
1d74a27efd Remove filename+line number from panic messages. 2004-06-06 21:26:49 +00:00
bde
f3c51ac0d4 Detect interrupt storms better. The storm detection didn't work at all
with an ASUS A7N8X-E motherboard in APIC mode, since storming interrupts
don't repeat immediately.  Use DELAY(1) to wait a bit for them to repeat.
This affects all systems.  Only delay for the first
(10 * intr_storm_threshold) interrupts (per interrupt handler) so that
this is only a pessimization while warming up.  Throttle after calling
the sub-handlers instead of before so that the long delay given by
throttling can be used instead of the DELAY(1) to detect storms after
warming up.

Reduced the throttling period from 1/10 second to 1/hz seconds so that
throttling doesn't destroy performance so much.  Interrupts that are
detected as storming are effectively handled by polling at a frequency
of hz Hz.  On A7N8X-E's there is another hardware or configuration bug
that makes the throttled frequency closer to 2*hz Hz.
2004-06-05 18:27:28 +00:00
mux
289f1bd7df When we don't have any meaningful value to print for the device sysctl
tree, output an empty string instead of "?".  This is already what
happened with DEVICE_SYSCTL_LOCATION and DEVICE_SYSCTL_PNPINFO.  This
makes the output of "sysctl dev" much nicer (it won't display those
empty sysctls).

Reviewed by:	des
2004-06-05 11:39:05 +00:00
tjr
5545221528 Change the types of vn_rdwr_inchunks()'s len and aresid arguments to
size_t and size_t *, respectively. Update callers for the new interface.
This is a better fix for overflows that occurred when dumping segments
larger than 2GB to core files.
2004-06-05 02:18:28 +00:00
tjr
30c66c4349 Back out workaround for vn_rdwr_inchunks()'s INT_MAX length limitation
after discussions with bde; vn_rdwr_inchunks() itself should be fixed.
2004-06-05 02:00:12 +00:00
phk
ab9c3b8324 Centralize the line discipline optimization determination in a function
called ttyldoptim().

Use this function from all the relevant drivers.

I belive no drivers finger linesw[] directly anymore, paving the way for
locking and refcounting.
2004-06-04 21:55:55 +00:00
phk
db2bdd44aa Manual edits to change linesw[]-frobbing to ttyld_*() calls. 2004-06-04 20:04:52 +00:00
phk
c2fc58d70f Machine generated patch which changes linedisc calls from accessing
linesw[] directly to using the ttyld...() functions

The ttyld...() functions ar inline so there is no performance hit.
2004-06-04 16:02:56 +00:00
tjr
27c6e572f9 Remove a stale comment. 2004-06-04 11:00:22 +00:00
des
d0d588d925 Add a devclass level to the dev sysctl tree, in order to support per-
class variables in addition to per-device variables.  In plain English,
this means that dev.foo0.bar is now called dev.foo.0.bar, and it is
possible to to have dev.foo.bar as well.
2004-06-04 10:23:00 +00:00
phk
e38a975fdb Get rid of ttyregister(). All drivers now use ttymalloc() for struct
tty, so now we stand a chance of implementing refcounting and getting
rid of the damn things again.
2004-06-04 07:17:03 +00:00
phk
51f05dbc85 Use ttymalloc() instead of ttyregister(). Use ttyioctl() instead of
direct calls to the linedisc.
2004-06-04 06:50:35 +00:00
tjr
94e7af9e6c Write segments to core dump files in maximally-sized chunks that neither
exceed vn_rdwr_inchunks()'s INT_MAX length limitation nor span a block
boundary. This fixes dumping segments larger than 2GB.

PR:	67546
2004-06-04 06:30:16 +00:00
rwatson
2763c817f5 Mark sun_noname as const since it's immutable. Update definitions
of functions that potentially accept &sun_noname (sbappendaddr(),
et al) to accept a const sockaddr pointer.
2004-06-04 04:07:08 +00:00
alc
7d5d99d158 Move the definitions of SWAPBLK_NONE and SWAPBLK_MASK from vm_page.h to
blist.h, enabling the removal of numerous #includes from subr_blist.c.
(subr_blist.c and swap_pager.c are the only users of these definitions.)
2004-06-04 04:03:26 +00:00
jhb
6f6def0e34 - Comment out NULL, NULL barrier for Unix domain sockets section as the
double NULL entries signal Witness to stop processing the array of
  order entries meaning none of the spin locks are added resulting in
  panics on boot.
- Add a missing NULL, NULL terminator to the Slip locks list to keep them
  separate from the spin locks.
2004-06-03 20:07:44 +00:00
tjr
f651d922d3 Remove checks for curthread == NULL - it can't happen. 2004-06-03 10:22:47 +00:00
tjr
56835bf880 Move TDF_DEADLKTREAT into td_pflags (and rename it accordingly) to avoid
having to acquire sched_lock when manipulating it in lockmgr(), uiomove(),
and uiomove_fromphys().

Reviewed by:	jhb
2004-06-03 01:47:37 +00:00
rwatson
c4609f82d9 Expand the hard-coded WITNESS lock order to include the following
relationships:

Sockets:    filedesc->accept->sellck
Routing:    radix node head->rtentry->ifaddr
UDP:        udp->udpinp
TCP:        tcp->tcpinp
SLIP:       slip_mtx->slip sc_mtx

Drop in a place holder section for UNIX domain sockets.  Various
sections to be expanded over the next few days.
2004-06-02 23:28:06 +00:00
mux
a50c49eeab As discussed on arch@, flatten the device sysctl tree to make it
more convenient to deal with.  The notion of hierarchy is however
preserved by adding a new %parent node.
2004-06-02 22:43:35 +00:00
tjr
92a3c489b3 Remove a redundant "td = curthread" statement from profclock(). 2004-06-02 12:05:06 +00:00
tjr
38428912ad Move TDF_SA from td_flags to td_pflags (and rename it accordingly)
so that it is no longer necessary to hold sched_lock while
manipulating it.

Reviewed by:	davidxu
2004-06-02 07:52:36 +00:00
jeff
9039a2bc2b - Run sched_balance() and sched_balance_groups() from hardclock via
sched_clock() rather than using callouts.  This means we no longer have to
   take the load of the callout thread into consideration while balancing and
   should make the balancing decisions simpler and more accurate.

Tested on:	x86/UP, amd64/SMP
2004-06-02 05:46:48 +00:00
rwatson
661f51c0cf Integrate accept locking from rwatson_netperf, introducing a new
global mutex, accept_mtx, which serializes access to the following
fields across all sockets:

          so_qlen          so_incqlen         so_qstate
          so_comp          so_incomp          so_list
          so_head

While providing only coarse granularity, this approach avoids lock
order issues between sockets by avoiding ownership of the fields
by a specific socket and its per-socket mutexes.

While here, rewrite soclose(), sofree(), soaccept(), and
sonewconn() to add assertions, close additional races and  address
lock order concerns.  In particular:

- Reorganize the optimistic concurrency behavior in accept1() to
  always allocate a file descriptor with falloc() so that if we do
  find a socket, we don't have to encounter the "Oh, there wasn't
  a socket" race that can occur if falloc() sleeps in the current
  code, which broke inbound accept() ordering, not to mention
  requiring backing out socket state changes in a way that raced
  with the protocol level.  We may want to add a lockless read of
  the queue state if polling of empty queues proves to be important
  to optimize.

- In accept1(), soref() the socket while holding the accept lock
  so that the socket cannot be free'd in a race with the protocol
  layer.  Likewise in netgraph equivilents of the accept1() code.

- In sonewconn(), loop waiting for the queue to be small enough to
  insert our new socket once we've committed to inserting it, or
  races can occur that cause the incomplete socket queue to
  overfill.  In the previously implementation, it was sufficient
  to simply tested once since calling soabort() didn't release
  synchronization permitting another thread to insert a socket as
  we discard a previous one.

- In soclose()/sofree()/et al, it is the responsibility of the
  caller to remove a socket from the incomplete connection queue
  before calling soabort(), which prevents soabort() from having
  to walk into the accept socket to release the socket from its
  queue, and avoids races when releasing the accept mutex to enter
  soabort(), permitting soabort() to avoid lock ordering issues
  with the caller.

- Generally cluster accept queue related operations together
  throughout these functions in order to facilitate locking.

Annotate new locking in socketvar.h.
2004-06-02 04:15:39 +00:00
rwatson
3597128242 Rather than assert f_type==DTYPE_VNODE, conditionally perform the
file lock release based on f_type==DTYPE_VNODE.  vn_closefile() is
used by non-vnode types as well (fifo).
2004-06-01 23:36:47 +00:00
rwatson
6d73586edc Add GIANT_REQUIRED to kqueue_close(), since kqueue currently requires
Giant.
2004-06-01 18:05:41 +00:00
rwatson
63cc146826 Push the VOP_ADVLOCK() call to release advisory locks on vnode file
descriptors out of fdrop_locked() and into vn_closefile().  This
removes all knowledge of vnodes from fdrop_locked(), since the lock
behavior was specific to vnodes.  This also removes the specific
requirement for Giant in fdrop_locked(), it's now only required by
code that it calls into.

Add GIANT_REQUIRED to vn_closefile() since VFS requires Giant.
2004-06-01 18:03:20 +00:00
bmilekic
a414d08468 Fix a couple of bugs in the mbuf and packet ctors. In the latter case,
nextpkt within the m_hdr was not being initialized to NULL for
!M_PKTHDR cases.  *Maybe* this will fix weird socket buffer
inconsistency panics, but we'll see.
2004-06-01 16:17:10 +00:00
phk
7344701cb1 Introduce a ttyioctl() cdevsw default function. 2004-06-01 13:39:02 +00:00
phk
7a77d42d27 There is no need to explicitly call the stop function. In all likelyhood
->l_close() did it and ttyclose certainly will.
2004-06-01 11:57:15 +00:00
rwatson
17311e082f Add a global mutex, accept_filter_mtx, to protect the global list of
accept filters and prevent read-modify-write races.
2004-06-01 04:08:48 +00:00
rwatson
1064dac12a The SS_COMP and SS_INCOMP flags in the so_state field indicate whether
the socket is on an accept queue of a listen socket.  This change
renames the flags to SQ_COMP and SQ_INCOMP, and moves them to a new
state field on the socket, so_qstate, as the locking for these flags
is substantially different for the locking on the remainder of the
flags in so_state.
2004-06-01 02:42:56 +00:00
truckman
2ac80b5329 Add MSG_NBIO flag option to soreceive() and sosend() that causes
them to behave the same as if the SS_NBIO socket flag had been set
for this call.  The SS_NBIO flag for ordinary sockets is set by
fcntl(fd, F_SETFL, O_NONBLOCK).

Pass the MSG_NBIO flag to the soreceive() and sosend() calls in
fifo_read() and fifo_write() instead of frobbing the SS_NBIO flag
on the underlying socket for each I/O operation.  The O_NONBLOCK
flag is a property of the descriptor, and unlike ordinary sockets,
fifos may be referenced by multiple descriptors.
2004-06-01 01:18:51 +00:00
bmilekic
a0dab454d6 Bring in mbuma to replace mballoc.
mbuma is an Mbuf & Cluster allocator built on top of a number of
extensions to the UMA framework, all included herein.

Extensions to UMA worth noting:
  - Better layering between slab <-> zone caches; introduce
    Keg structure which splits off slab cache away from the
    zone structure and allows multiple zones to be stacked
    on top of a single Keg (single type of slab cache);
    perhaps we should look into defining a subset API on
    top of the Keg for special use by malloc(9),
    for example.
  - UMA_ZONE_REFCNT zones can now be added, and reference
    counters automagically allocated for them within the end
    of the associated slab structures.  uma_find_refcnt()
    does a kextract to fetch the slab struct reference from
    the underlying page, and lookup the corresponding refcnt.

mbuma things worth noting:
  - integrates mbuf & cluster allocations with extended UMA
    and provides caches for commonly-allocated items; defines
    several zones (two primary, one secondary) and two kegs.
  - change up certain code paths that always used to do:
    m_get() + m_clget() to instead just use m_getcl() and
    try to take advantage of the newly defined secondary
    Packet zone.
  - netstat(1) and systat(1) quickly hacked up to do basic
    stat reporting but additional stats work needs to be
    done once some other details within UMA have been taken
    care of and it becomes clearer to how stats will work
    within the modified framework.

From the user perspective, one implication is that the
NMBCLUSTERS compile-time option is no longer used.  The
maximum number of clusters is still capped off according
to maxusers, but it can be made unlimited by setting
the kern.ipc.nmbclusters boot-time tunable to zero.
Work should be done to write an appropriate sysctl
handler allowing dynamic tuning of kern.ipc.nmbclusters
at runtime.

Additional things worth noting/known issues (READ):
   - One report of 'ips' (ServeRAID) driver acting really
     slow in conjunction with mbuma.  Need more data.
     Latest report is that ips is equally sucking with
     and without mbuma.
   - Giant leak in NFS code sometimes occurs, can't
     reproduce but currently analyzing; brueffer is
     able to reproduce but THIS IS NOT an mbuma-specific
     problem and currently occurs even WITHOUT mbuma.
   - Issues in network locking: there is at least one
     code path in the rip code where one or more locks
     are acquired and we end up in m_prepend() with
     M_WAITOK, which causes WITNESS to whine from within
     UMA.  Current temporary solution: force all UMA
     allocations to be M_NOWAIT from within UMA for now
     to avoid deadlocks unless WITNESS is defined and we
     can determine with certainty that we're not holding
     any locks when we're M_WAITOK.
   - I've seen at least one weird socketbuffer empty-but-
     mbuf-still-attached panic.  I don't believe this
     to be related to mbuma but please keep your eyes
     open, turn on debugging, and capture crash dumps.

This change removes more code than it adds.

A paper is available detailing the change and considering
various performance issues, it was presented at BSDCan2004:
http://www.unixdaemons.com/~bmilekic/netbuf_bmilekic.pdf
Please read the paper for Future Work and implementation
details, as well as credits.

Testing and Debugging:
    rwatson,
    brueffer,
    Ketrien I. Saihr-Kesenchedra,
    ...
Reviewed by: Lots of people (for different parts)
2004-05-31 21:46:06 +00:00
rwatson
3974bac90d Assert Giant in vn_start_write() and vn_finished_write(). 2004-05-31 20:56:10 +00:00
rwatson
d270ab39e1 Assert Giant in vrele(). 2004-05-31 19:06:01 +00:00
phk
1d9d256461 Add missing #include <sys/module.h> 2004-05-30 20:34:58 +00:00
phk
503f5f6026 Add some missing <sys/module.h> includes which are masked by the
one on death-row in <sys/kernel.h>
2004-05-30 17:57:46 +00:00
tjr
fc7b6331a2 Enable MI bits for gcc -ftest-coverage -fprofile-arcs on amd64. 2004-05-29 01:18:14 +00:00
pjd
4867f006aa Sysctl hw.bus.devctl_disable shouldn't be writtable from inside a jail.
Approved by:	imp
2004-05-26 16:36:32 +00:00
tmm
6fb23d7b62 Retire cpu_sched_exit(); it is not used any more. 2004-05-26 12:09:39 +00:00
des
a86e2e73df As previously threatened, give each device its own sysctl context and
subtree (under the new dev top-level node).  This should greatly simplify
drivers which need per-device sysctl variables (such as ndis).
2004-05-25 12:06:26 +00:00
gad
7bd7ddc551 Implement the new KERN_PROC_RGID option, and also implement the
KERN_PROC_SESSION option which had been previously defined but
never implemented.

PR:		bin/65803  (a very tiny piece of the PR)`
Submitted by:	Cyrille Lefevre
2004-05-22 23:11:44 +00:00
davidxu
3bfd07153d Clear KSE thread flags after KSE thread mode is ended. The side effect
of not clearing the flags for execv() syscall will result that a new
program runs in KSE thread mode without enabling it.

Submitted by: tjr
Modified by: davidxu
2004-05-21 14:50:23 +00:00
bde
830748ec5f Fixed some style bugs in tdsigwakeup(). 2004-05-21 10:02:24 +00:00
jhb
811db3f335 In tdsigwakeup(), use TD_ON_SLEEPQ() rather than TD_IS_SLEEPING() to see if
a thread is on a sleep queue and should have it's sleep aborted.

Reported by:	Thierry Herbelot thierry at herbelot dot com
2004-05-20 20:17:28 +00:00
bde
bab09445f3 Fixed printf format errors which helped break GUPROF for arches with
64-bit function pointers.
2004-05-20 16:48:17 +00:00
bde
db0509f4ef Initialize the history counter type field in struct gmonparam as
threatened in rev.1.10 of usr.sbin/kgmon/kgmon.c more than 2 years ago.
kgmon has been recovering from the missing initialization for too
long, but the fixup there is ifdefed for i386's and shouldn't be
needed for other arches.
2004-05-20 16:42:39 +00:00
bde
53ac360c99 Moved i386 asms to an i386 header. The asms are for calibration of
high resolution kernel profiling (options GUPROF.  "U" in GUPROF stands
for microseconds resolution, but the resolution is now smaller than 1
nanosecond on multi-GHz machines and the accuracy is heading towards
1 nanosecond too).  Arches that support GUPROF must now provide certain
macros for the calibration.  GUPROF is now only supported for i386's,
so the absence of the new macros for other arches doesn't break anything
that wasn't already broken.  amd64's have uncommitted support for
GUPROF, and sparc64's have support that seems to be complete except
here (there was an #error for non-i386 cases; now there are undefined
macros).

Changed the asms a little:
- declare them as __volatile.  They must not be moved, and exporting a
  label across asms is technically incorrect, so try harder to stop gcc
  moving them.
- don't put the non-clobbered register "bx" in the clobber list.  The
  clobber lists are still more conservative than necessary.
- drop the non-support for gcc-1.  It just gave a better error message,
  and this is not useful since compiling with gcc-1 would cause thousands
  of worse error messages.
- drop the support for aout.
2004-05-20 16:12:19 +00:00