7255 Commits

Author SHA1 Message Date
stefanf
d7af95e868 Avoid assignments to cast expressions.
Reviewed by:	md5
Approved by:	das (mentor)
2004-06-08 13:08:19 +00:00
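A minimal sketch of the kind of rewrite this message describes, not the actual diff. Assigning to a cast expression relies on a compiler extension that ISO C does not allow; the portable form converts the value into a correctly typed variable first. The helper name advance() is hypothetical.

```c
#include <stddef.h>

/*
 * Hypothetical helper: advance a generic pointer by n bytes.
 *
 * Non-portable form that this kind of cleanup removes:
 *	((char *)p) += n;	(assignment to a cast expression)
 */
void *
advance(void *p, size_t n)
{
	char *cp = p;	/* convert once into a typed variable */

	cp += n;	/* ordinary pointer arithmetic on an lvalue */
	return (cp);
}
```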
tjr
58dbd6e669 Remove remnants of PGINPROF. 2004-06-08 10:37:30 +00:00
rwatson
8555f72de8 Correct a resource leak introduced in recent accept locking changes:
when I reordered events in accept1() to allocate a file descriptor
earlier, I didn't properly update use of goto on exit to unwind for
cases where the file descriptor is now held, but wasn't previously.
The result was that, in the event of accept() on a non-blocking socket,
or in the event of a socket error, a file descriptor would be leaked.

This ended up being non-fatal in many cases, as the file descriptor
would be properly GC'd on process exit, so only showed up for processes
that do a lot of non-blocking accept() calls, and also live for a long
time (such as qmail).

This change updates the use of goto targets to do additional unwinding.

Eyes provided by:	Brian Feldman <green@freebsd.org>
Feet, hands provided by:	Stefan Ehmann <shoesoft@gmx.net>,
				Dimitry Andric <dimitry@andric.com>
				Arjan van Leeuwen <avleeuwen@piwebs.com>
2004-06-07 21:45:44 +00:00
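The unwinding problem described above can be sketched in user space as follows. This is an illustration under stated assumptions, not the accept1() code: fd_alloc(), fd_release(), fds_in_use and accept_sketch() are hypothetical stand-ins. The point is that once the descriptor is reserved early, every later failure has to leave through a label that gives it back.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the descriptor table. */
int fds_in_use = 0;

int
fd_alloc(void)
{
	return (fds_in_use++);
}

void
fd_release(int fd)
{
	(void)fd;
	fds_in_use--;
}

/*
 * Accept-like routine: the descriptor is reserved before we know
 * whether a connection is pending, so both failure paths below must
 * unwind it.
 */
int
accept_sketch(bool conn_pending, bool nonblocking, int *fdp)
{
	int error, fd;

	fd = fd_alloc();
	if (!conn_pending && nonblocking) {
		error = EWOULDBLOCK;
		goto noconnection;	/* fd is held: release it */
	}
	if (!conn_pending) {
		error = ECONNABORTED;	/* stand-in for a socket error */
		goto noconnection;
	}
	*fdp = fd;
	return (0);

noconnection:
	fd_release(fd);
	return (error);
}
```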
phk
bfb13da831 Make linesw[] an array of pointers to linedesc instead of an array of
linedisc.
2004-06-07 20:45:45 +00:00
julian
769daa5d1d Split kern_thread.c into 2 parts: kern_kse.c and kern_thread.c.
kern_kse.c has already been committed.
This separates out the KSE threading ABI from generic thread support.
2004-06-07 19:00:57 +00:00
davidxu
90554db906 According to SUSv3, sigwait differs from sigwaitinfo: sigwait
returns the error code in its return value, not in errno.
2004-06-07 13:35:02 +00:00
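From the caller's side, the distinction looks roughly like this. A minimal sketch: wait_for_signal() is a hypothetical wrapper, and the feature-test macro is only there so that strict compilers expose the POSIX declarations.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>

/*
 * Wait for one signal from set. sigwait() reports failure through its
 * return value, so errno is not consulted here. (sigwaitinfo(), by
 * contrast, returns -1 and sets errno.)
 */
int
wait_for_signal(const sigset_t *set, int *signo)
{
	int error;

	error = sigwait(set, signo);
	if (error != 0)
		fprintf(stderr, "sigwait: %s\n", strerror(error));
	return (error);
}
```

The signals in set should be blocked in the calling thread before the call, as with any use of sigwait().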
pjd
c66d0ff628 Remove unused code.
Submitted by:	Bjoern A. Zeeb
2004-06-07 12:19:55 +00:00
ume
3a5bdeaf2c allow more than MLEN bytes for ancillary data to meet the
requirement of Section 20.1 of RFC3542.

Obtained from:	KAME
MFC after:	1 week
2004-06-07 09:59:50 +00:00
tjr
24fcba21fb Remove a stale and misleading comment. 2004-06-07 09:35:00 +00:00
julian
85b03d3641 Move the KSE ABI specific code here and separate it from code that
is generic to any threading system. This commit does not link this
file to the build yet, nor does it remove these functions from their
current location in kern_thread.c. (that commit coming up after further review)
2004-06-07 07:25:03 +00:00
phk
4c3fd8116d Remove filename+line number from panic messages. 2004-06-06 21:26:49 +00:00
bde
e02f078768 Detect interrupt storms better. The storm detection didn't work at all
with an ASUS A7N8X-E motherboard in APIC mode, since storming interrupts
don't repeat immediately.  Use DELAY(1) to wait a bit for them to repeat.
This affects all systems.  Only delay for the first
(10 * intr_storm_threshold) interrupts (per interrupt handler) so that
this is only a pessimization while warming up.  Throttle after calling
the sub-handlers instead of before so that the long delay given by
throttling can be used instead of the DELAY(1) to detect storms after
warming up.

Reduced the throttling period from 1/10 second to 1/hz seconds so that
throttling doesn't destroy performance so much.  Interrupts that are
detected as storming are effectively handled by polling at a frequency
of hz Hz.  On A7N8X-E's there is another hardware or configuration bug
that makes the throttled frequency closer to 2*hz Hz.
2004-06-05 18:27:28 +00:00
mux
b7f9b2983e When we don't have any meaningful value to print for the device sysctl
tree, output an empty string instead of "?".  This is already what
happened with DEVICE_SYSCTL_LOCATION and DEVICE_SYSCTL_PNPINFO.  This
makes the output of "sysctl dev" much nicer (it won't display those
empty sysctls).

Reviewed by:	des
2004-06-05 11:39:05 +00:00
tjr
02a7d287a2 Change the types of vn_rdwr_inchunks()'s len and aresid arguments to
size_t and size_t *, respectively. Update callers for the new interface.
This is a better fix for overflows that occurred when dumping segments
larger than 2GB to core files.
2004-06-05 02:18:28 +00:00
tjr
445b7fecaa Back out workaround for vn_rdwr_inchunks()'s INT_MAX length limitation
after discussions with bde; vn_rdwr_inchunks() itself should be fixed.
2004-06-05 02:00:12 +00:00
phk
17b52df3d7 Centralize the line discipline optimization determination in a function
called ttyldoptim().

Use this function from all the relevant drivers.

I believe no drivers finger linesw[] directly anymore, paving the way for
locking and refcounting.
2004-06-04 21:55:55 +00:00
phk
06049d3eaf Manual edits to change linesw[]-frobbing to ttyld_*() calls. 2004-06-04 20:04:52 +00:00
phk
ba3920e2a2 Machine generated patch which changes linedisc calls from accessing
linesw[] directly to using the ttyld...() functions

The ttyld...() functions are inline so there is no performance hit.
2004-06-04 16:02:56 +00:00
tjr
5c5d136c33 Remove a stale comment. 2004-06-04 11:00:22 +00:00
des
7fec1d4931 Add a devclass level to the dev sysctl tree, in order to support per-
class variables in addition to per-device variables.  In plain English,
this means that dev.foo0.bar is now called dev.foo.0.bar, and it is
possible to have dev.foo.bar as well.
2004-06-04 10:23:00 +00:00
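The renaming can be illustrated with a small string helper. This is only a sketch of the naming scheme; dev_sysctl_name() is hypothetical and is not how the kernel builds the tree.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "dev.<class>.<unit>.<leaf>" from a device name such as "foo0".
 * Returns 0 on success, -1 if the name has no class part, no trailing
 * unit number, or the result does not fit in buf.
 */
int
dev_sysctl_name(const char *devname, const char *leaf, char *buf, size_t len)
{
	size_t split = strlen(devname);
	int n;

	/* Walk back over the trailing digits to find the unit number. */
	while (split > 0 && isdigit((unsigned char)devname[split - 1]))
		split--;
	if (split == 0 || devname[split] == '\0')
		return (-1);
	n = snprintf(buf, len, "dev.%.*s.%s.%s", (int)split, devname,
	    devname + split, leaf);
	return (n < 0 || (size_t)n >= len ? -1 : 0);
}
```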
phk
41a29cfd2f Get rid of ttyregister(). All drivers now use ttymalloc() for struct
tty, so now we stand a chance of implementing refcounting and getting
rid of the damn things again.
2004-06-04 07:17:03 +00:00
phk
7e6e0efd64 Use ttymalloc() instead of ttyregister(). Use ttyioctl() instead of
direct calls to the linedisc.
2004-06-04 06:50:35 +00:00
tjr
85aaf94278 Write segments to core dump files in maximally-sized chunks that neither
exceed vn_rdwr_inchunks()'s INT_MAX length limitation nor span a block
boundary. This fixes dumping segments larger than 2GB.

PR:	67546
2004-06-04 06:30:16 +00:00
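One way to read the chunking rule above, sketched in user space. The helper names are hypothetical and the real change may compute its chunk size differently; here a full-sized chunk is the largest multiple of the block size that still fits in an int.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Size of the next chunk: at most INT_MAX bytes, rounded down to a
 * multiple of blksize so that a full-sized chunk ends on a block
 * boundary. Assumes 0 < blksize <= INT_MAX. The last chunk may be
 * shorter.
 */
size_t
next_chunk(size_t remaining, size_t blksize)
{
	size_t max_chunk = (size_t)INT_MAX - (size_t)INT_MAX % blksize;

	return (remaining < max_chunk ? remaining : max_chunk);
}

/* Count how many chunks a segment of len bytes is split into. */
size_t
count_chunks(size_t len, size_t blksize)
{
	size_t n = 0;

	while (len > 0) {
		len -= next_chunk(len, blksize);
		n++;
	}
	return (n);
}
```

Keeping the lengths in size_t rather than int is what lets the caller describe a segment larger than 2GB in the first place.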
rwatson
87449e4f90 Mark sun_noname as const since it's immutable. Update definitions
of functions that potentially accept &sun_noname (sbappendaddr(),
et al) to accept a const sockaddr pointer.
2004-06-04 04:07:08 +00:00
alc
b5cd9ba03c Move the definitions of SWAPBLK_NONE and SWAPBLK_MASK from vm_page.h to
blist.h, enabling the removal of numerous #includes from subr_blist.c.
(subr_blist.c and swap_pager.c are the only users of these definitions.)
2004-06-04 04:03:26 +00:00
jhb
66f3d8ffca - Comment out NULL, NULL barrier for Unix domain sockets section as the
  double NULL entries signal Witness to stop processing the array of
  order entries meaning none of the spin locks are added resulting in
  panics on boot.
- Add a missing NULL, NULL terminator to the Slip locks list to keep them
  separate from the spin locks.
2004-06-03 20:07:44 +00:00
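The terminator convention described above can be sketched as follows: a single NULL, NULL pair ends one list, and two pairs in a row end the whole table. The structure and function names are hypothetical, modelled only on the description in this message.

```c
#include <stddef.h>

struct order_entry {		/* hypothetical table entry */
	const char *name;
	const char *lock_class;
};

/*
 * Count the lock entries a parser would see before it hits the
 * double terminator. Anything after a stray double terminator is
 * silently ignored, which is the failure mode described above.
 */
size_t
count_order_entries(const struct order_entry *e)
{
	size_t n = 0;

	for (;;) {
		if (e->name == NULL) {
			e++;			/* end of one list */
			if (e->name == NULL)	/* two in a row: end of table */
				break;
			continue;
		}
		n++;
		e++;
	}
	return (n);
}
```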
tjr
48c79c9521 Remove checks for curthread == NULL - it can't happen. 2004-06-03 10:22:47 +00:00
tjr
7a46b27935 Move TDF_DEADLKTREAT into td_pflags (and rename it accordingly) to avoid
having to acquire sched_lock when manipulating it in lockmgr(), uiomove(),
and uiomove_fromphys().

Reviewed by:	jhb
2004-06-03 01:47:37 +00:00
rwatson
de0c6ecd47 Expand the hard-coded WITNESS lock order to include the following
relationships:

Sockets:    filedesc->accept->sellck
Routing:    radix node head->rtentry->ifaddr
UDP:        udp->udpinp
TCP:        tcp->tcpinp
SLIP:       slip_mtx->slip sc_mtx

Drop in a place holder section for UNIX domain sockets.  Various
sections to be expanded over the next few days.
2004-06-02 23:28:06 +00:00
mux
0ccfefe220 As discussed on arch@, flatten the device sysctl tree to make it
more convenient to deal with.  The notion of hierarchy is however
preserved by adding a new %parent node.
2004-06-02 22:43:35 +00:00
tjr
9bd12a2fd9 Remove a redundant "td = curthread" statement from profclock(). 2004-06-02 12:05:06 +00:00
tjr
80d36400ed Move TDF_SA from td_flags to td_pflags (and rename it accordingly)
so that it is no longer necessary to hold sched_lock while
manipulating it.

Reviewed by:	davidxu
2004-06-02 07:52:36 +00:00
jeff
33a226cf5e - Run sched_balance() and sched_balance_groups() from hardclock via
  sched_clock() rather than using callouts.  This means we no longer have to
  take the load of the callout thread into consideration while balancing and
  should make the balancing decisions simpler and more accurate.

Tested on:	x86/UP, amd64/SMP
2004-06-02 05:46:48 +00:00
rwatson
576b26bafd Integrate accept locking from rwatson_netperf, introducing a new
global mutex, accept_mtx, which serializes access to the following
fields across all sockets:

          so_qlen          so_incqlen         so_qstate
          so_comp          so_incomp          so_list
          so_head

While providing only coarse granularity, this approach avoids lock
order issues between sockets by avoiding ownership of the fields
by a specific socket and its per-socket mutexes.

While here, rewrite soclose(), sofree(), soaccept(), and
sonewconn() to add assertions, close additional races and  address
lock order concerns.  In particular:

- Reorganize the optimistic concurrency behavior in accept1() to
  always allocate a file descriptor with falloc() so that if we do
  find a socket, we don't have to encounter the "Oh, there wasn't
  a socket" race that can occur if falloc() sleeps in the current
  code, which broke inbound accept() ordering, not to mention
  requiring backing out socket state changes in a way that raced
  with the protocol level.  We may want to add a lockless read of
  the queue state if polling of empty queues proves to be important
  to optimize.

- In accept1(), soref() the socket while holding the accept lock
  so that the socket cannot be free'd in a race with the protocol
  layer.  Likewise in netgraph equivalents of the accept1() code.

- In sonewconn(), loop waiting for the queue to be small enough to
  insert our new socket once we've committed to inserting it, or
  races can occur that cause the incomplete socket queue to
  overfill.  In the previous implementation, it was sufficient
  to simply test once since calling soabort() didn't release
  synchronization permitting another thread to insert a socket as
  we discard a previous one.

- In soclose()/sofree()/et al, it is the responsibility of the
  caller to remove a socket from the incomplete connection queue
  before calling soabort(), which prevents soabort() from having
  to walk into the accept socket to release the socket from its
  queue, and avoids races when releasing the accept mutex to enter
  soabort(), permitting soabort() to avoid lock ordering issues
  with the caller.

- Generally cluster accept queue related operations together
  throughout these functions in order to facilitate locking.

Annotate new locking in socketvar.h.
2004-06-02 04:15:39 +00:00
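The coarse-grained scheme can be sketched in user space with a single pthread mutex. This is an analogy under stated assumptions, not the kernel code: struct sock_sketch and both functions are hypothetical, the queue is a simple singly linked list that pops the most recent entry first, and so_refs stands in for the soref() reference taken while the lock is held.

```c
#include <pthread.h>
#include <stddef.h>

/* One global mutex guards the accept-queue fields of every socket. */
pthread_mutex_t accept_mtx = PTHREAD_MUTEX_INITIALIZER;

struct sock_sketch {			/* hypothetical stand-in */
	struct sock_sketch *so_head;	/* listen socket we are queued on */
	struct sock_sketch *so_next;	/* next socket on the queue */
	struct sock_sketch *so_comp;	/* completed queue (listen socket) */
	int so_qlen;			/* length of so_comp */
	int so_refs;			/* reference count */
};

/* Queue a completed connection on its listen socket. */
void
enqueue_completed(struct sock_sketch *head, struct sock_sketch *so)
{
	pthread_mutex_lock(&accept_mtx);
	so->so_head = head;
	so->so_next = head->so_comp;
	head->so_comp = so;
	head->so_qlen++;
	pthread_mutex_unlock(&accept_mtx);
}

/*
 * Take one connection off the queue. The reference is taken before
 * the mutex is dropped so the socket cannot be freed in between.
 */
struct sock_sketch *
dequeue_completed(struct sock_sketch *head)
{
	struct sock_sketch *so;

	pthread_mutex_lock(&accept_mtx);
	so = head->so_comp;
	if (so != NULL) {
		head->so_comp = so->so_next;
		head->so_qlen--;
		so->so_head = NULL;
		so->so_next = NULL;
		so->so_refs++;
	}
	pthread_mutex_unlock(&accept_mtx);
	return (so);
}
```

Because the fields of both the listen socket and the queued socket are covered by the same mutex, neither function needs to take two per-socket locks in any particular order.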
rwatson
41a003003f Rather than assert f_type==DTYPE_VNODE, conditionally perform the
file lock release based on f_type==DTYPE_VNODE.  vn_closefile() is
used by non-vnode types as well (fifo).
2004-06-01 23:36:47 +00:00
rwatson
5adf35c004 Add GIANT_REQUIRED to kqueue_close(), since kqueue currently requires
Giant.
2004-06-01 18:05:41 +00:00
rwatson
1e76056c09 Push the VOP_ADVLOCK() call to release advisory locks on vnode file
descriptors out of fdrop_locked() and into vn_closefile().  This
removes all knowledge of vnodes from fdrop_locked(), since the lock
behavior was specific to vnodes.  This also removes the specific
requirement for Giant in fdrop_locked(); it's now only required by
code that it calls into.

Add GIANT_REQUIRED to vn_closefile() since VFS requires Giant.
2004-06-01 18:03:20 +00:00
bmilekic
9e06a1e05a Fix a couple of bugs in the mbuf and packet ctors. In the latter case,
nextpkt within the m_hdr was not being initialized to NULL for
!M_PKTHDR cases.  *Maybe* this will fix weird socket buffer
inconsistency panics, but we'll see.
2004-06-01 16:17:10 +00:00
phk
3521579704 Introduce a ttyioctl() cdevsw default function. 2004-06-01 13:39:02 +00:00
phk
e0c89dae13 There is no need to explicitly call the stop function. In all likelihood
->l_close() did it and ttyclose certainly will.
2004-06-01 11:57:15 +00:00
rwatson
5a32935851 Add a global mutex, accept_filter_mtx, to protect the global list of
accept filters and prevent read-modify-write races.
2004-06-01 04:08:48 +00:00
rwatson
bddadcf71a The SS_COMP and SS_INCOMP flags in the so_state field indicate whether
the socket is on an accept queue of a listen socket.  This change
renames the flags to SQ_COMP and SQ_INCOMP, and moves them to a new
state field on the socket, so_qstate, as the locking for these flags
is substantially different from the locking on the remainder of the
flags in so_state.
2004-06-01 02:42:56 +00:00
truckman
d503c79cad Add MSG_NBIO flag option to soreceive() and sosend() that causes
them to behave the same as if the SS_NBIO socket flag had been set
for this call.  The SS_NBIO flag for ordinary sockets is set by
fcntl(fd, F_SETFL, O_NONBLOCK).

Pass the MSG_NBIO flag to the soreceive() and sosend() calls in
fifo_read() and fifo_write() instead of frobbing the SS_NBIO flag
on the underlying socket for each I/O operation.  The O_NONBLOCK
flag is a property of the descriptor, and unlike ordinary sockets,
fifos may be referenced by multiple descriptors.
2004-06-01 01:18:51 +00:00
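MSG_NBIO is internal to the kernel, but the same idea, requesting non-blocking behaviour for one call instead of changing state shared by every user of the object, is visible from user space as MSG_DONTWAIT. The sketch below uses that flag as an analogy only; read_nowait() is a hypothetical wrapper.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>

/*
 * Read without blocking, for this call only. The O_NONBLOCK setting
 * of the descriptor is neither consulted nor changed, so other
 * descriptors that refer to the same object are unaffected.
 * Returns what recv() returns: bytes read, or -1 with errno set
 * (EAGAIN or EWOULDBLOCK when no data is ready).
 */
ssize_t
read_nowait(int fd, void *buf, size_t len)
{
	return (recv(fd, buf, len, MSG_DONTWAIT));
}
```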
bmilekic
f7574a2276 Bring in mbuma to replace mballoc.
mbuma is an Mbuf & Cluster allocator built on top of a number of
extensions to the UMA framework, all included herein.

Extensions to UMA worth noting:
  - Better layering between slab <-> zone caches; introduce
    Keg structure which splits off slab cache away from the
    zone structure and allows multiple zones to be stacked
    on top of a single Keg (single type of slab cache);
    perhaps we should look into defining a subset API on
    top of the Keg for special use by malloc(9),
    for example.
  - UMA_ZONE_REFCNT zones can now be added, and reference
    counters automagically allocated for them within the end
    of the associated slab structures.  uma_find_refcnt()
    does a kextract to fetch the slab struct reference from
    the underlying page, and lookup the corresponding refcnt.

mbuma things worth noting:
  - integrates mbuf & cluster allocations with extended UMA
    and provides caches for commonly-allocated items; defines
    several zones (two primary, one secondary) and two kegs.
  - change up certain code paths that always used to do:
    m_get() + m_clget() to instead just use m_getcl() and
    try to take advantage of the newly defined secondary
    Packet zone.
  - netstat(1) and systat(1) quickly hacked up to do basic
    stat reporting but additional stats work needs to be
    done once some other details within UMA have been taken
    care of and it becomes clearer how stats will work
    within the modified framework.

From the user perspective, one implication is that the
NMBCLUSTERS compile-time option is no longer used.  The
maximum number of clusters is still capped off according
to maxusers, but it can be made unlimited by setting
the kern.ipc.nmbclusters boot-time tunable to zero.
Work should be done to write an appropriate sysctl
handler allowing dynamic tuning of kern.ipc.nmbclusters
at runtime.

Additional things worth noting/known issues (READ):
   - One report of 'ips' (ServeRAID) driver acting really
     slow in conjunction with mbuma.  Need more data.
     Latest report is that ips is equally sucking with
     and without mbuma.
   - Giant leak in NFS code sometimes occurs, can't
     reproduce but currently analyzing; brueffer is
     able to reproduce but THIS IS NOT an mbuma-specific
     problem and currently occurs even WITHOUT mbuma.
   - Issues in network locking: there is at least one
     code path in the rip code where one or more locks
     are acquired and we end up in m_prepend() with
     M_WAITOK, which causes WITNESS to whine from within
     UMA.  Current temporary solution: force all UMA
     allocations to be M_NOWAIT from within UMA for now
     to avoid deadlocks unless WITNESS is defined and we
     can determine with certainty that we're not holding
     any locks when we're M_WAITOK.
   - I've seen at least one weird socketbuffer empty-but-
     mbuf-still-attached panic.  I don't believe this
     to be related to mbuma but please keep your eyes
     open, turn on debugging, and capture crash dumps.

This change removes more code than it adds.

A paper is available detailing the change and considering
various performance issues; it was presented at BSDCan2004:
http://www.unixdaemons.com/~bmilekic/netbuf_bmilekic.pdf
Please read the paper for Future Work and implementation
details, as well as credits.

Testing and Debugging:
    rwatson,
    brueffer,
    Ketrien I. Saihr-Kesenchedra,
    ...
Reviewed by: Lots of people (for different parts)
2004-05-31 21:46:06 +00:00
rwatson
13656d723e Assert Giant in vn_start_write() and vn_finished_write(). 2004-05-31 20:56:10 +00:00
rwatson
afc098b3e1 Assert Giant in vrele(). 2004-05-31 19:06:01 +00:00
phk
30a7ac8468 Add missing #include <sys/module.h> 2004-05-30 20:34:58 +00:00
phk
d6f7d2bde6 Add some missing <sys/module.h> includes which are masked by the
one on death-row in <sys/kernel.h>
2004-05-30 17:57:46 +00:00
tjr
2bc3263ac9 Enable MI bits for gcc -ftest-coverage -fprofile-arcs on amd64. 2004-05-29 01:18:14 +00:00
pjd
19d2b54248 Sysctl hw.bus.devctl_disable shouldn't be writable from inside a jail.
Approved by:	imp
2004-05-26 16:36:32 +00:00