Commit Graph

7218 Commits

Author SHA1 Message Date
bmilekic
9e06a1e05a Fix a couple of bugs in the mbuf and packet ctors. In the latter case,
nextpkt within the m_hdr was not being initialized to NULL for
!M_PKTHDR cases.  *Maybe* this will fix weird socket buffer
inconsistency panics, but we'll see.
2004-06-01 16:17:10 +00:00
phk
3521579704 Introduce a ttyioctl() cdevsw default function. 2004-06-01 13:39:02 +00:00
phk
e0c89dae13 There is no need to explicitly call the stop function. In all likelyhood
->l_close() did it and ttyclose certainly will.
2004-06-01 11:57:15 +00:00
rwatson
5a32935851 Add a global mutex, accept_filter_mtx, to protect the global list of
accept filters and prevent read-modify-write races.
2004-06-01 04:08:48 +00:00
rwatson
bddadcf71a The SS_COMP and SS_INCOMP flags in the so_state field indicate whether
the socket is on an accept queue of a listen socket.  This change
renames the flags to SQ_COMP and SQ_INCOMP, and moves them to a new
state field on the socket, so_qstate, as the locking for these flags
is substantially different for the locking on the remainder of the
flags in so_state.
2004-06-01 02:42:56 +00:00
truckman
d503c79cad Add MSG_NBIO flag option to soreceive() and sosend() that causes
them to behave the same as if the SS_NBIO socket flag had been set
for this call.  The SS_NBIO flag for ordinary sockets is set by
fcntl(fd, F_SETFL, O_NONBLOCK).

Pass the MSG_NBIO flag to the soreceive() and sosend() calls in
fifo_read() and fifo_write() instead of frobbing the SS_NBIO flag
on the underlying socket for each I/O operation.  The O_NONBLOCK
flag is a property of the descriptor, and unlike ordinary sockets,
fifos may be referenced by multiple descriptors.
2004-06-01 01:18:51 +00:00
bmilekic
f7574a2276 Bring in mbuma to replace mballoc.
mbuma is an Mbuf & Cluster allocator built on top of a number of
extensions to the UMA framework, all included herein.

Extensions to UMA worth noting:
  - Better layering between slab <-> zone caches; introduce
    Keg structure which splits off slab cache away from the
    zone structure and allows multiple zones to be stacked
    on top of a single Keg (single type of slab cache);
    perhaps we should look into defining a subset API on
    top of the Keg for special use by malloc(9),
    for example.
  - UMA_ZONE_REFCNT zones can now be added, and reference
    counters automagically allocated for them within the end
    of the associated slab structures.  uma_find_refcnt()
    does a kextract to fetch the slab struct reference from
    the underlying page, and lookup the corresponding refcnt.

mbuma things worth noting:
  - integrates mbuf & cluster allocations with extended UMA
    and provides caches for commonly-allocated items; defines
    several zones (two primary, one secondary) and two kegs.
  - change up certain code paths that always used to do:
    m_get() + m_clget() to instead just use m_getcl() and
    try to take advantage of the newly defined secondary
    Packet zone.
  - netstat(1) and systat(1) quickly hacked up to do basic
    stat reporting but additional stats work needs to be
    done once some other details within UMA have been taken
    care of and it becomes clearer to how stats will work
    within the modified framework.

From the user perspective, one implication is that the
NMBCLUSTERS compile-time option is no longer used.  The
maximum number of clusters is still capped off according
to maxusers, but it can be made unlimited by setting
the kern.ipc.nmbclusters boot-time tunable to zero.
Work should be done to write an appropriate sysctl
handler allowing dynamic tuning of kern.ipc.nmbclusters
at runtime.

Additional things worth noting/known issues (READ):
   - One report of 'ips' (ServeRAID) driver acting really
     slow in conjunction with mbuma.  Need more data.
     Latest report is that ips is equally sucking with
     and without mbuma.
   - Giant leak in NFS code sometimes occurs, can't
     reproduce but currently analyzing; brueffer is
     able to reproduce but THIS IS NOT an mbuma-specific
     problem and currently occurs even WITHOUT mbuma.
   - Issues in network locking: there is at least one
     code path in the rip code where one or more locks
     are acquired and we end up in m_prepend() with
     M_WAITOK, which causes WITNESS to whine from within
     UMA.  Current temporary solution: force all UMA
     allocations to be M_NOWAIT from within UMA for now
     to avoid deadlocks unless WITNESS is defined and we
     can determine with certainty that we're not holding
     any locks when we're M_WAITOK.
   - I've seen at least one weird socketbuffer empty-but-
     mbuf-still-attached panic.  I don't believe this
     to be related to mbuma but please keep your eyes
     open, turn on debugging, and capture crash dumps.

This change removes more code than it adds.

A paper is available detailing the change and considering
various performance issues, it was presented at BSDCan2004:
http://www.unixdaemons.com/~bmilekic/netbuf_bmilekic.pdf
Please read the paper for Future Work and implementation
details, as well as credits.

Testing and Debugging:
    rwatson,
    brueffer,
    Ketrien I. Saihr-Kesenchedra,
    ...
Reviewed by: Lots of people (for different parts)
2004-05-31 21:46:06 +00:00
rwatson
13656d723e Assert Giant in vn_start_write() and vn_finished_write(). 2004-05-31 20:56:10 +00:00
rwatson
afc098b3e1 Assert Giant in vrele(). 2004-05-31 19:06:01 +00:00
phk
30a7ac8468 Add missing #include <sys/module.h> 2004-05-30 20:34:58 +00:00
phk
d6f7d2bde6 Add some missing <sys/module.h> includes which are masked by the
one on death-row in <sys/kernel.h>
2004-05-30 17:57:46 +00:00
tjr
2bc3263ac9 Enable MI bits for gcc -ftest-coverage -fprofile-arcs on amd64. 2004-05-29 01:18:14 +00:00
pjd
19d2b54248 Sysctl hw.bus.devctl_disable shouldn't be writtable from inside a jail.
Approved by:	imp
2004-05-26 16:36:32 +00:00
tmm
7b769ce88f Retire cpu_sched_exit(); it is not used any more. 2004-05-26 12:09:39 +00:00
des
7eb92d1257 As previously threatened, give each device its own sysctl context and
subtree (under the new dev top-level node).  This should greatly simplify
drivers which need per-device sysctl variables (such as ndis).
2004-05-25 12:06:26 +00:00
gad
d284b07886 Implement the new KERN_PROC_RGID option, and also implement the
KERN_PROC_SESSION option which had been previously defined but
never implemented.

PR:		bin/65803  (a very tiny piece of the PR)`
Submitted by:	Cyrille Lefevre
2004-05-22 23:11:44 +00:00
davidxu
e7578c3795 Clear KSE thread flags after KSE thread mode is ended. The side effect
of not clearing the flags for execv() syscall will result that a new
program runs in KSE thread mode without enabling it.

Submitted by: tjr
Modified by: davidxu
2004-05-21 14:50:23 +00:00
bde
9ec48f4cab Fixed some style bugs in tdsigwakeup(). 2004-05-21 10:02:24 +00:00
jhb
91621895aa In tdsigwakeup(), use TD_ON_SLEEPQ() rather than TD_IS_SLEEPING() to see if
a thread is on a sleep queue and should have it's sleep aborted.

Reported by:	Thierry Herbelot thierry at herbelot dot com
2004-05-20 20:17:28 +00:00
bde
b3597241df Fixed printf format errors which helped break GUPROF for arches with
64-bit function pointers.
2004-05-20 16:48:17 +00:00
bde
38ad669603 Initialize the history counter type field in struct gmonparam as
threatened in rev.1.10 of usr.sbin/kgmon/kgmon.c more than 2 years ago.
kgmon has been recovering from the missing initialization for too
long, but the fixup there is ifdefed for i386's and shouldn't be
needed for other arches.
2004-05-20 16:42:39 +00:00
bde
36151be4c9 Moved i386 asms to an i386 header. The asms are for calibration of
high resolution kernel profiling (options GUPROF.  "U" in GUPROF stands
for microseconds resolution, but the resolution is now smaller than 1
nanosecond on multi-GHz machines and the accuracy is heading towards
1 nanosecond too).  Arches that support GUPROF must now provide certain
macros for the calibration.  GUPROF is now only supported for i386's,
so the absence of the new macros for other arches doesn't break anything
that wasn't already broken.  amd64's have uncommitted support for
GUPROF, and sparc64's have support that seems to be complete except
here (there was an #error for non-i386 cases; now there are undefined
macros).

Changed the asms a little:
- declare them as __volatile.  They must not be moved, and exporting a
  label across asms is technically incorrect, so try harder to stop gcc
  moving them.
- don't put the non-clobbered register "bx" in the clobber list.  The
  clobber lists are still more conservative than necessary.
- drop the non-support for gcc-1.  It just gave a better error message,
  and this is not useful since compiling with gcc-1 would cause thousands
  of worse error messages.
- drop the support for aout.
2004-05-20 16:12:19 +00:00
pjd
7cbfe4913b Fix sysctl name: security.jail.getfsstate_getfsstatroot_only ->
security.jail.getfsstatroot_only.

Approved by:	rwatson
2004-05-20 05:28:44 +00:00
bde
eef4a49645 Include <sys/gmon.h> instead of <machine/profile.h> for the declaration
of kmupetext().  The declaration is misplaced in <machine/profile.h>
since it is not MD and not related to the lowest level of profiling.
It will be moved, but getting it via <sys/gmon.h> already works.
2004-05-19 14:36:38 +00:00
ps
b36520446e syncache broke rev 1.23 which was done to fix the "thundering herd"
problem in Apache.  Fix it.

Reviewed by:	peter
2004-05-19 00:22:10 +00:00
peter
235a3d8f64 If a symbol has section+offset definitions provided, always use instead
of doing a name lookup for global symbols.  This fixes the snd_pcm module.
2004-05-18 05:15:43 +00:00
peter
a6091b3b08 Remove leftover padding variables.
Convert some silent 'ignore programmer error' cases into panics
Remove 'align' field from section table (no longer needed)
2004-05-18 05:14:19 +00:00
peter
867065a3a4 Since we go to the trouble of compiling the kobj ops table for each class,
and cannot handle it going away, add an explicit reference to the kobj
class inside each linker class.  Without this, a class with no modules
loaded will sit with an idle refcount of 0.  Loading and unloading
a module with it causes a 0->1->0 transition which frees the ops table
and causes subsequent loads using that class to explode.  Normally, the
"kernel" module will remain forever loaded and prevent this happening, but
if you have more than one linker class active, only one owns the "kernel".

This finishes making modules work for kldload(8) on amd64.
2004-05-17 21:24:39 +00:00
peter
b7f2f12793 Clean up the code some more. Unify the text/data (progbits) and bss
(nobits) tables to simplify some code.  Try and shorten some of the very
wide lines.  Somewhere along the way, I think I fixed the memory
corruption that caused panics after going multiuser.
2004-05-17 21:20:23 +00:00
peter
961476df3c Oops, use the generic ELF_ST_BIND() macro instead of ELF64_ST_BIND.
Submitted by:  marks
2004-05-17 00:51:34 +00:00
peter
ea4215c521 Make a small revision to the api between the elf linker core and the
elf_reloc() backends for two reasons.  First, to support the possibility
of there being two elf linkers in the kernel (eg: amd64), and second, to
pass the relocbase explicitly (for relocating .o format kld files).
2004-05-16 20:00:28 +00:00
bde
802b835b3d Fixed some common printf format errors. Don't assume that "struct foo *"
is "void *" (it isn't) or that the default promotion of pid_t is int.
Instead, assume that casting "struct foo *" to "void *" and printing the
result with %p is useful, and that all pid_t's are representable as longs.

Fixed some minor style bugs (mainly spelling errors in comments).
2004-05-14 20:51:42 +00:00
jhb
4e9e9bbec8 Split sleepq_wakeup_thread() into two functions. sleepq_remove_thread()
removes a specific thread from a sleep queue.  sleepq_resume_thread()
resumes scheduling of a thread that has been previously removed from a
sleep queue.
- sleepq_catch_signals() just removes a thread from the queue it was just
  added to when a pending signal is found.
- sleepq_signal() and sleepq_broadcast() remove threads from a queue,
  drop the queue lock, and then resume all the previously removed threads.
  This doesn't completely fix the sched_lock <-> sleepq chain LOR, but it
  makes it a little better as we no longer call setrunnble() with a sleep
  queue lock held meaning if setrunnable() tries to wakeup the swapper we
  don't try to lock two sleep queue chains at the same time.
2004-05-13 20:00:43 +00:00
tjr
e167ef630d Eliminate a memory leak in kern_symlink() that could occur if
vn_start_write() failed.
2004-05-11 10:42:02 +00:00
julian
790c941069 Remove misplaced duplicate comment and slightly reformat the
version that was in the right place.
2004-05-09 22:29:14 +00:00
sam
c89a947571 set m_len to reflect mbuf contents on return from m_dup1; fixes an obscure
m_pullup case that contributed to breaking ipcomp in tunnel mode for kame

Submitted by:	itojun
Obtained from:	kame
2004-05-09 05:57:58 +00:00
julian
98d03d5d06 Fix rtprio() to do sensible things when called from threaded processes.
It's not quite correct from a posix Point Of view, but it is a lot better
than what was there before. This will be revisited later
when we decide what form our priority extensions will take. Posix doesn't
specify  how a system scope thread can change its priority so you need to
add non-standard extensions to be able to do it..
For now make this slightly non standard to allow it to be done.

Submitted by:	Dan Eischen originally, changed by myself.
2004-05-08 08:56:05 +00:00
alc
b0b9413238 Avoid pointless zeroing of the bogus page in vfs_bio_clrbuf().
Suggested by:	tegge@	(from October of last year)
2004-05-08 06:46:40 +00:00
rwatson
7128a3b5cc Unconditionally lock Giant in do_sendfile(), rather than locking it
conditional on debug.mpsafenet.  We can try pushing down Giant here
later, but we don't want to enter VFS without holding Giant.

Bumped into by:	kris
2004-05-08 02:24:21 +00:00
cognet
6897229d63 Compare t_brkc against (char)_POSIX_VDISABLE, not against -1.
Discussed with:	bde
2004-05-07 15:35:38 +00:00
njl
e9ec8dbd49 Move the CPU newbus attachment to i386 legacy. The acpi_cpu device will
become just "cpu" and provide attachments in the !legacy case.

Tested by:	des
2004-05-06 15:54:02 +00:00
alc
b57e5e03fd Make vm_page's PG_ZERO flag immutable between the time of the page's
allocation and deallocation.  This flag's principal use is shortly after
allocation.  For such cases, clearing the flag is pointless.  The only
unusual use of PG_ZERO is in vfs_bio_clrbuf().  However, allocbuf() never
requests a prezeroed page.  So, vfs_bio_clrbuf() never sees a prezeroed
page.

Reviewed by:	tegge@
2004-05-06 05:03:23 +00:00
rwatson
9ec8ab1c20 Add /* !MAC */ to final #endif. 2004-05-03 22:54:46 +00:00
rwatson
16bb0e59b9 Bump copyright date for NETA to 2004. 2004-05-03 20:53:27 +00:00
rwatson
a857ce2f0a Add MAC_STATIC, a kernel option that disables internal MAC Framework
synchronization protecting against dynamic load and unload of MAC
policies, and instead simply blocks load and unload.  In a static
configuration, this allows you to avoid the synchronization costs
associated with introducing dynamicism.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, McAfee Research
2004-05-03 20:53:05 +00:00
cperciva
fe4e3a8b16 Fix a race condition which could result in profprocs being decremented
more than once if stopprofclock is called multiple times on the same
process.
2004-05-03 00:48:11 +00:00
peter
929c81a05b Checkpoint commit for an alternative WIP kernel module loader that isn't
as dependent on binutils features/quirks as the current one.  This one
loads plain .o files without having to mess with shared object mode.

This happens to be essential on amd64, because binutils hasn't implemented
all the quirks/features that we need for producing the hack non-PIC shared
objects.  As it turned out, .o format isn't all that inconvenient after
all.  It looks like the ability to use the same .o files for linking
directly into a static kernel or loading as a module might be worth it.

It is still very much a work-in-progress, but it is almost usable.  Other
changes are still needed in order to use it though, these have not been
committed yet.  There is still a memory corruption/overrun bug somewhere.
For example, test modules load and work, but the machine explodes a few
minutes later in vm_forkproc() or the like.  Notable missing things
include kldxref support, and loader(8) support.  I wanted to figure out
a working baseline set of code first.
2004-04-30 16:32:40 +00:00
deischen
122d328ccb Keep track of threads waiting in kse_release() to avoid a race
condition where kse_wakeup() doesn't yet see them in (interruptible)
sleep queues.  Also add an upcall check to sleepqueue_catch_signals()
suggested by jhb.

This commit should fix recent mysql hangs.

Reviewed by:	jhb, davidxu
Mysql'd by:	Robin P. Blanchard <robin.blanchard at gactr uga edu>
2004-04-28 20:36:53 +00:00
das
9df402daa4 If the buffer supplied to kenv(KENV_DUMP, ...) isn't big enough,
return the number of bytes needed instead of 0.  The manpage claims
that we do this anyway.
2004-04-28 01:27:33 +00:00
bmilekic
6bbcc9da29 Give jail(8) the feature to allow raw sockets from within a
jail, which is less restrictive but allows for more flexible
jail usage (for those who are willing to make the sacrifice).
The default is off, but allowing raw sockets within jails can
now be accomplished by tuning security.jail.allow_raw_sockets
to 1.

Turning this on will allow you to use things like ping(8)
or traceroute(8) from within a jail.

The patch being committed is not identical to the patch
in the PR.  The committed version is more friendly to
APIs which pjd is working on, so it should integrate
into his work quite nicely.  This change has also been
presented and addressed on the freebsd-hackers mailing
list.

Submitted by: Christian S.J. Peron <maneo@bsdpro.com>
PR: kern/65800
2004-04-26 19:46:52 +00:00