Commit Graph

4188 Commits

Author SHA1 Message Date
Andrey A. Chernov
63347f1e8f lseek: simplify overflow checks 2001-08-29 18:35:53 +00:00
Robert Watson
3c4543e046 o Reduce gratuitous whitespace difference from Darwin. 2001-08-29 17:18:04 +00:00
Peter Wemm
df55753880 Fix the ogetkerninfo() syscall handling of sizes for
KINFO_BSDI_SYSINFO.  This supposedly fixes Netscape 3.0.4 (bsdi binary)
on -current.  (and is also applicable to RELENG_4)

PR:		25476
Submitted by:	Philipp Mergenthaler <un1i@rz.uni-karlsruhe.de>
2001-08-29 11:47:53 +00:00
Brian Somers
546a92c4d4 OR M_WAITOK with M_ZERO in malloc()s args for clarity. 2001-08-28 23:58:32 +00:00
Robert Watson
7fd6a9596d o Improve the style of a number of routines and comments in kern_prot.c,
with regards to redundancy, formatting, and style(9).

Submitted by:	bde
2001-08-28 16:35:33 +00:00
Robert Watson
4bcbade869 Fix typos in recent comments.
Submitted by:	dd
2001-08-28 05:16:19 +00:00
Robert Watson
3b243b7292 Generally improve documentation of kern_prot.c:
o Add comments for:
  - kern.security.suser_permitted
  - p_cansee()
  - p_cansignal()
  - p_cansched()
  - kern.security.unprivileged_procdebug_permitted
  - p_candebug()

Update copyright.

Obtained from:	TrustedBSD
2001-08-27 16:01:52 +00:00
Peter Wemm
0f7289022b If a file has been completely unlinked, stop automatically syncing the
file.  ffs will discard any pending dirty pages when it is closed,
so we may as well not waste time trying to clean them.  This doesn't
stop other things from writing it out, eg: pageout, fsync(2) etc.
2001-08-27 06:09:56 +00:00
Andrey A. Chernov
c4778eed9f Cosmetique & style fixes from bde 2001-08-26 10:23:49 +00:00
Peter Wemm
268bdb43f9 Optionize UPAGES for the i386. As part of this I split some of the low
level implementation stuff out of machine/globaldata.h to avoid exposing
UPAGES to lots more places.  The end result is that we can double
the kernel stack size with 'options UPAGES=4' etc.

This is mainly being done for the benefit of a MFC to RELENG_4 at some
point.  -current doesn't really need this so much since each interrupt
runs on its own kstack.
2001-08-25 02:20:02 +00:00
Bosko Milekic
76dcbd6f9f Force a commit on kern_mutex.c to explain reason for last commit but while
I'm at it also add a comment in mtx_validate() explaining the purpose
of the last change.

Basically, this fixes booting kernels compiled with MUTEX_DEBUG. What used
to happen is before we setidt from init386() [still using BTX idt], we
called mtx_init() on several mutex locks, notably Giant and some others.
This is a problem for MUTEX_DEBUG because it enables mtx_validate() which
calls kernacc(), some of which in turn requires Giant.
Fix by calling kernacc() from mtx_validate() only if (!cold).
2001-08-24 23:00:59 +00:00
Bosko Milekic
ab07087e16 *** empty log message *** 2001-08-24 22:53:45 +00:00
John Baldwin
6385dec00e Style nits:
- Don't use punctuation or newlines in panic messages.
- Remove excess blank lines.

Requested and partially submitted by:	bde
2001-08-24 17:46:58 +00:00
Peter Pentchev
ccdbd10cb7 Prevent passing a null pointer as a filename to vn_open(),
if for some reason expand_name() failed to build a core file name.

PR:		29931
Submitted by:	Foldi Tamas <crow@kapu.hu>
Reviewed by:	dd, -arch
MFC after:	1 month
2001-08-24 15:49:30 +00:00
Andrey A. Chernov
dc6e1079e6 Remove extra check unneded now 2001-08-24 10:20:26 +00:00
Robert Watson
670f6b2fc6 o Clarify comments in vaccess_acl_posix1e() ACL evaluation routine so
as to improve readability and accuracy.

Obtained from:	TrustedBSD Project
2001-08-24 01:41:42 +00:00
John Baldwin
b0b7cb508c Use witness_upgrade/downgrade for sx_try_upgrade/downgrade. 2001-08-23 22:51:22 +00:00
John Baldwin
c19fe5e261 Add witness_upgrade() and witness_downgrade() for handling upgrades and
downgrades of shared/exclusive locks.
2001-08-23 22:47:05 +00:00
John Baldwin
d7c4536a55 Convert some KASSERT()'s into if (foo) panic() because they are testing
how locks are managed by the rest of the kernel, not verifying the internal
integrity of witness itself.
2001-08-23 22:44:47 +00:00
John Baldwin
1432aa0c5e Add a new kernel option RESTARTABLE_PANICS. If this option is present,
then one can restart from a panic by resetting the panicstr variable to
NULL.  This commit conditionalizes the previously committed functionality
on this variable.  It also removes the __dead2 attribute from the panic()
function so that when one continues from a panic() the behavior will
be predictable.
2001-08-23 20:32:21 +00:00
John Baldwin
e2870579fa Clear the sx_xholder pointer when downgrading an exclusive lock. 2001-08-23 17:57:37 +00:00
Andrey A. Chernov
5d97bedb22 vn_stat(): if va_size (u_quad_t) > OFF_MAX, return EOVERFLOW, don't copy it
blindly to st_size
2001-08-23 17:56:48 +00:00
Andrey A. Chernov
6fb9fbceab Add yet one check for SEEK_END overflow 2001-08-23 17:09:23 +00:00
Andrey A. Chernov
db106eff39 lseek: fix check for vattr.va_size overflow. Check suggested by bde simple not
works with unsigned types.
2001-08-23 17:01:25 +00:00
Andrey A. Chernov
62be011ebd Oops, fix my broken handling of new l_len<0 case 2001-08-23 16:00:27 +00:00
Andrey A. Chernov
f510e1c2ec Originally BSD return EINVAL for l_len < 0, but now POSIX wants it too,
so implement POSIX l_len < 0 handling.
2001-08-23 15:40:30 +00:00
Andrey A. Chernov
6d24c65d96 Cosmetique: correct English in comments
Pointed by:	bde
2001-08-23 14:41:39 +00:00
Andrey A. Chernov
b82f5b624c Cosmetique: more <sys/*> into one group, separate include families by
blank line
2001-08-23 13:51:17 +00:00
Andrey A. Chernov
b44af710d3 Move <machine/*> after <sys/*>
Pointed by:	bde
2001-08-23 13:21:17 +00:00
Andrey A. Chernov
4b207d9868 Move <machine/*> after <sys/*>
Add missing fdrop() before EOVERFLOW

Pointed by:	bde
2001-08-23 13:19:32 +00:00
Andrey A. Chernov
69cc1d0d7f Detect off_t EOVERFLOW of start/end offsets calculations for adv. lock,
as POSIX require.
2001-08-23 07:42:40 +00:00
Thomas Moestl
040ef07af8 Regenerate from syscalls.master using the new makesyscalls.sh revision. 2001-08-22 23:27:20 +00:00
Thomas Moestl
a4189a088b Add padding before each element of the syscall argument structures in
sysproto.h in addition to the existing padding afterwards.
This is needed to support big-endian architectures like sparc64.

Reviewed by:	bde
Tested on alpha by:	jhb
2001-08-22 23:22:47 +00:00
Alexander Langer
b8c526df70 Fix a simple typo I just happened to find. 2001-08-22 19:12:24 +00:00
Matthew Dillon
0cf5e0ebd6 Remove the code that limited the buffer_map to 1/2 the size of the
kernel_map.  maxbcache takes care of this now and the 1/2 limit can
interfere with testing.

Suggested by: bde
2001-08-22 18:10:37 +00:00
Matthew Dillon
219d632c15 Move most of the kernel submap initialization code, including the
timeout callwheel and buffer cache, out of the platform specific areas
and into the machine independant area.  i386 and alpha adjusted here.
Other cpus can be fixed piecemeal.

Reviewed by:    freebsd-smp, jake
2001-08-22 04:07:27 +00:00
John Baldwin
61e9650010 Clear db_active in boot() so that one can call the boot function (as well
as use the panic command) w/o having to manually clear db_active first
to avoid the db_error() in mi_switch().
2001-08-21 23:29:40 +00:00
John Baldwin
b285782b29 Release the sched_lock before bombing out in mi_switch() via db_error().
This makes things slightly easier if you call a function that calls
mi_switch() as it keeps the locking before and after closer.
2001-08-21 23:10:37 +00:00
John Baldwin
1a5333c37c Allow one to restart from a panic in DDB by clearing the panicstr
variable to NULL.  Note that since panic() is marked with __dead2, this
has somewhat unpredictable results at best.
2001-08-21 22:55:20 +00:00
Andrey A. Chernov
383f169d4a Make lseek() POSIXed: for non character special files
1) handle off_t overflow with EOVERFLOW
2) handle negative offsets with EINVAL

Reviewed by:	arch discussion
2001-08-21 21:20:42 +00:00
John Baldwin
161778121a Add a hook to mi_switch() to abort via db_error() if we attempt to
perform a context switch from DDB.

Consulting from:	bde
2001-08-21 20:09:05 +00:00
John Baldwin
91a4536f22 - Fix a bug in the previous workaround for the tsleep/endtsleep race.
callout_stop() would fail in two cases:
    1) The timeout was currently executing, and
    2) The timeout had already executed.
  We only needed to work around the race for 1).  We caught some instances
  of 2) via the PS_TIMEOUT flag, however, if endtsleep() fired after the
  process had been woken up but before it had resumed execution,
  PS_TIMEOUT would not be set, but callout_stop() would fail, so we
  would block the process until endtsleep() resumed it.  Except that
  endtsleep() had already run and couldn't resume it.  This adds a new flag
  PS_TIMOFAIL to indicate the case of 2) when PS_TIMEOUT isn't set.
- Implement this race fix for condition variables as well.

Tested by:	sos
2001-08-21 18:42:45 +00:00
Peter Wemm
e8ebc08f80 Make COMPAT_43 optional again. XXX we need COMPAT_FBSD3 etc for this
stuff.
2001-08-21 02:32:59 +00:00
Ian Dowse
8774836bf8 Avoid sleeping while holding a mutex in dounmount(). This problem
has existed for a long time, but I made it worse a few months ago
by by adding calls to VFS_ROOT() and checkdirs() in revision 1.179.

Also, remove the LK_REENABLE flag in the lockmgr() call; this flag
has been ignored by the lockmgr code for 4 years. This was the only
remaining mention of it apart from its definition.

Reviewed by:	jhb
2001-08-20 19:16:31 +00:00
Matthew Dillon
e1616f3a7b Conditionalize VM_SWZONE_SIZE_MAX and VM_BCACHE_SIZE_MAX so MD sections
that don't define these constants don't break.
2001-08-20 16:29:13 +00:00
Dima Dorfman
fcd7e67061 Sync the default module search path with the one in
sys/boot/common/module.c.

PR:		21405
Submitted by:	Makoto MATSUSHITA <matusita@jp.FreeBSD.org>
2001-08-20 01:12:28 +00:00
Matthew Dillon
2f9e4e8025 Limit the amount of KVM reserved for the buffer cache and for swap-meta
information.  The default limits only effect machines with > 1GB of ram
and can be overriden with two new kernel conf variables VM_SWZONE_SIZE_MAX
and VM_BCACHE_SIZE_MAX, or with loader variables kern.maxswzone and
kern.maxbcache.  This has the effect of leaving more KVM available for
sizing NMBCLUSTERS and 'maxusers' and should avoid tripups where a sysad
adds memory to a machine and then sees the kernel panic on boot due to
running out of KVM.

Also change the default swap-meta auto-sizing calculation to allocate half
of what it was previously allocating.  The prior defaults were way too high.
Note that we cannot afford to run out of swap-meta structures so we still
stay somewhat conservative here.
2001-08-20 00:41:12 +00:00
Julian Elischer
a8cfc0ee40 Forgot to remove this un-needed test. (M_WAITOK won't fail)
I vaguely remember someone once proving it COULD return NULL..
was that changed?

Reminded by: BDE

MFC after:	2 weeks
2001-08-19 04:30:13 +00:00
Julian Elischer
ad4ff09012 fix typo
Submitted by:	Ian Dowse <iedowse@maths.tcd.ie>
2001-08-18 17:43:29 +00:00
Mark Peek
29b7fbd17f Unbreak linux compatibility by providing the correct length of the buffer.
Reported by:	"Pierre Y. Dampure" <pierre.dampure@westmarsh.com>,
		"Niels Chr. Bank-Pedersen" <ncbp@bank-pedersen.dk>
Pointy hat to:	mp
2001-08-18 04:24:30 +00:00
Julian Elischer
8f364875fe Don't alocate a 400 byte buffer on the stack,
Nor 800 bytes of structures..

MFC after:	2 weeks
2001-08-18 02:53:50 +00:00
Dima Dorfman
0c1bb4fbf1 Implement a LOCAL_PEERCRED socket option which returns a
`struct xucred` with the credentials of the connected peer.
Obviously this only works (and makes sense) on SOCK_STREAM
sockets.  This works for both the connect(2) and listen(2)
callers.

There is precise documentation of the semantics in unix(4).

Reviewed by:	dwmalone (eyeballed)
2001-08-17 22:01:18 +00:00
Peter Wemm
0ecd57ad0b Fix part of another problem that bde pointed out. This is different
to what bde suggested though.
2001-08-16 23:43:24 +00:00
Peter Wemm
5a66a2532b Remove redundant null-termination. The buffer is already explicitly
zeroed, and we intentionally leave -1 on the strncpy length to leave
the original \0.

Submitted by: bde
2001-08-16 20:18:43 +00:00
Peter Wemm
a75a0c55f4 Don't explicitly null-terminate. The buffer we are copying into is
already zeroed, and we explicitly leave the last byte untouched.

Submitted by: bde
2001-08-16 20:16:20 +00:00
Mark Peek
911c2be00b Reduce stack allocation (stack-fast?).
elf_load_file()   =>  352 to 52 bytes
    exec_elf_imgact() => 1072 to 48 bytes
    elf_corehdr()     =>  396 to  8 bytes

Reviewed by:	julian
2001-08-16 16:14:26 +00:00
Peter Wemm
77330eeba7 Use the backwards compatability mechanisms so that ps/top etc dont have
unnecessary breakage.

While here, use explicit sizes for the string fields so that we dont
have unintentional changes again in the future when key tunables change.

This still is not quite right, but a june userland is happy with
a -current kernel with these tweaks.
2001-08-16 08:41:15 +00:00
Peter Wemm
6eef6816a8 Use explicit sizes for the prpsinfo command length string so that
we dont have any more unexpected changes in core dumps.  This gets us
back to the original core dump layout from a few days ago.
2001-08-16 08:35:51 +00:00
Bruce Evans
a572c95c3b Don't dump on the label sector or below. This avoids clobbering the
label if the dump device overflaps the label (which is a slight
misconfiguration).  Dump routines don't use dscheck(), so the normal
write protection of the label doesn't help.

Reduced some nearby overflow bugs.  In disk_dumpcheck(), there was
(fatal but fail-safe) overflow on i386's with 4GB of memory, at least
if Maxmem was the top page (can this happen?).  The fix assumes that
the sector size divides PAGE_SIZE (dump routines already assume this).
In setdumpdev(), the corresponding overflow occurred with only about
2GB of memory on all machines with 32-bit ints.  This allowed setdumpdev()
to succeed when it shouldn't have, but then disk_dumpcheck() failed
safe later.  Except in old versions of FreeBSD like RELENG_3 where
there is no disk_dumpcheck().

PR:		28164 (label clobbering part)
MFC after:	1 week
2001-08-15 11:35:45 +00:00
Jason Evans
54db32e945 Implement kernel semaphores.
Reviewed by:	jhb
2001-08-14 22:13:14 +00:00
Jason Evans
d55229b72e Add sx_try_upgrade() and sx_downgrade().
Submitted by:	Alexander Kabaev <ak03@gte.com>
2001-08-13 21:25:30 +00:00
John Baldwin
3f085c228e If we've panic'd already, then just bail in lockmgr rather than blocking or
possibly panic'ing again.
2001-08-10 23:29:15 +00:00
Bill Paul
c214e6636e Fix some of the GDB linkage setup. The l_name member of the gdb linkage
structure is always free()ed yet only sometimes malloc()ed. In particular,
it was simply set to point to l_filename from the a linker_file_t in
link_elf_link_preload_finish(). The l_filename had been malloc()ed inside
the kern_linker.c module and was being free()ed twice: once by
link_elf_unload_file() and again by linker_file_unload(), leading to
a panic.

How to duplicate the problem:

- Pre-load a kernel module from the loader, i.e. if_sis.ko
- Boot system
- Attempt to unload module with kldunload if_sis
- Bewm

The problem here is that the case where the module was loaded with kldload
after system boot would work correctly, so this bug went unnoticed until
I stubbed my toe on it just now. (Also, you can only trip this bug if
you compile a kernel with options DDB, but that's the default now.)

Fix: remember to malloc() a separate copy of the module name for the
l_name member of the gdb linkage structure in three places where the
linkage structure can be initialized.
2001-08-10 23:15:13 +00:00
John Baldwin
688ebe120c - Close races with signals and other AST's being triggered while we are in
the process of exiting the kernel.  The ast() function now loops as long
  as the PS_ASTPENDING or PS_NEEDRESCHED flags are set.  It returns with
  preemption disabled so that any further AST's that arrive via an
  interrupt will be delayed until the low-level MD code returns to user
  mode.
- Use u_int's to store the tick counts for profiling purposes so that we
  do not need sched_lock just to read p_sticks.  This also closes a
  problem where the call to addupc_task() could screw up the arithmetic
  due to non-atomic reads of p_sticks.
- Axe need_proftick(), aston(), astoff(), astpending(), need_resched(),
  clear_resched(), and resched_wanted() in favor of direct bit operations
  on p_sflag.
- Fix up locking with sched_lock some.  In addupc_intr(), use sched_lock
  to ensure pr_addr and pr_ticks are updated atomically with setting
  PS_OWEUPC.  In ast() we clear pr_ticks atomically with clearing
  PS_OWEUPC.  We also do not grab the lock just to test a flag.
- Simplify the handling of Giant in ast() slightly.

Reviewed by:	bde (mostly)
2001-08-10 22:53:32 +00:00
John Baldwin
827dcaf663 Make witness compile w/o DDB.
Reported by:	wpaul
2001-08-10 22:33:59 +00:00
Ian Dowse
a9a8ba3d71 Arbitrarily limit to 64k the number of bytes that can be read at
a time using the ogetdirentries() compatibility syscall. This is a
hack to ensure that rediculous values don't get passed to MALLOC().

Reviewed by:	kris
2001-08-10 22:14:18 +00:00
John Baldwin
8791b43513 Work around a race between msleep() and endtsleep() where it was possible
for endtsleep() to be executing when msleep() resumed, for endtsleep()
to spin on sched_lock long enough for the other process to loop on
msleep() and sleep again resulting in endtsleep() waking up the "wrong"
msleep.

Obtained from:	BSD/OS
2001-08-10 21:08:56 +00:00
John Baldwin
a45982d2ea Change callout_stop() to return an integer. If callout_stop() succeeds in
removing the callout entry, return 1.  If callout_stop() fails to remove
the callout entry because it is currently executing or has already been
executed, then the function returns 0.  The idea was obtained from BSD/OS,
however, BSD/OS changed untimeout(), and I've just changed callout_stop()
to be more conservative.

Obtained from:	BSD/OS
2001-08-10 21:06:59 +00:00
John Baldwin
4d33620270 Style nit: covert a couple of if (p_wchan) tests to if (p_wchan != NULL). 2001-08-10 20:56:25 +00:00
John Baldwin
c4a448100c - Remove asleep(), await(), and M_ASLEEP.
- Callers of asleep() and await() have been converted to calling tsleep().
  The only caller outside of M_ASLEEP was the ata driver, which called both
  asleep() and await() with spl-raised, so there was no need for the
  asleep() and await() pair.  M_ASLEEP was unused.

Reviewed by:	jasone, peter
2001-08-10 06:45:43 +00:00
John Baldwin
8ec48c6dbf - Remove asleep(), await(), and M_ASLEEP.
- Callers of asleep() and await() have been converted to calling tsleep().
  The only caller outside of M_ASLEEP was the ata driver, which called both
  asleep() and await() with spl-raised, so there was no need for the
  asleep() and await() pair.  M_ASLEEP was unused.

Reviewed by:	jasone, peter
2001-08-10 06:37:05 +00:00
John Baldwin
ab32297d8d Axe spl's obsoleted by the callout mutex. 2001-08-10 01:36:25 +00:00
Peter Wemm
99ab2d5dca *** empty log message *** 2001-08-09 01:21:58 +00:00
Peter Wemm
2aca0c28d3 Zap 'ptrace(PT_READ_U, ...)' and 'ptrace(PT_WRITE_U, ...)' since they
are a really nasty interface that should have been killed long ago
when 'ptrace(PT_[SG]ETREGS' etc came along.  The entity that they
operate on (struct user) will not be around much longer since it
is part-per-process and part-per-thread in a post-KSE world.

gdb does not actually use this except for the obscure 'info udot'
command which does a hexdump of as much of the child's 'struct user'
as it can get.  It carries its own #defines so it doesn't break
compiles.
2001-08-08 05:25:15 +00:00
Brian Feldman
bcc92693d4 Previously, the ELF linker would always just store the pointer to a
filename passed in via the module loader functions in the GDB
"sharedlibrary" support structures.  This isn't good, since the pointer
would become stale in almost every case (not the pre-loaded case, of
course).

Change this to malloc()ed copy of the string and finally fix the reason
that gdb -k's "sharedlibrary" command stopped working.

Obtained from:	LOMAC/FreeBSD (cf. NAI Labs)
2001-08-06 14:21:57 +00:00
Chris Costello
c30d4da338 Remove the fildesc_clone() function and its associated unnecessary code.
It didn't implement the proper /dev/fd functionality (which would be to
include in the directory listing /dev/fd/n if the process has fd n open)
anyway.

Anything needing access to /dev/fd/n where n > 2 can use the optional
fdescfs module, which implements this properly and does not cause any
trouble with devfs.

Discussed with:	phk
2001-08-06 05:56:33 +00:00
Thomas Moestl
12543b2e98 Export the tk_nin and tk_nout variables (number of tty input/output
characters) as sysctls (kern.tty_nin and kern.tty_nout).
2001-08-04 18:09:24 +00:00
Thomas Moestl
938a4e5c0c Export the head structure for the device statistics STAILQ in
sys/devicestat.h, so that the queue can be walked in crashdumps using
libkvm.
2001-08-04 18:02:47 +00:00
John Baldwin
c9c1406f76 Add KTR_INTR tracepoints for when clock interrupts are triggered. 2001-08-03 20:54:41 +00:00
Robert Watson
fd6aaf7fe1 Anton kindly pointed out (and fixed) a bug in the Jail handling of the
bind() call on IPv4 sockets:

  Currently, if one tries to bind a socket using INADDR_LOOPBACK inside a
  jail, it will fail because prison_ip() does not take this possibility
  into account.  On the other hand, when one tries to connect(), for
  example, to localhost, prison_remote_ip() will silently convert
  INADDR_LOOPBACK to the jail's IP address.  Therefore, it is desirable to
  make bind() to do this implicit conversion as well.

  Apart from this, the patch also replaces 0x7f000001 in
  prison_remote_ip() to a more correct INADDR_LOOPBACK.

This is a 4.4-RELEASE "during the freeze, thanks" MFC candidate.

Submitted by:	Anton Berezin <tobez@FreeBSD.org>
Discussed with at some point:	phk
MFC after:	3 days
2001-08-03 18:21:06 +00:00
Bosko Milekic
ba3e88262e Rename mb_init() mbuf subsystem initialization routine to mbuf_init(), in
order to avoid namespace collision with subr_mchain.c's mb_init(). This
wasn't "fatal" as the mbuf initialization routine mb_init() was local to
subr_mbuf.c which in turn didn't pull in subr_mchain.c's mb_init()
declaration, but it should deffinately be changed now before it creates
headache.
2001-08-03 05:05:32 +00:00
Jake Burkholder
f74250ca46 Remove some code that appears to have endian problems with INVARIANTS.
This is #if BIG_ENDIAN, but is only necessary if malloc types are shorts,
not struct malloc_type * like they are now.
2001-08-03 03:31:45 +00:00
John Baldwin
b39bc3e160 Use 'p' instead of the potentially more expensive 'curproc' inside of
mi_switch().
2001-08-02 22:15:31 +00:00
Warner Losh
c7021493ba Make the fmt arguments to make_dev and make_dev_alias const char *.
Approved on IRC as long as it didn't cause a large number of warnings by: phk

MFC After: 700 hours
2001-08-02 20:35:35 +00:00
Peter Wemm
aa7a4dae6d Temporarily back out kern_sig.c rev 1.125 and kern_exit.c rev 1.131.
This paniced my one of my machines one time too many :-( and there is
no sign of a solution in the pipeline.  The deltas are still easily
available in cvs.  The problem is that if the parent has been swapped
out, the child process cannot grope around in the parent's UPAGES to
see the sigact[] array or it will fault.  This probably is a showstopper
for this implementation anyway.
2001-08-01 20:35:24 +00:00
Bosko Milekic
bb6f838c79 Move CPU_ABSENT() macro to smp.h, where it belongs anyway. It will be
defined to 0 in the non-SMP case, which very much makes sense as it
permits its usage in per-CPU initialization loops (for an example, check
out subr_mbuf.c).
  Further, on a UP system, make mb_alloc always use the first per-CPU
container, regardless of cpuid (i.e. remove reliability on cpuid in the
UP case).

Requested by: alfred
2001-08-01 00:54:00 +00:00
John Baldwin
36c2e9feb4 Apply the cluebat to myself and undo the await() -> mawait() rename. The
asleep() and await() functions split the functionality of msleep() up into
two halves.  Only the asleep() half (which is what puts the process on the
sleep queue) actually needs the lock usually passed to msleep() held to
prevent lost wakeups.  await() does not need the lock held, so the lock
can be released prior to calling await() and does not need to be passed in
to the await() function.  Typical usage of these functions would be as
follows:

        mtx_lock(&foo_mtx);
        ... do stuff ...
        asleep(&foo_cond, PRIxx, "foowt", hz);
        ...
        mtx_unlock&foo_mtx);
        ...
        await(-1, -1);

Inspired by:	dillon on the couch at Usenix
2001-07-31 22:06:56 +00:00
John Baldwin
e9121d0663 Add a safety belt to mawait() for the (cold || panicstr) case identical to
the one in msleep() such that we return immediately rather than blocking.

Submitted by:	peter
Prodded by:	sheldonh
2001-07-31 20:57:57 +00:00
John Baldwin
5cb0fbe47e If we have already panic'd then don't bother enforcing mutex asserts as
things are pretty much shot already and all panic'ing does is hurt our
chances of getting a dump.

Inspired by:	sheldonh
2001-07-31 17:45:50 +00:00
John Baldwin
32bca5fe03 - Fix panicstr checks to explicitly check against NULL.
- Add a few more panicstr checks so that we don't panic recursively.

Requested by:	sheldonh (2)
2001-07-31 17:44:57 +00:00
Robert Watson
e7f65fdcf9 o Modify p_candebug() such that there is no longer automatic acceptance
of debugging the current process when that is in conflict with other
  restrictions (such as jail, unprivileged_procdebug_permitted, etc).
o This corrects anomolies in the behavior of
  kern.security.unprivileged_procdebug_permitted when using truss and
  ktrace.  The theory goes that this is now safe to use.

Obtained from:	TrustedBSD Project
2001-07-31 17:25:12 +00:00
Robert Watson
0ef5652e27 o Introduce new kern.security sysctl tree for kernel security policy
MIB entries.
o Relocate kern.suser_permitted to kern.security.suser_permitted.
o Introduce new kern.security.unprivileged_procdebug_permitted, which
  (when set to 0) prevents processes without privilege from performing
  a variety of inter-process debugging activities.  The default is 1,
  to provide current behavior.

  This feature allows "hardened" systems to disable access to debugging
  facilities, which have been associated with a number of past security
  vulnerabilities.  Previously, while procfs could be unmounted, other
  in-kernel facilities (such as ptrace()) were still available.  This
  setting should not be modified on normal development systems, as it
  will result in frustration.  Some utilities respond poorly to
  failing to get the debugging access they require, and error response
  by these utilities may be improved in the future in the name of
  beautification.

  Note that there are currently some odd interactions with some
  facilities, which will need to be resolved before this should be used
  in production, including odd interactions with truss and ktrace.
  Note also that currently, tracing is permitted on the current process
  regardless of this flag, for compatibility with previous
  authorization code in various facilities, but that will probably
  change (and resolve the odd interactions).

Obtained from:	TrustedBSD Project
2001-07-31 15:48:21 +00:00
Jake Burkholder
146be906a1 Don't try to find an eventhandler list if the list of lists hasn't
been initialized yet.
2001-07-31 03:52:16 +00:00
Jake Burkholder
98b0e9d587 Don't try to print a field that doesn't exist; in usually commented
out debugging code.
2001-07-31 03:51:07 +00:00
Jake Burkholder
7e5102989e Use a machine dependent type, Elf_Hashelt, for the elements of the elf
dynamic symbol table buckets and chains.  The sparc64 toolchain uses 32
bit .hash entries, unlike other 64 bits architectures (alpha), which use
64 bit entries.

Discussed with: dfr, jdp
2001-07-31 03:46:39 +00:00
Jeroen Ruigrok van der Werven
7b389f3335 Fix obsolete code.
FreeBSD _does_ define ENOMSG, so no need for checking if we support it.

Inspired by PR:		22470
Which was submitted by:	Bjorn Tornqvist <bjorn@west.se>
MFC after:	1 week
2001-07-30 19:28:02 +00:00
Peter Wemm
b219758f94 Revert previous accidental commit. FWIW, it was part of enabling
VM caching of disks through mmap() and stopping syncing of open files
that had their last reference in the fs removed (ie: their unsync'ed
pages get discarded on close already, so I made it stop syncing too).
2001-07-27 15:57:17 +00:00
Peter Wemm
24a590a074 Fix cut/paste blunder. Serves me right for doing a last minute tweak
to what I had for some time.

Submitted by:	bde
2001-07-27 15:52:49 +00:00
Peter Wemm
a03bd29498 Use the tunable maxusers rather than the compile-time one. Evaluate and
initialize in the right order to make derivative settings work right.
eg: at compile time, nmbufs was double nmbclusters.  For POLA this should
work the same at runtime.
2001-07-26 23:08:31 +00:00
Peter Wemm
ee342e1bf1 Move param.c out of the conf directory and make it fully dynamic.
Tunables are now derived at boot time from maxusers.  ie: change maxusers
via a tunable and all the derivative settings change.  You can change
the other tunables individually as well.  Even hz etc is tunable.
2001-07-26 23:04:03 +00:00
Bosko Milekic
49f854f926 - Do not handle the per-CPU containers in mbuf code as though the cpuids
were indices in a dense array. The cpuids are a sparse set and treat
  them as such, setting up containers only for CPUs activated during
  mb_init().

- Fix netstat(1) and systat(1) to treat the per-CPU stats area as a sparse
  map, in accordance with the above.

This allows us to properly boot with certain CPUs disactivated. However, if
we later decide to re-activate said CPUs, we will barf until we decide to
implement CPU spinon/spinoff callback hooks to allow for said CPUs' per-CPU
containers to get configured on their activation.

Reported by: mjacob
Partially (sys/ diffs) Submitted by: mjacob
2001-07-26 18:47:46 +00:00
Bill Fenner
c3cb7e5d7a Don't bother passing p to rtioctl just so it can fail to pass it to mrt_ioctl 2001-07-25 20:15:28 +00:00
Peter Pentchev
7ca4d05f34 Make dynamic sysctl entries start at 0x100, not decimal 100 - there are
static entries with oid's over 100, and defining enough dynamic entries
causes an overlap.

Move the "magic" value 0x100 into <sys/sysctl.h> where it belongs.

PR:		29131
Submitted by:	"Alexander N. Kabaev" <kabaev@mail.ru>
Reviewed by:	-arch, -audit
MFC after:	2 weeks
2001-07-25 17:21:18 +00:00
Peter Pentchev
107e7dc5c3 Style(9): function names on a separate line, max line length 80 chars.
Reviewed by:	-arch, -audit
MFC after:	2 weeks
2001-07-25 17:13:58 +00:00
Dima Dorfman
02bd5400fe sys/kern/tty_snoop.c is now sys/dev/snp/snp.c.
Repo-copy by:	jdp
2001-07-25 12:06:36 +00:00
Assar Westerlund
2b3dc41c15 correct description of `vpp' for mknod/symlink: they are actually
returned locked
2001-07-24 16:16:00 +00:00
Matthew Dillon
4fec48c6fe As per further discussions on hackers redo the SIGCHLD patch to not generate
an unexpected user-visible side effect with the sigaction flags.  Also cleanup
a minor union issue.

Submitted by: Rudolf Cejka <cejkar@dcse.fee.vutbr.cz>
MFC addendum: MFC will be combined w/ original commit
MFC after: 3 days
2001-07-22 18:47:31 +00:00
Assar Westerlund
17b65d5532 revert previous commit (bad style and not needed)
Noticed:	bde
2001-07-22 10:24:31 +00:00
Assar Westerlund
8cfdf32239 add prototype for dosetrlimit 2001-07-22 00:21:19 +00:00
Assar Westerlund
129a62d7c7 add <sys/cdefs.h> (for __unused and such) 2001-07-21 17:12:44 +00:00
John Baldwin
a5dd141db6 Add a missing ~ so that the LO_INITIALIZED flag actually gets turned off
in witness_destroy().
2001-07-20 23:29:25 +00:00
Jonathan Lemon
5f5c2e958f Introduce EVFILT_TIMER, which allows a process to establish an
arbitrary number of timers, both oneshot and periodic.

Repeatedly reminded to commit by: jayanth
Reviewed by: peter (a while back)
2001-07-19 18:34:40 +00:00
Kris Kennaway
2d075e994c Don't use kp->arg0 as a format string, grr.
MFC after:	1 week
2001-07-19 02:18:54 +00:00
Dima Dorfman
ac60b28d35 Keep track of all "struct snoop"'s so that snp_modevent can fail with
EBUSY if there's a device still open.
2001-07-18 13:39:43 +00:00
David E. O'Brien
b46ba8880c Increase NMBCLUSTERS by 4x.
This takes a GENERIC kernel (MAXUSERS=32) from 1536 to 3072.
2001-07-17 15:51:12 +00:00
Peter Wemm
2fc4762c60 Move the hints gunk to a seperate file. It isn't really part of the
newbus structure (no more than subr_rman.c is anyway).
2001-07-14 08:25:18 +00:00
Peter Wemm
9516fbd6d9 Go back to having either static OR dynamic hints, with fallback
support.  Trying to fix the merged set where dynamic overrode
static was getting more and more complicated by the day.

This should fix the duplicate atkbd, psm, fd* etc in GENERIC.  (which
paniced the alpha, but not the i386)
2001-07-14 00:23:10 +00:00
Dima Dorfman
b2c3fa70e3 Correct spelling in a comment and remove trailing newline from a
panic() call (panic() adds it itself).
2001-07-11 02:04:43 +00:00
Dag-Erling Smørgrav
f0cc1c6f81 Constify the fstype argument to vfs_mount(). This eliminates at least one
"call discards qualifier" warning (in sys/compat/linux/linux_file.c).
2001-07-09 19:11:51 +00:00
Guido van Rooij
333ea48563 Don't share sig handlers after an exec
Reviewed by:	Alfred Perlstein
2001-07-09 19:01:42 +00:00
Guido van Rooij
9b956e9897 Get rid of useless bcopy (the next statement was equivalent) 2001-07-09 19:00:08 +00:00
Jake Burkholder
d652b3d918 Backout mwakeup, etc. 2001-07-06 01:16:43 +00:00
Robert Watson
a0f75161f9 o Replace calls to p_can(..., P_CAN_xxx) with calls to p_canxxx().
The p_can(...) construct was a premature (and, it turns out,
  awkward) abstraction.  The individual calls to p_canxxx() better
  reflect differences between the inter-process authorization checks,
  such as differing checks based on the type of signal.  This has
  a side effect of improving code readability.
o Replace direct credential authorization checks in ktrace() with
  invocation of p_candebug(), while maintaining the special case
  check of KTR_ROOT.  This allows ktrace() to "play more nicely"
  with new mandatory access control schemes, as well as making its
  authorization checks consistent with other "debugging class"
  checks.
o Eliminate "privused" construct for p_can*() calls which allowed the
  caller to determine if privilege was required for successful
  evaluation of the access control check.  This primitive is currently
  unused, and as such, serves only to complicate the API.

Approved by:	({procfs,linprocfs} changes) des
Obtained from:	TrustedBSD Project
2001-07-05 17:10:46 +00:00
John Baldwin
f583b1d938 Spelling fix in a KASSERT: runq_chose -> runq_choose. 2001-07-04 20:00:48 +00:00
Matthew Dillon
7b9673fa28 cleanup: GIANT macros, rename DEPRECIATE to DEPRECATE
Move p_giant_optional to proc zero'd section
Remove (old) XXX zfree comment in pipe code
2001-07-04 17:11:03 +00:00
Matthew Dillon
0cddd8f023 With Alfred's permission, remove vm_mtx in favor of a fine-grained approach
(this commit is just the first stage).  Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.
2001-07-04 16:20:28 +00:00
Matthew Dillon
085be199c6 postsig() currently requires Giant to be held. Giant is held properly at
the first postsig() call, but not always held at the second place,
resulting in an occassional panic.
2001-07-04 15:36:30 +00:00
Jake Burkholder
9316aed2ef Implement mwakeup, mwakeup_one, cv_signal_drop and cv_broadcast_drop.
These take an additional mutex argument, which is dropped before any
processes are made runnable.  This can avoid contention on the mutex
if the processes would immediately acquire it, and is done in such a
way that wakeups will not be lost.

Reviewed by:	jhb
2001-07-04 00:32:50 +00:00
Dag-Erling Smørgrav
2687c8741b Constify the format string.
Submitted by:	Mike Barcroft <mike@q9media.com>
2001-07-03 21:46:43 +00:00
Thomas Moestl
948d3d9484 Make the code to read the kernel message buffer via sysctl machine-
independent and rename the corresponding sysctls from machdep.msgbuf and
machdep.msgbuf_clear (i386 only) to kern.msgbuf and kern.msgbuf_clear.
2001-07-03 19:44:07 +00:00
John Baldwin
29905510e0 Remove spl's in uio_yield() that are covered by the sched_lock. 2001-07-03 15:58:37 +00:00
John Baldwin
d68a8cc0ab Remove commented-out garbage that skipped updating schedcpu() stats for
ithreads in SWAIT.
2001-07-03 08:03:56 +00:00
John Baldwin
97b4306f0f Just check p_oncpu when determining if a process is executing or not.
We already did this in the SMP case, and it is now maintained in the UP
case as well, and makes the code slightly more readable.  Note that
curproc is always executing, thus the p != curproc test does not need to
be performed if the p_oncpu check is made.
2001-07-03 08:00:57 +00:00
John Baldwin
9d36b83e2c Axe spl's that are covered by the sched_lock (and have been for quite
some time.)
2001-07-03 07:53:35 +00:00
John Baldwin
36f1548b96 Include the wait message and channel for msleep() in the KTR tracepoint. 2001-07-03 07:39:06 +00:00
John Baldwin
8f451b4114 Remove bogus need_resched() of the current CPU in roundrobin().
We don't actually need to force a context switch of the current process.
The act of firing the event triggers a context switch to softclock() and
then switching back out again which is equivalent to a preemption, thus
no further work is needed on the local CPU.
2001-07-03 05:33:09 +00:00
John Baldwin
64acb05b1c Grab Giant around postsig() since sendsig() can call into the vm to
grow the stack and we already needed Giant for KTRACE.
2001-07-03 05:27:53 +00:00
Robert Watson
e84b7987bc o Unfold p31b_proc() into the individual posix4 system calls so as to
allow call-specific authorization.
o Modify the authorization model so that p_can() is used to check
  scheduling get/set events, using P_CAN_SEE for gets, and P_CAN_SCHED
  for sets.  This brings the checks in line with get/setpriority().

Obtained from:	TrustedBSD Project
2001-06-30 07:55:19 +00:00
John Baldwin
aa3cefd06c Remove the p_spinlocks spin lock count that was obsoleted by the
per-CPU spinlocks list.
2001-06-30 03:35:22 +00:00
Robert Watson
1af55356f8 Replace some use of 'p' with 'targetp' so as to not scarily overload the
passed 'p' argument.  No functional change.

Obtained from:	USENIX Emporium, Cheap Tricks Department
2001-06-30 03:13:36 +00:00
John Baldwin
a300519d41 Make the schedlock saved critical section state a per-thread property. 2001-06-30 03:11:26 +00:00
John Baldwin
7aa7260e4a Move ast() and userret() to sys/kern/subr_trap.c now that they are MI. 2001-06-29 19:51:37 +00:00
John Baldwin
6be523bca7 Add a new MI pointer to the process' trapframe p_frame instead of using
various differently named pointers buried under p_md.

Reviewed by:	jake (in principle)
2001-06-29 11:10:41 +00:00
John Baldwin
92809bc001 Grab Giant around trap_pfault() for now. 2001-06-29 04:18:10 +00:00
Jonathan Lemon
84241bd0dc Fix up indentation. 2001-06-29 04:01:38 +00:00
Robert Watson
64e55bf47b Remove a fascinating but confusing construct involving chaining
conditional clauses in the following way:

	(0 || a || b);

No functional change.
2001-06-28 23:02:09 +00:00
Robert Watson
e8f7a95298 Add error checking for copyin() operations in posix4 scheduling code. 2001-06-28 22:53:42 +00:00
John Baldwin
ec178c1e4c Don't check witness assertions if the lock doesn't use witness or witness
is dead.
2001-06-28 22:22:20 +00:00
John Baldwin
cd2f721557 - Fix a mntvnode and vnode interlock reversal.
- Protect the mnt_vnode list with the mntvnode lock.
2001-06-28 04:05:54 +00:00
John Baldwin
5f36700a32 - Add trylock variants of shared and exclusive locks.
- The sx assertions don't actually need the internal sx mutex lock, so
  don't bother doing so.
- Add a new assertion SX_ASSERT_LOCKED() that asserts that either a
  shared or exclusive lock should be held.  This assertion should be used
  instead of SX_ASSERT_SLOCKED() in almost all cases.
- Adjust some KASSERT()'s to include file and line information.
- Use the new witness_assert() function in the WITNESS case for sx slock
  asserts to verify that the current thread actually owns a slock.
2001-06-27 06:39:37 +00:00
John Baldwin
04297fe609 - Add a new witness_assert() to perform arbitrary locking assertions.
- Clean up the KTR tracepoints to be slighlty more consistent and useful
- Fix a bug in WITNESS where we would recurse indefinitely and blow the
  stack when acquiring Giant after sleeping with a sleepable lock held.

Reported by:	tanimura (3)
2001-06-27 06:27:29 +00:00
John Baldwin
776e0b3693 - Always use the proc lock of the task leader to protect the peers list of
processes.
- Don't construct fake call args and then call kill().  psignal is not
  anymore complicated and is quicker and not prone to locking problems.
  Calling psignal() avoids having to do a pfind() since we already have a
  proc pointer and also allows us to keep the task leader locked while we
  kill all the peer processes so the list is kept coherent.
- When a kthread exits, do a wakeup() on its proc pointers.  This can be
  used by kernel modules that have kthreads and want to ensure they have
  safely exited before completely the MOD_UNLOAD event.

Connectivity provided by:	Usenix wireless
2001-06-27 06:15:44 +00:00
John Baldwin
b7e554f5d6 - Move the 'clk' spinlock below other spin locks since KTR trace events
may need the clock lock for nanotime().
- Add KTR trace events for lock list manipulations and other witness
  operations.
- Use a temporary variable instead of setting the lock list head directly
  and then setting up the links to add a new lock list entry to the lock
  list.  This small race could result in witness "forgetting" about all
  the locks held by this process temporarily during an interrupt.
- Close a more fatal race condition when removing a lock from a list.
  Removing a lock from the list entails both decrementing the count of
  items in this bucket as well as shuffling items in the current bucket up
  a notch to replace the gap left by the removed item.  Wrap these
  operations in a critical section.
2001-06-25 23:17:52 +00:00
John Baldwin
1715f07da3 - Replace the unused KTR_IDLELOOP trace class with a new KTR_WITNESS trace
class to trace witness events.
- Make the ktr_cpu field of ktr_entry be a standard field rather than one
  present only in the KTR_EXTEND case.
- Move the default definition of KTR_ENTRIES from sys/ktr.h to
  kern/kern_ktr.c.  It has not been needed in the header file since KTR
  was un-inlined.
- Minor include cleanup in kern/kern_ktr.c.
- Fiddle with the ktr_cpumask in ktr_tracepoint() to disable KTR events
  on the current CPU while we are processing an event.
- Set the current CPU inside of the critical section to ensure we don't
  migrate CPU's after the critical section but before we set the CPU.
2001-06-25 23:09:31 +00:00
John Baldwin
1d79f1bb9a - Sort includes.
- Count the context switches during shutdown when we give ithreads a chance
  to run as volutary context switches.

Submitted by:	bde (2)
2001-06-25 18:30:42 +00:00
John Baldwin
c4f7a18726 Count the context switch when blocking on a mutex as a voluntary context
switch.  Count the context switch when preempting the current thread to let
a higher priority thread blocked on a mutex we just released run as an
involuntary context switch.

Reported by:	bde
2001-06-25 18:29:32 +00:00
John Baldwin
84bbc4dbda Count the switch when an ithread goes idle as a voluntary context switch.
Submitted by:	bde
2001-06-25 18:27:33 +00:00
David Malone
db3cc2d09f Don't dereference a NULL pointer if we fail to get a sendfilebuf. 2001-06-24 12:27:30 +00:00
Matthew Dillon
c7503f60c4 After exhaustive discussions and some meandering and confusion, enough
people are on track with the cause and effect of this, and although
fixing this severely degenerate case appears to violate the letter of
POSIX.1-200x, Bruce and I (and enough others) agree that it should be
comitted.

So, this patch generates an ENOENT error for any attempt to do a path lookup
through an empty symlink (e.g. open(), stat()).

Submitted by: "Andrey A. Chernov" <ache@nagual.pp.ru>
Reviewed by: bde
Discussed exhaustively on: freebsd-current
Previously committed to: NetBSD 4 years ago
2001-06-24 05:24:41 +00:00
John Baldwin
1df95969b5 - Lock CURSIG() with the proc lock to close the signal race with psignal.
- Grab Giant around ktrace points.
- Clean up KTR_PROC tracepoints to not display the value of
  sched_lock.mtx_lock as it isn't really needed anymore and just obfuscates
  the messages.
- Add a few if conditions to replace gotos.
- Ensure that every msleep KTR event ends up with a matching msleep resume
  KTR event (this was broken when we didn't do a mi_switch()).
- Only note via ktrace that we resumed from a switch once rather than twice
  in several places in msleep().
- Remove spl's rom asleep and await as the proc lock and sched_lock provide
  all the needed locking.
- In mawait() add in a needed ktrace point for noting that we are about to
  switch out.
2001-06-22 23:11:26 +00:00
John Baldwin
87f9ffb805 - Lock CURSIG with the proc lock and don't release the proc lock until
after grabbing the sched lock to close a race.
- Lock ktrace points with Giant.
2001-06-22 23:06:38 +00:00
John Baldwin
06c836bbca - Grab the proc lock around CURSIG and postsig(). Don't release the proc
lock until after grabbing the sched_lock to avoid CURSIG racing with
  psignal.
- Don't grab Giant for addupc_task() as it isn't needed.

Reported by:	tegge (signal race), bde (addupc_task a while back)
2001-06-22 23:05:11 +00:00
John Baldwin
2ad7d3049a - Change CURSIG() and postsig() to require that the proc lock is held
rather than grabbing it and releasing it themselves.  This allows callers
  of these functions to get the lock to close race conditions.
- Grab Giant around ktrace in postsig.
- Count the switches performed on SIGSTOP's as involuntary context switches
  in the resource usage stats.

Reported by:	tegge (signal race), bde (missing csw stats)
2001-06-22 23:02:37 +00:00
Matt Jacob
2f7f966cb8 int -> size_t fix 2001-06-22 19:54:38 +00:00
Matt Jacob
8f5a1742c2 Temporary fix at least- define NCPU_PRESENT which will be mp_npcus for
SMP kernels, one (1) for non-SMP.
2001-06-22 16:03:23 +00:00
Jim Pirzyk
f83ae79fbe changed hostid from long to unsigned long to be able to store values > 2GB
on i386 platforms.  Also changed SYSCTL type from INT to ULONG and removed
comment about it.

PR:		kern/21132
MFC after:	1 month
2001-06-22 16:03:14 +00:00
Bosko Milekic
08442f8a82 Introduce numerous SMP friendly changes to the mbuf allocator. Namely,
introduce a modified allocation mechanism for mbufs and mbuf clusters; one
which can scale under SMP and which offers the possibility of resource
reclamation to be implemented in the future. Notable advantages:

 o Reduce contention for SMP by offering per-CPU pools and locks.
 o Better use of data cache due to per-CPU pools.
 o Much less code cache pollution due to excessively large allocation macros.
 o Framework for `grouping' objects from same page together so as to be able
   to possibly free wired-down pages back to the system if they are no longer
   needed by the network stacks.

 Additional things changed with this addition:

  - Moved some mbuf specific declarations and initializations from
    sys/conf/param.c into mbuf-specific code where they belong.
  - m_getclr() has been renamed to m_get_clrd() because the old name is really
    confusing. m_getclr() HAS been preserved though and is defined to the new
    name. No tree sweep has been done "to change the interface," as the old
    name will continue to be supported and is not depracated. The change was
    merely done because m_getclr() sounds too much like "m_get a cluster."
  - TEMPORARILY disabled mbtypes statistics displaying in netstat(1) and
    systat(1) (see TODO below).
  - Fixed systat(1) to display number of "free mbufs" based on new per-CPU
    stat structures.
  - Fixed netstat(1) to display new per-CPU stats based on sysctl-exported
    per-CPU stat structures. All infos are fetched via sysctl.

 TODO (in order of priority):

  - Re-enable mbtypes statistics in both netstat(1) and systat(1) after
    introducing an SMP friendly way to collect the mbtypes stats under the
    already introduced per-CPU locks (i.e. hopefully don't use atomic() - it
    seems too costly for a mere stat update, especially when other locks are
    already present).
  - Optionally have systat(1) display not only "total free mbufs" but also
    "total free mbufs per CPU pool."
  - Fix minor length-fetching issues in netstat(1) related to recently
    re-enabled option to read mbuf stats from a core file.
  - Move reference counters at least for mbuf clusters into an unused portion
    of the cluster itself, to save space and need to allocate a counter.
  - Look into introducing resource freeing possibly from a kproc.

Reviewed by (in parts): jlemon, jake, silby, terry
Tested by: jlemon (Intel & Alpha), mjacob (Intel & Alpha)
Preliminary performance measurements: jlemon (and me, obviously)
URL: http://people.freebsd.org/~bmilekic/mb_alloc/
2001-06-22 06:35:32 +00:00
John Baldwin
fbd26f7594 Fix some lock order reversals where we called free() while holding a proc
lock.  We now use temporary variables to save the process argument pointer
and just update the pointer while holding the lock.  We then perform the
free on the cached pointer after releasing the lock.
2001-06-20 23:10:06 +00:00
Bosko Milekic
f5eece3fb9 Change m_devget()'s outdated and unused `offset' argument to actually mean
something: offset into the first mbuf of the target chain before copying
the source data over.

Make drivers using m_devget() with a first argument "data - ETHER_ALIGN"
to use the offset argument to pass ETHER_ALIGN in. The way it was previously
done is potentially dangerous if the source data was at the top of a page
and the offset caused the previous page to be copied (if the
previous page has not yet been appropriately mapped).

The old `offset' argument in m_devget() is not used anywhere (it's always
0) and dates back to ~1995 (and earlier?) when support for ethernet trailers
existed. With that support gone, it was merely collecting dust.

Tested on alpha by: jlemon
Partially submitted by: jlemon
Reviewed by: jlemon
MFC after: 3 weeks
2001-06-20 19:48:35 +00:00
John Baldwin
2e1aacccac Preemption by an interrupt thread is an involuntary switch, not a voluntary
one.

Pointy-hat to:	me
2001-06-20 18:26:41 +00:00
Dag-Erling Smørgrav
0e79fe6f0e Constify (silence warnings introduced by last commit to sys/module.h) 2001-06-20 16:08:45 +00:00
Garrett Wollman
37336173d3 After one too many PRs on the subject, bite the bullet and define IOV_MAX
and its associated constants.  Implement _SC_IOV_MAX in the usual way.
Be a bit sloppy about the namespace question; this should get cleared up
in time for 5.0.

MFC after:	1 month
2001-06-18 20:24:54 +00:00
John Baldwin
6fad32afc9 Lock Giant in postsig() for the KTRACE case as ktrpsig() needs Giant when
it writes out to the trace file.

Reported by:	peter, gallatin, and others
2001-06-18 19:23:43 +00:00
Brian Somers
09dbb40410 Add linker_reference_module().
This function loads a module if required, otherwise bumps the reference
count -- the opposite of linker_file_unload().
2001-06-18 15:09:33 +00:00
Brian Somers
21ff14e0f9 Don't remove the SI_CHEAPCLONE for unsupported minors 2001-06-18 09:22:30 +00:00
Peter Wemm
b85db19691 Move setugid() a little sooner to before we release tracing in case
crdup() or change_e*id() block on malloc() or mutex.
2001-06-16 23:34:23 +00:00
Peter Wemm
5a280d9cd1 Add INTR_TYPE_AV so that we can get to the PI_AV priority in the ithread
handlers.  This is beneficial since it means that pcm's MPSAFE handler
can get run before things that will block on Giant in the shared irq
case.
2001-06-16 22:42:19 +00:00
Jonathan Lemon
9fa416ca19 Fix warnings:
112: warning: cast to pointer from integer of different size
125: warning: cast to pointer from integer of different size
2001-06-16 07:02:47 +00:00
Jonathan Lemon
7b748f0a21 Correctly hook up the write kqfilter to pipes.
Submitted by:  Niels Provos <provos@citi.umich.edu>
2001-06-15 20:45:01 +00:00
Peter Wemm
b93c3c5ed6 Fix some warnings in kern_environment.c. Make the getenv*() family
take a const 'name', since they dont modify anything.
159: warning: passing arg 1 of `getenv_int' discards qualifiers...
167: warning: passing arg 1 of `getenv' discards qualifiers from pointer..
2001-06-15 07:29:17 +00:00
Peter Wemm
ee24290963 As per comments in sys/linker_set.h:
BANG! BANG! BANG! BANG! BANG! BANG! CLICK! CLICK! CLICK! CLICK! CLICK!
<reload>
BANG! BANG! BANG! BANG! BANG! BANG! CLICK! CLICK! CLICK! CLICK! CLICK!
2001-06-14 01:28:56 +00:00
Peter Wemm
f41325db5f With this commit, I hereby pronounce gensetdefs past its use-by date.
Replace the a.out emulation of 'struct linker_set' with something
a little more flexible.  <sys/linker_set.h> now provides macros for
accessing elements and completely hides the implementation.

The linker_set.h macros have been on the back burner in various
forms since 1998 and has ideas and code from Mike Smith (SET_FOREACH()),
John Polstra (ELF clue) and myself (cleaned up API and the conversion
of the rest of the kernel to use it).

The macros declare a strongly typed set.  They return elements with the
type that you declare the set with, rather than a generic void *.

For ELF, we use the magic ld symbols (__start_<setname> and
__stop_<setname>).  Thanks to Richard Henderson <rth@redhat.com> for the
trick about how to force ld to provide them for kld's.

For a.out, we use the old linker_set struct.

NOTE: the item lists are no longer null terminated.  This is why
the code impact is high in certain areas.

The runtime linker has a new method to find the linker set
boundaries depending on which backend format is in use.

linker sets are still module/kld unfriendly and should never be used
for anything that may be modular one day.

Reviewed by:	eivind
2001-06-13 10:58:39 +00:00
Peter Wemm
db957588c9 Patch up a blunder I made a few days ago. nmbcnt was being initialized
too late.

Noted by:      bmilekic
Pointy-hat to: peter
2001-06-13 00:36:41 +00:00
Peter Wemm
2398f0cd1d Hints overhaul:
- Replace some very poorly thought out API hacks that should have been
  fixed a long while ago.
- Provide some much more flexible search functions (resource_find_*())
- Use strings for storage instead of an outgrowth of the rather
  inconvenient temporary ioconf table from config().  We already had a
  fallback to using strings before malloc/vm was running anyway.
2001-06-12 09:40:04 +00:00
Dag-Erling Smørgrav
8f7e4eb568 Rename nextpid to lastpid and externalize it. 2001-06-11 21:54:19 +00:00
Dag-Erling Smørgrav
fe46349692 Blah, I cut out a tad too much in the previous commit. (thanks again, Jake!) 2001-06-11 18:43:32 +00:00
Dag-Erling Smørgrav
e3b373228c copyin(9) doesn't return ENAMETOOLONG. (thanks, Jake!) 2001-06-11 18:36:18 +00:00
Dag-Erling Smørgrav
b0def2b548 Add sbuf_copyin(). Also add 'b' variants of sbuf_{cat,copyin,cpy}() which
ignore NUL bytes in the source string.
2001-06-11 17:05:52 +00:00
Hajimu UMEMOTO
3384154590 Sync with recent KAME.
This work was based on kame-20010528-freebsd43-snap.tgz and some
critical problem after the snap was out were fixed.
There are many many changes since last KAME merge.

TODO:
  - The definitions of SADB_* in sys/net/pfkeyv2.h are still different
    from RFC2407/IANA assignment because of binary compatibility
    issue.  It should be fixed under 5-CURRENT.
  - ip6po_m member of struct ip6_pktopts is no longer used.  But, it
    is still there because of binary compatibility issue.  It should
    be removed under 5-CURRENT.

Reviewed by:	itojun
Obtained from:	KAME
MFC after:	3 weeks
2001-06-11 12:39:29 +00:00
David Malone
c7fd62da6c Try to make the setting of the SIGCHLD handler the same as setting of
the NOCLDWAI flag. Susv2 seems to require this.

Submitted by:	Cejka Rudolf <cejkar@dcse.fee.vutbr.cz>
Reviewed by:	dillon
2001-06-11 09:15:41 +00:00
Dag-Erling Smørgrav
d647935801 sbuf_new(9) now returns a struct sbuf * instead of an int. If the caller
does not provide a struct sbuf, sbuf_new(9) will allocate one and return
a pointer to it.
2001-06-10 15:48:04 +00:00
Peter Wemm
0978669829 "Fix" the previous initial attempt at fixing TUNABLE_INT(). This time
around, use a common function for looking up and extracting the tunables
from the kernel environment.  This saves duplicating the same function
over and over again.  This way typically has an overhead of 8 bytes + the
path string, versus about 26 bytes + the path string.
2001-06-08 05:24:21 +00:00
Peter Wemm
4422746fdf Back out part of my previous commit. This was a last minute change
and I botched testing.  This is a perfect example of how NOT to do
this sort of thing. :-(
2001-06-07 03:17:26 +00:00
Thomas Moestl
c0a0fb85e2 Fix an instance of NDINIT in the extattrctl syscall: LOCKLEAF was or'ed
to the operation parameter, not to the flags as it should be.

Reviewed by:	rwatson
2001-06-06 23:34:38 +00:00
Peter Wemm
81930014ef Make the TUNABLE_*() macros look and behave more consistantly like the
SYSCTL_*() macros.  TUNABLE_INT_DECL() was an odd name because it didn't
actually declare the int, which is what the name suggests it would do.
2001-06-06 22:17:08 +00:00
John Baldwin
5beb572b41 We don't need to hold a lock just to test a flag. 2001-06-06 22:05:48 +00:00
Ruslan Ermilov
4589be70fe Unbreak setregid(2).
Spotted by:	Alexander Leidinger <Alexander@Leidinger.net>
2001-06-06 13:58:03 +00:00
John Baldwin
262c9f8a3b Don't hold sched_lock across addupc_task().
Reported by:	David Taylor <davidt@yadt.co.uk>
Submitted by:	bde
2001-06-06 00:57:24 +00:00
Dima Dorfman
ddf5b79683 Add a line discipline close routine which restores some functionality
I accidently nuked in rev. 1.54.  Also rework the error handling in
snplwrite a little.
2001-06-05 05:07:53 +00:00
Dima Dorfman
f09f49f136 Style and cosmetic cleanups. This driver is now reasonably stlye(9)
compliant.  All the variable definitions and function names are
reasonably consistent, and the functions which should be static (i.e.,
all of them) are.  Other assorted fixes were made.  The majority of
the delta is indentation fixes.

Partially reviewed by:	bde
2001-06-05 05:00:17 +00:00