Commit Graph

2762 Commits

Author SHA1 Message Date
Matthew Dillon
8f95b97072 Reimplement buf_daemon / getnewbuf() interaction for dealing with
stressful situations.  buf_daemon now makes a distinction between
    being woken up and its sleep timing out, and as a consequence is now
    much better able to dynamically tune itself to its environment.

Reviewed by:	Alfred Perlstein <bright@wintelcom.net>
1999-12-20 20:28:40 +00:00
Eivind Eklund
6357e7b507 Make m_print const correct (avoids a warning) 1999-12-20 18:10:00 +00:00
Greg Lehey
580e7e5a0f If we fail to find init, print out the search path used. This helps
differentiate between one of three different scenarios:

1.  No init.
2.  Path to init munged by an incorrect loader configuration.
3.  Root file system not mounted.

Reviewed-by:  billf
1999-12-20 02:50:49 +00:00
Poul-Henning Kamp
1b4ce5ce9b Don't ignore return value from tsleep().
Spotted by:	charnier
1999-12-19 12:36:41 +00:00
Robert Watson
91f37dcba1 Second pass commit to introduce new ACL and Extended Attribute system
calls, vnops, vfsops, both in /kern, and to individual file systems that
require a vfsop_ array entry.

Reviewed by:	eivind
1999-12-19 06:08:07 +00:00
Robert Watson
ef351daa32 First pass commit to introduce new ACL and Extended Attribute system calls.
The second pass commit with all the supporting code will happen shortly
afterwards.

Reviewed by:	eivind
1999-12-19 05:54:46 +00:00
Eivind Eklund
d776d82c4d Since VOP_LOCK can be used to up and downgrade locks, it is not possible
to say anything about the lockstate before and after it.  Thus, change the
lockspec from U L U to ? ? ?.
1999-12-18 23:01:52 +00:00
Brian Feldman
2cfa173c50 Woops, I'm so sorry I forgot this! From the last mbuf.h change:
m_mballoc_wakeup() (inline) -> MMBWAKEUP() (macro)
	m_clalloc_wakeup() (inline) -> MCLWAKEUP() (macro)

Noticed by:	peter
1999-12-18 20:04:19 +00:00
Eivind Eklund
762e6b856c Introduce NDFREE (and remove VOP_ABORTOP) 1999-12-15 23:02:35 +00:00
Brian Feldman
6c54410230 Bug fix:
The variables "m_mclalloc_wid" and "m_mballoc_wid" were not in the
proper place.  They should have been in uipc_mbuf.c and have been global,
not in mbuf.h and local per each file that uses mbuf.h.

Sorta bug fix:
   In mbuf.h, the definitions of various things for KERNEL and not
KERNEL cases were very screwy.   This fixes all of that which I could
find.
1999-12-14 02:23:14 +00:00
Tor Egge
f52523701c Fix two problems with pipe_write():
1. Data written beyond end of pipe buffer, causing kernel memory corruption.

    - Check that space is still valid after obtaining the pipe lock.

    - Defer the calculation of transfer size until the pipe
      lock has been obtained.

    - Update the pipe buffer pointers while holding the pipe lock.

 2. Writes of size <= PIPE_BUF not always atomic.

    - Allow an internal write to span two contiguous segments,
      so writes of size <= PIPE_BUF can be kept atomic
      when wrapping around from the end to the start of the
      pipe buffer.

PR:		15235
Reviewed by:	Matt Dillon <dillon@FreeBSD.org>
1999-12-13 02:55:47 +00:00
Peter Wemm
6f77b2defc Use a seperate -c and -h mode. The vnode_if.c file is compiled only into
the kernel while the vnode_if.h header is a bunch of inlines to call the
code that is in the kernel. Generating the .h file on the fly is kinda
bogus because it has to match the one compiled into the kernel.

IMHO we should have kern/vnode_if.c and sys/vnode_if.h committed in the
tree but that's another battle.
1999-12-12 16:43:05 +00:00
Peter Wemm
cfdd238335 Put on asbestos suit and put a splcam() around the 'Mounting root from..'
message to stop it splitting.  Every single scsi machine I've seen seems
to reliably collide with this and it's rather annoying.
1999-12-12 16:34:43 +00:00
Peter Wemm
016fc47da3 The sysctl mod_xx hack is no longer required now that we have totally
dynamic sysctl registration.
1999-12-12 16:30:34 +00:00
Brian Feldman
f48b807fc0 This is Bosko Milekic's mbuf allocation waiting code. Basically, this
means that running out of mbuf space isn't a panic anymore, and code
which runs out of network memory will sleep to wait for it.

Submitted by:	Bosko Milekic <bmilekic@dsuper.net>
Reviewed by:	green, wollman
1999-12-12 05:52:51 +00:00
Matthew Dillon
3854a87ef3 Remove accidental pollution unrelated to previous commit. The issue
here is real but has not yet been discussed with Eivind.
1999-12-12 03:28:14 +00:00
Matthew Dillon
4f79d873c1 Add MAP_NOSYNC feature to mmap(), and MADV_NOSYNC and MADV_AUTOSYNC to
madvise().

    This feature prevents the update daemon from gratuitously flushing
    dirty pages associated with a mapped file-backed region of memory.  The
    system pager will still page the memory as necessary and the VM system
    will still be fully coherent with the filesystem.  Modifications made
    by other means to the same area of memory, for example by write(), are
    unaffected.  The feature works on a page-granularity basis.

    MAP_NOSYNC allows one to use mmap() to share memory between processes
    without incuring any significant filesystem overhead, putting it in
    the same performance category as SysV Shared memory and anonymous memory.

Reviewed by: julian, alc, dg
1999-12-12 03:19:33 +00:00
Eivind Eklund
6bdfe06ad9 Lock reporting and assertion changes.
* lockstatus() and VOP_ISLOCKED() gets a new process argument and a new
  return value: LK_EXCLOTHER, when the lock is held exclusively by another
  process.
* The ASSERT_VOP_(UN)LOCKED family is extended to use what this gives them
* Extend the vnode_if.src format to allow more exact specification than
  locked/unlocked.

This commit should not do any semantic changes unless you are using
DEBUG_VFS_LOCKS.

Discussed with:	grog, mch, peter, phk
Reviewed by:	peter
1999-12-11 16:13:02 +00:00
Peter Wemm
4537138981 Zap c_index() and c_rindex(). Bruce prefers these to implicitly convert
a const into a non-const as they do in libc.  I feel that defeating the
type checking like that quite evil, but that's the way it is.
1999-12-10 17:38:41 +00:00
Poul-Henning Kamp
2ac0d0a1f7 Make adjtime(2) adjust boottime so it doesn't cause non-monotonous
uptime.
1999-12-08 10:02:12 +00:00
Poul-Henning Kamp
e8452d59e3 Scan cdevs for potential root devices, rather than bdevs. 1999-12-08 10:01:18 +00:00
Poul-Henning Kamp
7d5961670c Remove BAD144 support, it has already been disabled for some time. 1999-12-08 09:33:00 +00:00
Mike Smith
9eec696993 Change the default poweroff delay from 0 to 5 seconds. This seems to be
adequate for the IDE disks that I have available for testing.  Most seem
to wait between 1 and 3 seconds before flushing their caches.

Add the ability to override the delay at compile time via the
undocumented option POWEROFF_DELAY.  The delay can still be set via
sysctl as it was originally implemented.
1999-12-07 04:35:37 +00:00
Poul-Henning Kamp
72dfe7a3d7 I always forget to check before I reboot a system, and while it
boots I try in vain to remember which month or even year this system
was last booted in.

Print out the uptime before rebooting, and give people like me
less (or more as it may be) to think about while the systems boots.
1999-12-06 22:35:51 +00:00
Peter Wemm
bb6a234e47 Put on my asbestos underwear and commit the patch that I posted to -arch
some time ago that changes kern.randompid from a boolean to a randomness
range for the next pid assigment.  Too high causes a lot of extra work
to scan for free pids, and too low merely wastes randomness entropy.  It's
still possible to select a completely random range by using PID_MAX (100k)
or -1 as a shortcut to mean "the whole range".
Also, don't waste randomness when doing a wraparound.
1999-12-06 11:13:50 +00:00
Luoqi Chen
91c28bfde0 User ldt sharing. 1999-12-06 04:53:08 +00:00
Matt Jacob
9efed284d5 correct incomplete last change 1999-12-03 09:10:04 +00:00
Matthew N. Dodd
fe0d408987 Remove the 'ivars' arguement to device_add_child() and
device_add_child_ordered().  'ivars' may now be set using the
device_set_ivars() function.

This makes it easier for us to change how arbitrary data structures are
associated with a device_t.  Eventually we won't be modifying device_t
to add additional pointers for ivars, softc data etc.

Despite my best efforts I've probably forgotten something so let me know
if this breaks anything.  I've been running with this change for months
and its been quite involved actually isolating all the changes from
the rest of the local changes in my tree.

Reviewed by:	peter, dfr
1999-12-03 08:41:24 +00:00
Nick Hibma
39e86a12aa Remove check for attached state.
sc = devclass_get_softc(devclass, unit);

doesn't return NULL during attach anymore, and produces the sc,
identical to (for devclass_get_unit(devclass, unit) != NULL that is):

   sc = device_get_softc(devclass_get_unit(devclass, unit));

Reviewed-by:   dfr
1999-12-02 16:30:21 +00:00
Archie Cobbs
1c38f2ea70 The functions m_copym() and m_copypacket() return read-only copies,
because in the case of mbuf clusters they only increment the reference
count rather than actually copying the data.

Add comments to this effect, and add a new routine called m_dup() that
returns a real, writable copy of an mbuf chain.

This is preliminary work required for implementing 'ipfw tee'.

Reviewed by:	julian
1999-12-01 22:31:32 +00:00
Brian Feldman
226420a464 Separate some common sysctl code into sysctl_find_oid() and calling
thereof.  Also, make the errno returns  _correct_, and add a new one
which is more appropriate.
1999-12-01 02:25:19 +00:00
Kirk McKusick
e9cc475851 Collect read and write counts for filesystems. This new code
drops the counting in bwrite and puts it all in spec_strategy.
I did some tests and verified that the counts collected for writes
in spec_strategy is identical to the counts that we previously
collected in bwrite. We now also get read counts (async reads
come from requests for read-ahead blocks). Note that you need
to compile a new version of mount to get the read counts printed
out. The old mount binary is completely compatible, the only
reason to install a new mount is to get the read counts printed.

Submitted by:	Craig A Soules <soules+@andrew.cmu.edu>
Reviewed by:	Kirk McKusick <mckusick@mckusick.com>
1999-12-01 02:09:30 +00:00
Peter Wemm
ebc49c5654 Don't make the ktrace hook in tsleep() deref a null curproc after a panic.
PR:		15169
Submitted by:	David Gilbert <dgilbert@velocet.ca>
1999-11-30 09:01:46 +00:00
Matthew N. Dodd
44a451ba24 Reduce code duplication.
Hopefully this clears up some confusion about the nature of
devclass_get_softc() vs. device_get_softc() as well.

The check against DS_ATTACHED remains as this is not
a change that modifies functionality.

Reviewed by:	Peter "in principle" Wemm
1999-11-30 07:06:03 +00:00
Matthew Dillon
245efbba4d Remove vfs_getrootfsid() function (a temporary hack added a few months
ago to make BOOTP work again).  It is no longer required by BOOTP and
    no longer used.
1999-11-29 22:25:36 +00:00
Poul-Henning Kamp
c464420c89 Report swapdevices as cdevs rather than bdevs.
Remove unused dev2budev() function.
1999-11-29 21:37:18 +00:00
Poul-Henning Kamp
38941f351c Remove the now unused chrtoblk() function. 1999-11-29 20:50:58 +00:00
Matthew Dillon
99e659dcfa Make BOOTP work again.
Submitted by:	Doug Ambrisko <ambrisko@whistle.com>
1999-11-29 18:51:04 +00:00
Poul-Henning Kamp
8f04f6c729 Add a bit of sanity checking and problem avoidance in case the
timecounter hardware is bogus.

This will produce a new warning "microuptime() went backwards"
and try to not screw up the process resource accounting.
1999-11-29 11:29:04 +00:00
Mike Smith
c0da4cacd0 Use the correct mounted-from path when allocating the root mount, if we know
what it is.

Be more correct in unbusying the mountpoint (especially before freeing it).

Remove support for mounting 'r' devices as root.  You don't mount 'r'
devices anywhere else, and they're going away anyway.

Submitted by:	bde
1999-11-28 22:20:18 +00:00
Dan Moschuk
ee3fd60126 Introduce OpenBSD-like Random PIDs. Controlled by a sysctl knob
(kern.randompid), which is currently defaulted off.  Use ARC4 (RC4) for our
random number generation, which will not get me executed for violating
crypto laws; a Good Thing(tm).

Reviewed and Approved by: bde, imp
1999-11-28 17:51:09 +00:00
Poul-Henning Kamp
ee072c08d0 Convert dumpon to work on character devices instead of block devices.
NB: You may need to change your /etc/rc.conf!
1999-11-28 16:25:17 +00:00
Bruce Evans
bdf423572e Scheduler fixes equivalent to the ones logged in the following NetBSD
commit to kern_synch.c:

----------------------------
revision 1.55
date: 1999/02/23 02:56:03;  author: ross;  state: Exp;  lines: +39 -10
Scheduler bug fixes and reorganization
* fix the ancient nice(1) bug, where nice +20 processes incorrectly
  steal 10 - 20% of the CPU, (or even more depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
  platform hz values don't cause nice +0 processes to look like they are
  niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code

=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX.  Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value.  It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)

=== New schedclk() mechansism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurance an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock.  Define these for alpha, where hz==1024, and nice was
totally broke.

=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
be incorporated directly (with greater wieght) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a loadaverage of one. I killed
this, it makes the behavior hard to control, almost impossible to analyze,
and the effect (~~nothing at for the first second, then somewhat increased
niceness after three seconds or more, depending on load average) pointless.

=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calcuation.
Collect scheduler functionality. Try to put each abstraction in just one
place.
----------------------------

The details are a little different in FreeBSD:

=== nice bug ===   Fixing this is the main point of this commit.  We use
essentially the same clipping rule as NetBSD (our limit on p_estcpu
differs by a scale factor).  However, clipping at all is fundamentally
bad.  It gives free CPU the hoggiest hogs once they reach the limit, and
reaching the limit is normal for long-running hogs.  This will be fixed
later.

=== New schedclk() mechanism ===  We don't use the NetBSD schedclk()
(now schedclock()) mechanism.  We require (real)stathz to be about 128
and scale by an extra factor of 2 compared with NetBSD's statclock().
We scale p_estcpu instead of scaling the clock.  This is more accurate
and flexible.

=== Algorithm change ===  Same change.

=== Other bugs ===  The p_pctcpu bug was fixed long ago.  We don't try as
hard to abstract functionality yet.

Related changes: the new limit on p_estcpu must be exported to kern_exit.c
for clipping in wait1().

Agreed with by:		dufault
1999-11-28 12:12:14 +00:00
Bruce Evans
f0ebe4973f Scheduler fixes equivalent to the ones logged in the following NetBSD
commit to kern_synch.c:

  ----------------------------
  revision 1.55
  date: 1999/02/23 02:56:03;  author: ross;  state: Exp;  lines: +39 -10
  Scheduler bug fixes and reorganization
  * fix the ancient nice(1) bug, where nice +20 processes incorrectly
    steal 10 - 20% of the CPU, (or even more depending on load average)
  * provide a new schedclk() mechanism at a new clock at schedhz, so high
    platform hz values don't cause nice +0 processes to look like they are
    niced
  * change the algorithm slightly, and reorganize the code a lot
  * fix percent-CPU calculation bugs, and eliminate some no-op code

  === nice bug === Correctly divide the scheduler queues between niced and
  compute-bound processes. The current nice weight of two (sort of, see
  `algorithm change' below) neatly divides the USRPRI queues in half; this
  should have been used to clip p_estcpu, instead of UCHAR_MAX.  Besides
  being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
  and it was done after decay_cpu() which can only _reduce_ the value.  It
  has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
  scheduler-penalize themselves onto the same queue as nice +20 processes.
  (Or even a higher one.)

  === New schedclk() mechansism === Some platforms should be cutting down
  stathz before hitting the scheduler, since the scheduler algorithm only
  works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
  back and forth by 4 every time p_estcpu is touched (each occurance an
  abstraction violation), use p_estcpu without scaling and require schedhz
  to be generated directly at the right frequency. Use a default stathz (well,
  actually, profhz) / 4, so nothing changes unless a platform defines schedhz
  and a new clock.  Define these for alpha, where hz==1024, and nice was
  totally broke.

  === Algorithm change === The nice value used to be added to the
  exponentially-decayed scheduler history value p_estcpu, in _addition_ to
  be incorporated directly (with greater wieght) into the priority calculation.
  At first glance, it appears to be a pointless increase of 1/8 the nice
  effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
  because it will ramp up linearly but be decayed only exponentially, thus
  converging to an additional .75 nice for a loadaverage of one. I killed
  this, it makes the behavior hard to control, almost impossible to analyze,
  and the effect (~~nothing at for the first second, then somewhat increased
  niceness after three seconds or more, depending on load average) pointless.

  === Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calcuation.
  Collect scheduler functionality. Try to put each abstraction in just one
  place.
  ----------------------------

The details are a little different in FreeBSD:

=== nice bug ===   Fixing this is the main point of this commit.  We use
essentially the same clipping rule as NetBSD (our limit on p_estcpu
differs by a scale factor).  However, clipping at all is fundamentally
bad.  It gives free CPU the hoggiest hogs once they reach the limit, and
reaching the limit is normal for long-running hogs.  This will be fixed
later.

=== New schedclk() mechanism ===  We don't use the NetBSD schedclk()
(now schedclock()) mechanism.  We require (real)stathz to be about 128
and scale by an extra factor of 2 compared with NetBSD's statclock().
We scale p_estcpu instead of scaling the clock.  This is more accurate
and flexible.

=== Algorithm change ===  Same change.

=== Other bugs ===  The p_pctcpu bug was fixed long ago.  We don't try as
hard to abstract functionality yet.

Related changes: the new limit on p_estcpu must be exported to kern_exit.c
for clipping in wait1().

Agreed with by:		dufault
1999-11-28 12:12:13 +00:00
Peter Wemm
64f86df1ed Take a shot at implementing the fix for PR 15014 for the a.out kernel
linker as well.

PR:		15014
Submitted by:	Vladimir N. Silyaev <vns@delta.odessa.ua>
1999-11-28 12:06:29 +00:00
Peter Wemm
b5abfb708c Fix an embarresing mistake in the kld symbol lookup for DDB. It should
now correctly do a traceback when crashing inside a KLD module.

PR:		15014
Submitted by:	Vladimir N. Silyaev <vns@delta.odessa.ua>
1999-11-28 11:59:18 +00:00
Bruce Evans
9bc8d885ed Updated comments for the move in the previous commit. 1999-11-27 15:27:11 +00:00
Bruce Evans
71a62f8a05 Fixed some comments in statclock(). The previous commit made it clearer
that one comment was attached to null code.
1999-11-27 14:37:34 +00:00
Bruce Evans
8a9d4d98b1 Moved scheduling-related code to kern_synch.c so that it is easier to fix
and extend.  The new function containing the code is named schedclock()
as in NetBSD, but it has slightly different semantics (it already handles
incrementation of p->p_cpticks, and it should handle any calling frequency).

Agreed with in principle by:	dufault
1999-11-27 12:32:27 +00:00
Poul-Henning Kamp
71e4fff823 Retire MFS_ROOT and MFS_ROOT_SIZE options from the MFS implementation.
Add MD_ROOT and MD_ROOT_SIZE options to the md driver.

Make the md driver handle MFS_ROOT and MFS_ROOT_SIZE options for compatibility.

Add md driver to GENERIC, PCCARD and LINT.

This is a cleanup which removes the need for some of the worse hacks in
MFS:  We really want to have a rootvnode but MFS on a preloaded image
doesn't really have one.  md is a true device, so it is less trouble.

This has been tested with make release, and if people remember to add
the "md" pseudo-device to their kernels, PicoBSD should be just fine
as well.  If people have no other use for MFS, it can be removed from
the kernel.
1999-11-26 20:08:44 +00:00