Commit Graph

2894 Commits

Author SHA1 Message Date
Poul-Henning Kamp
8177437d85 Complete the bio/buf divorce for all code below devfs::strategy
Exceptions:
        Vinum untouched.  This means that it cannot be compiled.
        Greg Lehey is on the case.

        CCD not converted yet, casts to struct buf (still safe)

        atapi-cd casts to struct buf to examine B_PHYS
2000-04-15 05:54:02 +00:00
Doug Rabson
f7b7769172 * Factor out the object system from new-bus so that it can be used by
non-device code.
* Re-implement the method dispatch to improve efficiency. The new system
  takes about 40ns for a method dispatch on a 300Mhz PII which is only
  10ns slower than a direct function call on the same hardware.

This changes the new-bus ABI slightly so make sure you re-compile any
driver modules which you use.
2000-04-08 14:17:18 +00:00
Archie Cobbs
b76f24f759 Fix a bug where SIGIO was not being delivered to a process requesting
async I/O when a tty device became writable.

PR:		kern/8324
Submitted by:	Don Lewis <Don.Lewis@tsc.tdk.com>
2000-04-05 18:38:21 +00:00
Alfred Perlstein
6288517674 regenerate with MPSAFE from syscalls.master 2000-04-03 06:36:57 +00:00
Alfred Perlstein
c01df63183 Make makesyscalls.sh parse an optional field 'MPSAFE' that specifies
that a syscall does not want the BGL to be grabbed automatically.

Add the new MPSAFE flag to the syscalls that dillon has determined to
be MPSAFE.
2000-04-03 06:36:14 +00:00
Poul-Henning Kamp
282ac69ede Clone bio versions of certain bits of infrastructure:
devstat_end_transaction_bio()
        bioq_* versions of bufq_* incl bioqdisksort()
the corresponding "buf" versions will disappear when no longer used.

Move b_offset, b_data and b_bcount to struct bio.

Add BIO_FORMAT as a hack for fd.c etc.

We are now largely ready to start converting drivers to use struct
bio instead of struct buf.
2000-04-02 19:08:05 +00:00
Matthew Dillon
7c8fdcbd19 Make the sigprocmask() and geteuid() system calls MP SAFE. Expand
commentary for copyin/copyout to indicate that they are MP SAFE as
    well.

Reviewed by: msmith
2000-04-02 17:52:43 +00:00
Poul-Henning Kamp
c244d2de43 Move B_ERROR flag to b_ioflags and call it BIO_ERROR.
(Much of this done by script)

Move B_ORDERED flag to b_ioflags and call it BIO_ORDERED.

Move b_pblkno and b_iodone_chain to struct bio while we transition, they
will be obsoleted once bio structs chain/stack.

Add bio_queue field for struct bio aware disksort.

Address a lot of stylistic issues brought up by bde.
2000-04-02 15:24:56 +00:00
Poul-Henning Kamp
8c125869a9 Draw the outline of "struct bio".
Struct bio is the future carrier of I/O requests for "struct buf".
2000-04-02 09:26:51 +00:00
Matthew Dillon
e4649cfac3 Change the write-behind code to take more care when starting
async I/O's.  The sequential read heuristic has been extended to
    cover writes as well.  We continue to call cluster_write() normally,
    thus blocks in the file will still be reallocated for large (but still
    random) I/O's, but I/O will only be initiated for truely sequential
    writes.

    This solves a number of annoying situations, especially with DBM (hash
    method) writes, and also has the side effect of fixing a number of
    (stupid) benchmarks.

Reviewed-by: mckusick
2000-04-02 00:55:28 +00:00
Brian Feldman
76e90dbcc9 Unstaticize this driver. You can have as many snoop devices as you can
mknod :)

Clean things up a lot while I'm here.  A lot of KNF changes.
2000-04-02 00:35:37 +00:00
Warner Losh
ce73953a1e device_set_unit() DO NOT USE THIS. This was approved before 4.0
release for inclusion into the release, but bde talked me out of
committing the module that needs this until after the release.  It is
after the release now. :-)
2000-04-01 06:06:37 +00:00
Peter Wemm
a84e0a1cfe Remove #ifdef for sem_wakeup() - we just use wakeup(). 2000-03-30 11:35:25 +00:00
Peter Wemm
255108f385 Make sysv-style shared memory tuneable params fully runtime adjustable
via sysctl.  It's done pretty simply but it should be quite adequate.
Also move SHMMAXPGS from $machine/include/vmparam.h as the comments that
went with it were wrong... we don't allocate KVM space for the pages so
that comment is bogus..  The only practical limit is how much physical
ram you want to lock up as this stuff isn't paged out or swap backed.
2000-03-30 07:17:05 +00:00
Matthew Dillon
db6a426158 The SMP cleanup commit broke UP compiles. Make UP compiles work again. 2000-03-28 18:06:49 +00:00
Matthew Dillon
36e9f877df Commit major SMP cleanups and move the BGL (big giant lock) in the
syscall path inward.  A system call may select whether it needs the MP
    lock or not (the default being that it does need it).

    A great deal of conditional SMP code for various deadended experiments
    has been removed.  'cil' and 'cml' have been removed entirely, and the
    locking around the cpl has been removed.  The conditional
    separately-locked fast-interrupt code has been removed, meaning that
    interrupts must hold the CPL now (but they pretty much had to anyway).
    Another reason for doing this is that the original separate-lock for
    interrupts just doesn't apply to the interrupt thread mechanism being
    contemplated.

    Modifications to the cpl may now ONLY occur while holding the MP
    lock.  For example, if an otherwise MP safe syscall needs to mess with
    the cpl, it must hold the MP lock for the duration and must (as usual)
    save/restore the cpl in a nested fashion.

    This is precursor work for the real meat coming later: avoiding having
    to hold the MP lock for common syscalls and I/O's and interrupt threads.
    It is expected that the spl mechanisms and new interrupt threading
    mechanisms will be able to run in tandem, allowing a slow piecemeal
    transition to occur.

    This patch should result in a moderate performance improvement due to
    the considerable amount of code that has been removed from the critical
    path, especially the simplification of the spl*() calls.  The real
    performance gains will come later.

Approved by: jkh
Reviewed by: current, bde (exception.s)
Some work taken from: luoqi's patch
2000-03-28 07:16:37 +00:00
Matthew Dillon
7c58e473f5 Commit the buffer cache cleanup patch to 4.x and 5.x. This patch fixes a
fragmentation problem due to geteblk() reserving too much space for the
    buffer and imposes a larger granularity (16K) on KVA reservations for
    the buffer cache to avoid fragmentation issues.  The buffer cache size
    calculations have been redone to simplify them (fewer defines, better
    comments, less chance of running out of KVA).

    The geteblk() fix solves a performance problem that DG was able reproduce.

    This patch does not completely fix the KVA fragmentation problems, but
    it goes a long way

Mostly Reviewed by: bde and others
Approved by: jkh
2000-03-27 21:29:33 +00:00
Kris Kennaway
8c6ac5e5a5 Reword warning to make it clearer (I read it as "remove block devices created
before 2000-06-01" which is obviously not what was intended :-)
2000-03-25 21:10:20 +00:00
Matthew Dillon
f1924a54f8 Fix in-kernel infinite loop in pipe_write() when the reader goes away
at just the wrong time.
2000-03-24 00:47:37 +00:00
Poul-Henning Kamp
e004067750 Whine at users who still have block devices in /dev, give them until
june 1st to fix their system.
2000-03-21 19:25:56 +00:00
Paul Saab
e5a28db9f5 Add sysctl kern.coredump to enable/disable core dumps system wide. 2000-03-21 07:10:42 +00:00
Brian Feldman
16aae9cbc0 Split the logic of
static int setrootbyname(char *name);
out into
   dev_t getdiskbyname(char *name);

This makes it easy to create a new DDB command, which is the big reason
for the change.  You can now do the following in DDB:

Example rc.conf entry:
dumpdev="/dev/ad0s1b"   # Device name to crashdump to (if enabled).

db> show disk/ad0s1b
dev_t = 0xc0b7ea00
db> p *dumpdev
c0b7ea00
2000-03-20 16:28:35 +00:00
Poul-Henning Kamp
91266b96c4 Isolate the Timecounter internals in their own two files.
Make the public interface more systematically named.

Remove the alternate method, it doesn't do any good, only ruins performance.

Add counters to profile the usage of the 8 access functions.

Apply the beer-ware to my code.

The weird +/- counts are caused by two repocopies behind the scenes:
	kern/kern_clock.c -> kern/kern_tc.c
	sys/time.h -> sys/timetc.h
(thanks peter!)
2000-03-20 14:09:06 +00:00
Poul-Henning Kamp
ce6acbb664 diff, patch and cvs didn't like these three last time around, try again. 2000-03-20 12:34:21 +00:00
Poul-Henning Kamp
b99c307a21 Rename the existing BUF_STRATEGY() to DEV_STRATEGY()
substitute BUF_WRITE(foo) for VOP_BWRITE(foo->b_vp, foo)

substitute BUF_STRATEGY(foo) for VOP_STRATEGY(foo->b_vp, foo)

This patch is machine generated except for the ccd.c and buf.h parts.
2000-03-20 11:29:10 +00:00
Poul-Henning Kamp
21144e3bf1 Remove B_READ, B_WRITE and B_FREEBUF and replace them with a new
field in struct buf: b_iocmd.  The b_iocmd is enforced to have
exactly one bit set.

B_WRITE was bogusly defined as zero giving rise to obvious coding
mistakes.

Also eliminate the redundant struct buf flag B_CALL, it can just
as efficiently be done by comparing b_iodone to NULL.

Should you get a panic or drop into the debugger, complaining about
"b_iocmd", don't continue.  It is likely to write on your disk
where it should have been reading.

This change is a step in the direction towards a stackable BIO capability.

A lot of this patch were machine generated (Thanks to style(9) compliance!)

Vinum users:  Greg has not had time to test this yet, be careful.
2000-03-20 10:44:49 +00:00
Bill Fenner
95b2b777b5 Make sure to free the socket in soabort() if the protocol couldn't
free it (this could happen if the protocol already freed its part
 and we just kept the socket around to make sure accept(2) didn't block)
2000-03-18 08:56:56 +00:00
Chris Costello
b081a64afb In vn_isdisk(), check whether vp->v_rdev is NULL. If it is, then
return ENXIO (Device not configured).  Without this, vn_isdisk()
could (and did in the case of lstat() under fdesc) pass a NULL pointer
to devsw(), which caused a page fault.

Reviewed by:	alfred
2000-03-18 01:27:44 +00:00
Nick Hibma
846664235c Instead of using the next unit available, use the first unit available.
This avoids the unit number from going up indefinitely when
diconnecting and connecting 2 devices alternately.

Noticed by: nsayer (quite a while ago)

And stop calling DEVICE_NOMATCH at probe repeatedly. This stops the
message on the PCI VGA board from being printed when loading a PCI driver.
2000-03-16 09:32:59 +00:00
Poul-Henning Kamp
db5f635acc Eliminate the undocumented, experimental, non-delivering and highly
dangerous MAX_PERF option.
2000-03-16 08:51:55 +00:00
Jun Kuriyama
dc76063419 Print "previous type" correctly when INVARIANTS is defined.
Reviewed by:	current@FreeBSD.org
2000-03-14 14:58:04 +00:00
Bruce Evans
05ecdd7037 Don't try so hard to make the lower 16 bits of fsids unique. It tended
to recycle full fsids after only 16 mount/unmount's.  This is probably
too often for exported fsids.  Now we recycle the full fsids only
after 2^16 mount/ umount's and only ensure uniqueness in the lower 16
bits if there have been <= 256 calls to vfs_getnewfsid() since the
system started.
2000-03-14 14:19:49 +00:00
Brian S. Dean
56fc73ff9b In 'ipcperm()', only call 'suser()' if it is actually required.
Previously, it was being called whether it was needed or not and the
ASU flag was being set (as a side affect of calling 'suser()') in
cases where superuser privileges were not actually needed.  This was
all pointed out to me by Bruce Evans.

Reviewed by:	bde
2000-03-13 23:00:08 +00:00
Poul-Henning Kamp
7de472559c Remove unused 3rd argument from vsunlock() which abused B_WRITE. 2000-03-13 10:47:24 +00:00
Bruce Evans
61214975da Try harder to make the lower 16 bits of fsids unique. The vfs type
number was packed very wastefully, giving perfect non-uniqeness in
the lower 16 bits of fsids for filesystems with the same vfs type.
This made linux_stat() return perfectly non-unique (broken) 16-bit
st_dev's for nfs mount points, and effectively reduced mntid_base to
8 bits so that the vfs_getnewfsid() looped endlessly when there are
already 256 mounted filesystems with the required vfs type.

Approved by:	jkh
2000-03-12 14:23:21 +00:00
Alan Cox
af25d10c91 shmat: If VM_PROT_READ_IS_EXEC is defined and prot includes VM_PROT_READ,
VM_PROT_EXECUTE must be added to prot before calling vm_map_find.

Without this change, an mprotect on a shmat'ed region fails (when
it shouldn't).  This bug was reported Feb 28 by Brooks Davis
<brooks@one-eyed-alien.net> on -hackers.

Reviewed by:	bde
Approved by:	jkh
2000-03-10 09:11:24 +00:00
Yoshinobu Inoue
8692c02553 Enable SCM_RIGHTS on alpha. Allocate necessary buffer as conversion between
int and struct file *.

Approved by: jkh

Submitted by: brian
Reviewed by: bde, brian, peter
2000-03-09 15:15:27 +00:00
Bruce Evans
a4fcac54a1 Fixed a null pointer panic for dumpon(8) on a nonexistent device whose
driver uses the new disk layer.

Reviewed by:	phk
Approved by:	jkh
2000-03-09 12:40:41 +00:00
Yoshinobu Inoue
7d0d8dc306 CMSG_XXX macros alignment fixes to follow RFC2292.
Approved by: jkh

Submitted by: Partly from tech@openbsd
Reviewed by: itojun
2000-03-03 11:13:12 +00:00
Peter Dufault
6d9a8d3e8f I applied the wrong patch set. Back out anything associated
with the known bogus currtpriority.  This undoes the previous changes to
sys/i386/i386/trap.c, sys/alpha/alpha/trap.c, sys/sys/systm.h

Now we have the patch set approved by bde.

Approved by:	bde
2000-03-02 22:03:49 +00:00
Peter Dufault
383774c417 Patches that eliminate extra context switches in FIFO case.
Fixes p1003_1b regression test in the simple case of no RR and
FIFO processes competing.

Reviewed by:	jkh, bde
2000-03-02 16:20:07 +00:00
Brian S. Dean
e777d9c31a Fix a superuser credential check.
Reviewed by:	phk
Approved by:	jkh
2000-02-29 22:58:59 +00:00
Doug Rabson
1d9a6ae08b If a driver probe fails, unset it from the device. This fixes a problem
with certain multiport cards.

Approved by: jkh
2000-02-29 09:36:25 +00:00
Paul Saab
77ac690c97 Update a comment in elf_coredump to reflect that if you madvise
with MADV_NOCORE, its address space is also excluded from a core
file.

Pointed out by:	alc
2000-02-28 06:36:45 +00:00
Paul Saab
9730a5daab Add MAP_NOCORE to mmap(2), and MADV_NOCORE and MADV_CORE to madvise(2).
This
This feature allows you to specify if mmap'd data is included in
an application's corefile.

Change the type of eflags in struct vm_map_entry from u_char to
vm_eflags_t (an unsigned int).

Reviewed by:	dillon,jdp,alfred
Approved by:	jkh
2000-02-28 04:10:35 +00:00
Jordan K. Hubbard
8d0bf3d6f8 Add new oid, debug.boothowto. This allows userland apps to see
how the kernel was booted and perhaps do conditional things
based upon it (sysinstall, for example, will now turn Debug mode
on automatically if boot -v was done).

Submitted by:	msmith
Suggested by:	ulf
2000-02-25 11:43:08 +00:00
Yoshinobu Inoue
0b97e97cd2 Add length check to sbcreatecontrol().
Now this check is necessary because IPv6 source routing might use
  control data bigger than MLEN. (e.g. 16bytes IPv6 addr x 23 hops)
  Actually mbuf cluster should be used in uipc_socket.c:sbcreatecontrol()
  and uipc_syscalls.c:sockargs() when data size is bigger then MLEN,
  and such patches were already in KAME environment and have been
  confirmed to work well. I just forgot to merge them into 4.0, sorry.

  For safety, I'll postpone such patches until after 4.0 release.
  The effect of postponement is followings.
    -Ping6 source routing hops are limitted to around 6 or so.
    -If some apps do setsockopt IPV6_RTHDR and try to receive
     incoming IPv6 source routing info, it can't receive more
     than 6 hops source routing info.
     (But currently, no apps seems to be doing it.)

Approved by: jkh
2000-02-24 19:21:26 +00:00
Jason Evans
dd85920a4f Add the VFS_AIO config option and leave it off by default. Unless the
VFS_AIO option is specified, all aio-related syscalls return ENOSYS.

The aio code is very fragile right now, and is unsuitable for default
inclusion in a production shell box.

Approved by:	jkh
2000-02-23 07:44:25 +00:00
Brian S. Dean
de8050f9b8 Don't forget to reset the hardware debug registers when a process that
was using them exits.

Don't allow a user process to cause the kernel to take a TRCTRAP on a
user space address.

Reviewed by:	jlemon, sef
Approved by:	jkh
2000-02-20 20:51:23 +00:00
Peter Wemm
f082218c18 Fix select(2) for the Alpha. (!!) It was never returning true for
fd's in the range of 32-63, 96-127 etc.  The first problem was the
FD_*() macros were shifting a 32 bit integer "1" left by more than
32 bits.  The same problem happened in selscan().  ffs() also takes
an int argument and causes failure.  For cases where int == long
(ie: the usual case for x86, but not always as gcc can have long
being a 64 bit quantity) ffs() could be used.

Reported by:	Marian Stagarescu <marian@bile.skycache.com>
Reviewed by:	dfr, gallatin (sys/types.h only)
Approved by:	jkh
2000-02-20 13:36:26 +00:00