5731 Commits

Author SHA1 Message Date
David Xu
81fd489272 detect idle kse correctly. 2002-10-22 02:27:19 +00:00
Kirk McKusick
9e4b381a54 This update removes a race between unmount and lookup. The lookup
locks the mount point directory while waiting for vfs_busy to clear.
Meanwhile the unmount which holds the vfs_busy lock tried to lock
the mount point vnode. The fix is to observe that it is safe for the
unmount to remove the vnode from the mount point without locking it.
The lookup will wait for the unmount to complete, then recheck the
mount point when the vfs_busy lock clears.

Sponsored by:	DARPA & NAI Labs.
2002-10-22 01:06:44 +00:00
Kirk McKusick
e03486d198 This checkin reimplements the io-request priority hack in a way
that works in the new threaded kernel. It was commented out of
the disksort routine earlier this year for the reasons given in
kern/subr_disklabel.c (which is where this code used to reside
before it moved to kern/subr_disk.c):

----------------------------
revision 1.65
date: 2002/04/22 06:53:20;  author: phk;  state: Exp;  lines: +5 -0
Comment out Kirks io-request priority hack until we can do this in a
civilized way which doesn't cause grief.

The problem is that it is not generally safe to cast a "struct bio
*" to a "struct buf *".  Things like ccd, vinum, ata-raid and GEOM
constructs bio's which are not entrails of a struct buf.

Also, curthread may or may not have anything to do with the I/O request
at hand.

The correct solution can either be to tag struct bio's with a
priority derived from the requesting threads nice and have disksort
act on this field, this wouldn't address the "silly-seek syndrome"
where two equal processes bang the diskheads from one edge to the
other of the disk repeatedly.

Alternatively, and probably better: a sleep should be introduced
either at the time the I/O is requested or at the time it is completed
where we can be sure to sleep in the right thread.

The sleep also needs to be in constant timeunits, 1/hz can be practicaly
any sub-second size, at high HZ the current code practically doesn't
do anything.
----------------------------

As suggested in this comment, it is no longer located in the disk sort
routine, but rather now resides in spec_strategy where the disk operations
are being queued by the thread that is associated with the process that
is really requesting the I/O. At that point, the disk queues are not
visible, so the I/O for positively niced processes is always slowed
down whether or not there is other activity on the disk.

On the issue of scaling HZ, I believe that the current scheme is
better than using a fixed quantum of time. As machines and I/O
subsystems get faster, the resolution on the clock also rises.
So, ten years from now we will be slowing things down for shorter
periods of time, but the proportional effect on the system will
be about the same as it is today. So, I view this as a feature
rather than a drawback. Hence this patch sticks with using HZ.

Sponsored by:	DARPA & NAI Labs.
Reviewed by:	Poul-Henning Kamp <phk@critter.freebsd.dk>
2002-10-22 00:59:49 +00:00
Poul-Henning Kamp
c177d125bf GEOM does not (and shall not) propagate flags like D_MEMDISK, so we will
revert to checking the name to determine if our root device is a ramdisk,
md(4) specifically to determine if we should attempt the root-mount RW

Sponsored by:	DARPA & NAI Labs.
2002-10-21 20:09:59 +00:00
Dag-Erling Smørgrav
6d0369001a Reduce the overhead of the mutex statistics gathering code, try to produce
shorter lines in the report, and clean up some minor style issues.
2002-10-21 18:48:28 +00:00
Olivier Houchard
e3bf3aea25 One #include <sys/sysctl.h> should be enough.
Approved by:	mux (mentor)
2002-10-21 18:40:40 +00:00
Brooks Davis
29e1b85f97 Use if_printf(ifp, "blah") instead of
printf("%s%d: blah", ifp->if_name, ifp->if_xname).
2002-10-21 02:51:56 +00:00
Thomas Moestl
5775150869 Fix the calculations of the length of the unread message buffer
contents. The code was subtracting two unsigned ints, stored the
result in a log and expected it to be the same as of a signed
subtraction; this does only work on platforms where int and long
have the same size (due to overflows).
Instead, cast to long before the subtraction; the numbers are
guaranteed to be small enough so that there will be no overflows
because of that.
2002-10-20 23:13:05 +00:00
Poul-Henning Kamp
962414a120 We have memset() and memcpy() in the kernel now, so we don't need to
#define them to bzero and bcopy.

Spotted by:	FlexeLint
2002-10-20 22:33:42 +00:00
Julian Elischer
2f030624b1 Add an actual implementation of kse_wakeup()
Submitted by:	Davidxu
2002-10-20 21:08:47 +00:00
Thomas Moestl
e381d2455b Add kernel dump support, based on the ia64 version (which was committed
as sparc64/sparc64/dump_machdep.c a while back).
Other than ia64 (which uses ELF), sparc64 uses a homegrown format for
the dumps (headers are required because the physical address and size of
the tsb must be noted, and because physical memory may be discontiguous);
ELF would not offer any advantages here.

Reviewed by:	jake
2002-10-20 17:03:15 +00:00
Poul-Henning Kamp
ab33958276 #unifdef the code for checking blessed lock collisions until we need it.
Spotted by:	DARPA & NAI Labs.
2002-10-20 08:48:39 +00:00
Robert Watson
a13c67da35 If MAC_MAX_POLICIES isn't defined, don't try to define it, just let the
compile fail.  MAC_MAX_POLICIES should always be defined, or we have
bigger problems at hand.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-20 03:41:09 +00:00
Peter Wemm
8556393bb2 Stake a claim on 418 (__xstat), 419 (__xfstat), 420 (__xlstat) 2002-10-19 22:25:31 +00:00
Peter Wemm
c8447553b5 Grab 416/417 real estate before I get burned while testing again.
This is for the not-quite-ready signal/fpu abi stuff.  It may not see
the light of day, but I'm certainly not going to be able to validate it
when getting shot in the foot due to syscall number conflicts.
2002-10-19 22:09:23 +00:00
Robert Watson
b614dd131a Add a new 'NOMACCHECK' flag to namei() NDINIT flags, which permits the
caller to indicate that MAC checks are not required for the lookup.
Similar to IO_NOMACCHECK for vn_rdwr(), this indicates that the caller
has already performed all required protections and that this is an
internally generated operation.  This will be used by the NFS server
code, as we don't currently enforce MAC protections against requests
delivered via NFS.

While here, add NOCROSSMOUNT to PARAMASK; apparently this was used at
one point for name lookup flag checking, but isn't any longer or it
would have triggered from the NFS server code passing it to indicate
that mountpoints shouldn't be crossed in lookups.

Approved by:	re
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-19 21:25:51 +00:00
Robert Watson
3ab93f0958 Regen from addition of execve_mac placeholder. 2002-10-19 21:15:10 +00:00
Robert Watson
bc5245d94c Add a placeholder for the execve_mac() system call, similar to SELinux's
execve_secure() system call, which permits a process to pass in a label
for a label change during exec.  This permits SELinux to change the
label for the resulting exec without a race following a manual label
change on the process.  Because this interface uses our general purpose
MAC label abstraction, we call it execve_mac(), and wrap our port of
SELinux's execve_secure() around it with appropriate sid mappings.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-19 21:06:57 +00:00
Robert Watson
89c61753a0 Drop in the MAC check for file creation as part of open().
Approved by:	re
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-19 20:56:44 +00:00
Robert Watson
9aeffb2b28 Make sure to clear the 'registered' flag for MAC policies when they
unregister.  Under some obscure (perhaps demented) circumstances,
this can result in a panic if a policy is unregistered, and then someone
foolishly unregisters it again.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-19 20:30:12 +00:00
Robert Watson
7587203c2f Hook up most of the MAC entry points relating to file/directory/node
creation, deletion, and rename.  There are one or two other stray
cases I'll catch in follow-up commits (such as unix domain socket
creation); this permits MAC policy modules to limit the ability to
perform these operations based on existing UNIX credential / vnode
attributes, extended attributes, and security labels.  In the rename
case using MAC, we now have to lock the from directory and file
vnodes for the MAC check, but this is done only in the MAC case,
and the locks are immediately released so that the remainder of the
rename implementation remains the same.  Because the create check
takes a vattr to know object type information, we now initialize
additional fields in the VATTR passed to VOP_SYMLINK() in the MAC
case.

Approved by:	re
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-19 20:25:57 +00:00
Marcel Moolenaar
1aeb23cdfa Add two hooks to signal module load and module unload to MD code.
The primary reason for this is to allow MD code to process machine
specific attributes, segments or sections in the ELF file and
update machine specific state accordingly. An immediate use of this
is in the ia64 port where unwind information is updated to allow
debugging and tracing in/across modules. Note that this commit
does not add the functionality to the ia64 port. See revision 1.9
of ia64/ia64/elf_machdep.c.

Validated on: alpha, i386, ia64
2002-10-19 19:16:03 +00:00
Marcel Moolenaar
c143d6c24a Reduce code duplication by moving the common actions in
link_elf_init(), link_elf_link_preload_finish() and
link_elf_load_file() to link_elf_link_common_finish().
Since link_elf_init() did initializations as a side-effect
of doing the common actions, keep the initialization in
that function. Consequently, link_elf_add_gdb() is now also
called to insert the very first link_map() (ie the kernel).
2002-10-19 18:59:33 +00:00
Marcel Moolenaar
1720979bc5 Non-functional change in preparation of the next commit:
Move link_elf_add_gdb(), link_elf_delete_gdb() and link_elf_error()
near the top of the file. The *_gdb() functions are moved inside
the #ifdef DDB already present there.
2002-10-19 18:43:37 +00:00
Marcel Moolenaar
f5b07e11ad In link_elf_load_file(), when SPARSE_MAPPING is defined and we
cannot allocate ef->object, we freed ef before bailing out with
an error. This is wrong because ef=lf and when we have an error
and lf is non-NULL (which holds if we try to alloc ef->object),
we free lf and thus ef as part of the bailing-out.
2002-10-19 05:01:54 +00:00
Alfred Perlstein
871de19fab Don't leak memory in semop(2). (Fix a bug I introduced in rev 1.55.)
Detective work by: jake
2002-10-19 02:07:35 +00:00
John Baldwin
6222047300 Do not lock the process when calling fdfree() (this would have recursed on
a non-recursive lock, the proc lock, before) since we don't need it to
change p_fd.
2002-10-18 17:45:41 +00:00
John Baldwin
6d345e2a45 fdfree() clears p_fd for us, no need to do it again. 2002-10-18 17:44:39 +00:00
John Baldwin
4562d72638 Don't lock the proc lock to clear p_fd. p_fd isn't protected by the proc
lock.
2002-10-18 17:42:28 +00:00
Kirk McKusick
3a096f6c09 Have lockinit() initialize the debugging fields of a lock
when DEBUG_LOCKS is defined.

Sponsored by:	DARPA & NAI Labs.
2002-10-18 01:34:10 +00:00
Kirk McKusick
bc7bdd50c1 When the number of dirty buffers rises too high, the buf_daemon runs
to help clean up. After selecting a potential buffer to write, this
patch has it acquire a lock on the vnode that owns the buffer before
trying to write it. The vnode lock is necessary to avoid a race with
some other process holding the vnode locked and trying to flush its
dirty buffers. In particular, if the vnode in question is a snapshot
file, then the race can lead to a deadlock. To avoid slowing down the
buf_daemon, it does a non-blocking lock request when trying to lock
the vnode. If it fails to get the lock it skips over the buffer and
continues down its queue looking for buffers to flush.

Sponsored by:	DARPA & NAI Labs.
2002-10-18 01:29:59 +00:00
Maxim Sobolev
2e307eb8c9 Separate fiels reported by disk_err() with spaces, so that output doesn't
look cryptic.

MFC after:	1 week
2002-10-17 23:48:29 +00:00
Robert Drehmel
bb8992b32c Instead of (sizeof(source_buffer) - 1) bytes, copy at most
(sizeof(destination_buffer) - 1) bytes into the destination buffer.
This was not harmful because they currently both provide space for
(MAXCOMLEN + 1) bytes.
2002-10-17 21:02:02 +00:00
Robert Drehmel
e80fb43467 Use strlcpy() instead of strncpy() to copy NUL terminated strings
for safety and consistency.
2002-10-17 20:03:38 +00:00
Sam Leffler
3b132a615f fix kldload error return when a module is rejected because it's statically
linked in the kernel.  When this condition is detected deep in the linker
internals the EEXIST error code that's returned is stomped on and instead
an ENOEXEC code is returned.  This makes apps like sysinstall bitch.
2002-10-17 17:28:57 +00:00
Robert Drehmel
55c8556834 - Allocate only enough space for a temporary buffer to hold
the path including the terminating NUL character from
   `struct sockaddr_un' rather than SOCK_MAXADDRLEN bytes.
 - Use strlcpy() instead of strncpy() to copy strings.
2002-10-17 15:52:42 +00:00
Bosko Milekic
a91db09ec0 Fix a fairly subtle bug in mbuf_init() where the reference counter
contiguous space was being allocated from the clust_map
instead of the mbuf_map as the comments indicated.  This resulted in
some address space wastage in mbuf_map.

Submitted by: Rohit Jalan <rohjal@yahoo.co.in>
2002-10-16 19:59:08 +00:00
John Baldwin
5c0cc63c40 Add a missing PROC_UNLOCK in ptrace() for the PT_IO case.
PR:		kern/44065
Submitted by:	Mark Kettenis <kettenis@chello.nl>
2002-10-16 16:28:33 +00:00
John Baldwin
bf3e55aa2c Many style and whitespace fixes.
Submitted by:	bde (mostly)
2002-10-16 15:45:37 +00:00
John Baldwin
18d9bd8f65 Sort includes a bit.
Submitted by:	bde
2002-10-16 15:14:31 +00:00
Poul-Henning Kamp
c3053131ca Be consistent about funtions being static.
Spotted by:	FlexeLint
2002-10-16 10:42:13 +00:00
Sam Leffler
5d84645305 Replace aux mbufs with packet tags:
o instead of a list of mbufs use a list of m_tag structures a la openbsd
o for netgraph et. al. extend the stock openbsd m_tag to include a 32-bit
  ABI/module number cookie
o for openbsd compatibility define a well-known cookie MTAG_ABI_COMPAT and
  use this in defining openbsd-compatible m_tag_find and m_tag_get routines
o rewrite KAME use of aux mbufs in terms of packet tags
o eliminate the most heavily used aux mbufs by adding an additional struct
  inpcb parameter to ip_output and ip6_output to allow the IPsec code to
  locate the security policy to apply to outbound packets
o bump __FreeBSD_version so code can be conditionalized
o fixup ipfilter's call to ip_output based on __FreeBSD_version

Reviewed by:	julian, luigi (silent), -arch, -net, darren
Approved by:	julian, silence from everyone else
Obtained from:	openbsd (mostly)
MFC after:	1 month
2002-10-16 01:54:46 +00:00
Poul-Henning Kamp
7c61d7858c Plug a memory-leak.
"I think you're right" by:	jake
2002-10-15 18:58:38 +00:00
Poul-Henning Kamp
9736c8f03a Use ; not , as statement separator in PDEBUG() macro.
Ignoring a NULL dev in device_set_ivars() sounds wrong, KASSERT it to
non-NULL instead.

Do the same for device_get_ivars() for reasons of symmetry, though
it probably would have yielded a panic anyway, this gives more precise
diagnostics.

Absentmindedly nodded OK to by:	jhb
2002-10-15 18:56:13 +00:00
John Baldwin
7fd1f2b8bc Argh. Put back setting of P_ADVLOCK for the F_WRLCK case that was
accidentally lost in the previous revision.

Submitted by:	bde
Pointy hat to:	jhb
2002-10-15 18:10:13 +00:00
Marcel Moolenaar
47f750125b Fix kernel module loading on ia64. Cross-module function calls
were improperly relocated due to faulty logic in lookup_fdesc()
in elf_machdep.c. The symbol index (symidx) was bogusly used for
load modules other than the one the relocation applied to. This
resulted in bogus bindings and consequently runtime failures.

The fix is to use the symbol index only for the module being
relocated and to use the symbol name for look-ups in the
modules in the dependent list. As such, we need a function to
return the symbol name given the linker file and symbol index.
2002-10-15 05:40:07 +00:00
Peter Wemm
803cc8aa8f Restore pointer that was removed in 1.128. This wasn't a merge-o. 2002-10-15 01:36:45 +00:00
John Baldwin
c65440644e - Add a new global mutex 'ppeers_lock' to protect the p_peers list of
processes forked with RFTHREAD.
- Use a goto to a label for common code when exiting from fork1() in case
  of an error.
- Move the RFTHREAD linkage setup code later in fork since the ppeers_lock
  cannot be locked while holding a proc lock.  Handle the race of a task
  leader exiting and killing its peers while a peer is forking a new child.
  In that case, go ahead and let the peer process proceed normally as the
  parent is about to kill it.  However, the task leader may have already
  gone to sleep to wait for the peers to die, so the new child process may
  not receive a SIGKILL from the task leader.  Rather than try to destruct
  the new child process, just go ahead and send it a SIGKILL directly and
  add it to the p_peers list.  This ensures that the task leader will wait
  until both the peer process doing the fork() and the new child process
  have received their KILL signals and exited.

Discussed with:	truckman (earlier versions)
2002-10-15 00:14:32 +00:00
John Baldwin
60a6965a88 Remove the leaderp variable and just access p_leader directly. The
p_leader field is not protected by the proc lock but is only set during
fork1() by the parent process and never changes.
2002-10-15 00:03:40 +00:00
Alfred Perlstein
8ced1eb281 Remove a KASSERT I added in 1.73 to catch uninitialized pipes.
It must be removed because it is done without the pipe being locked
via pipelock() and therefore is vulnerable to races with pipespace()
erroneously triggering it by temporarily zero'ing out the structure
backing the pipe.

It looks as if this assertion is not needed because all manipulation
of the data changed by pipespace() _is_ protected by pipelock().

Reported by: kris, mckusick
2002-10-14 21:15:04 +00:00