Commit Graph

4672 Commits

Author SHA1 Message Date
Dag-Erling Smørgrav
e97c3e3d5c Rename runq_find() to runq_findproc(), and hide it behind #ifdef DIAGNOSTIC,
as it can have a severe impact on performance under high load, and the bug
it was meant to catch was fixed ages ago.
2002-03-06 15:34:07 +00:00
Maxim Konovalov
cf11f48256 Fix a typo, unbreak the world.
Thanks to:	mux
Approved by:	ru
2002-03-06 12:28:51 +00:00
Bruce Evans
3006e31679 Don't (blindly) truncate the unit number to 4 digits when formatting the
string returned by device_get_nameunit().
2002-03-06 11:34:02 +00:00
Maxim Konovalov
9dfd307b10 Maximum semid is seminfo.semmni not seminfo.semmsl.
PR:		kern/34979
Submitted by:	James Gritton <jamie@gritton.org>
Reviewed by:	alfred, ru
Approved by:	ru
MFC after:	1 week
2002-03-06 10:52:49 +00:00
Robert Watson
89e1164ee2 Three p_ucred -> td_ucred's missed in jhb's earlier pass; all appear to
be safe.
2002-03-05 19:45:45 +00:00
Robert Watson
b0ad6e203a The change from td->td_proc->p_ucred to td->td_ucred has shortened some
lines: more agressively line wrap under those circumstances.
2002-03-05 19:31:25 +00:00
John Baldwin
c6f55f33ea - Use td_ucred for jail checks.
- Move jail checks and some other checks involving constants and stack
  variables out from under Giant.  This isn't perfectly safe atm because
  jail_sysvipc_allowed is read w/o a lock meaning that its value could be
  stale.  This global variable will soon become a per-jail flag, however,
  at which time it will either not need a lock or will use the prison lock.
2002-03-05 18:57:36 +00:00
Eivind Eklund
f52bd684f3 * Move bswlist declaration and initialization from kern/vfs_bio.c to
vm/vm_pager.c, which is the only place it is used.
* Make the QUEUE_* definitions and bufqueues local to vfs_bio.c.
* constify buf_wmesg.
2002-03-05 18:20:58 +00:00
Eivind Eklund
04858e7ee4 Change wmesg to const char * instead of char * 2002-03-05 17:45:12 +00:00
Robert Watson
ba51c2659d Part II: update various mechanically generated files to allow for new
system call number allocations.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-03-05 16:13:01 +00:00
Robert Watson
11ffd032ff Reserve system call numbers for the MAC framework. This will prevent
people working on the MAC tree from getting toasted whenever system call
numbers are allocated in the main tree (for example, for KSE :-).
Calls allocated: __mac_{get,set}_proc, __mac_{get,set}_{fd,file}().

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-03-05 16:11:11 +00:00
Eivind Eklund
eb8e6d5276 Document all functions, global and static variables, and sysctls.
Includes some minor whitespace changes, and re-ordering to be able to document
properly (e.g, grouping of variables and the SYSCTL macro calls for them, where
the documentation has been added.)

Reviewed by:	phk (but all errors are mine)
2002-03-05 15:38:49 +00:00
Robert Drehmel
6f60771b6d Fix a warning. 2002-03-05 15:19:33 +00:00
Jeff Roberson
88c99cfbc8 Add a new variable mp_maxid. This is used so that per cpu datastructures may
be allocated as arrays indexed by the cpu id.  Previously the only reliable
way to know the max cpu id was through MAXCPU. mp_ncpus isn't useful here
because cpu ids may be sparsely mapped, although x86 and alpha do not do this.

Also, call cpu_mp_probe much earlier so the max cpu id is known before the VM
starts up.  This is intended to help support per cpu queues for the new
allocator, but may be useful elsewhere.

Reviewed by:	jake
Approved by:	jake
2002-03-05 10:01:46 +00:00
Seigo Tanimura
996abba928 Track the number of wired pages to avoid unwiring unwired pages.
Reviewed by:	alfred
2002-03-05 00:51:03 +00:00
Mitsuru IWASAKI
899ccf541a Add generalized power profile code.
This makes other power-management system (APM for now) to be able to
generate power profile change events (ie. AC-line status changes), and
other kernel components, not only the ACPI components, can be notified
the events.

 - move subroutines in acpi_powerprofile.c (removed) to kern/subr_power.c
 - call power_profile_set_state() also from APM driver when AC-line
   status changes
 - add call-back function for Crusoe LongRun controlling on power
   profile changes for a example
2002-03-04 18:46:13 +00:00
Bosko Milekic
5a4f147089 Fix bug in mb_alloc that made systems configured with
PAGE_SIZE / MCLBYTES == 1 crash. Fix them by changing the
appropriate "allocate new page and bucket" code in mb_alloc to use
the macro for properly grabbing an allocated object from a bucket,
the one that checks whether the bucket is empty.
This should allow ken to continue testing zero-copy stuff on -CURRENT.

Noticed and provided debug info: ken
2002-03-03 22:10:04 +00:00
Dima Dorfman
e74d483140 Check the version of ex_anon (a `struct xucred') before using it to
fill out netc_anon (a `struct ucred'), and add an XXX around the
entire operation since it isn't clear whether it's doing the right
thing with things like cr_uidinfo and cr_prison.
2002-03-03 06:07:57 +00:00
Seigo Tanimura
92c914f936 Fix lock leakage and late unlock.
Submitted by:	bde
2002-03-02 12:42:24 +00:00
Ian Dowse
167b8d0334 In sosend(), enforce the socket buffer limits regardless of whether
the data was supplied as a uio or an mbuf. Previously the limit was
ignored for mbuf data, and NFS could run the kernel out of mbufs
when an ipfw rule blocked retransmissions.
2002-02-28 11:22:40 +00:00
Warner Losh
0cf3c909d8 Remove now unused struct proc *p.
Approved by: jhb
2002-02-27 20:57:57 +00:00
John Baldwin
bdd67d483c - Change namei() to use td_ucred instead of p_ucred.
- Change the hack in access() that uses a temporary credential to set
  td_ucred to the temp cred instead of p_ucred.
2002-02-27 19:15:29 +00:00
John Baldwin
6f105b3444 - Change unp_listen() to accept a thread rather than a proc as its second
argument.
- Use td_ucred in unp_listen() instead of p_ucred.
2002-02-27 19:14:01 +00:00
John Baldwin
4a7d6cd251 Fix Giant leakage in several error cases in __semctl(). 2002-02-27 19:12:14 +00:00
John Baldwin
6bd7ad69a1 Add a comment about an unlocked access to p_ucred that will go away in
the near future.
2002-02-27 19:10:50 +00:00
Alfred Perlstein
9f01374de5 kill __P. 2002-02-27 18:51:53 +00:00
Alfred Perlstein
566c1313a3 add assertions in the places where giant is required to catch when
the pipe is locked and shouldn't be.

initialize pipe->pipe_mtxp to NULL when creating pipes in order not
to trip the above assertions.

swap pipe lock with giant around calls to pipe_destroy_write_buffer()

pipe_destroy_write_buffer issue noticed by: jhb
2002-02-27 18:49:58 +00:00
John Baldwin
a854ed9893 Simple p_ucred -> td_ucred changes to start using the per-thread ucred
reference.
2002-02-27 18:32:23 +00:00
John Baldwin
65e3406d28 Temporarily lock Giant while we update td_ucred. The proc lock doesn't
fully protect p_ucred yet so Giant is needed until all the p_ucred
locking is done.  This is the original reason td_ucred was not used
immediately after its addition.  Unfortunately, not using td_ucred is
not enough to avoid problems.  Since p_ucred could be stale, we could
actually be dereferencing a stale pointer to dink with the refcount, so
we really need Giant to avoid foot-shooting.  This allows td_ucred to
be safely used as well.
2002-02-27 18:30:01 +00:00
Alfred Perlstein
21dbcfd500 Fix a NULL deref panic in pipe_write, we can't blindly lock
pipe->pipe_peer->pipe_mtxp because it may be NULL, so lock the
passed in pipe's mutex instead.
2002-02-27 17:23:16 +00:00
Robert Drehmel
ad1ff0997e Make getcredhostname() take a buffer and the buffer's size
as arguments.  The correct hostname is copied into the buffer
while having the prison's lock acquired in a jailed process'
case.

Reviewed by:	jhb, rwatson
2002-02-27 16:43:20 +00:00
Robert Drehmel
9484d0c0e8 Add a function which returns the correct hostname for a given
credential.

Reviewed by:	phk
2002-02-27 14:58:32 +00:00
Alfred Perlstein
ffddaaeeeb MPsafe fixes:
use SYSINIT to initialize pipe_zone.
use PIPE_LOCK to protect kevent ops.
2002-02-27 11:27:48 +00:00
Seigo Tanimura
2f9325870d Return ESRCH if the target process is not inferior to the curproc.
Spotted by:	HIROSHI OOTA <oota@LSi.nec.co.jp>
2002-02-27 10:38:14 +00:00
Alfred Perlstein
e6be967434 Don't hardcode /sys when making tags, instead use ${.CURDIR}/.. this
fixes a problem where one tries to make tags when the source isn't in
/sys.

Submitted by: Jihui Yang <yangjihui@yahoo.com>
2002-02-27 10:07:15 +00:00
Peter Wemm
d1693e1701 Back out all the pmap related stuff I've touched over the last few days.
There is some unresolved badness that has been eluding me, particularly
affecting uniprocessor kernels.  Turning off PG_G helped (which is a bad
sign) but didn't solve it entirely.  Userland programs still crashed.
2002-02-27 09:51:33 +00:00
Alfred Perlstein
f81b04d96c First rev at making pipe(2) pipe's MPsafe.
Both ends of the pipe share a pool_mutex, this makes allocation
and deadlock avoidance easy.

Remove some un-needed FILE_LOCK ops while I'm here.

There are some issues wrt to select and the f{s,g}etown code that
we'll have to deal with, I think we may also need to move the calls
to vfs_timestamp outside of the sections covered by PIPE_LOCK.
2002-02-27 07:35:59 +00:00
Dima Dorfman
76183f3453 Introduce a version field to `struct xucred' in place of one of the
spares (the size of the field was changed from u_short to u_int to
reflect what it really ends up being).  Accordingly, change users of
xucred to set and check this field as appropriate.  In the kernel,
this is being done inside the new cru2x() routine which takes a
`struct ucred' and fills out a `struct xucred' according to the
former.  This also has the pleasant sideaffect of removing some
duplicate code.

Reviewed by:	rwatson
2002-02-27 04:45:37 +00:00
Peter Wemm
bd1e3a0f89 Jake further reduced IPI shootdowns on sparc64 in loops by using ranged
shootdowns in a couple of key places.  Do the same for i386.  This also
hides some physical addresses from higher levels and has it use the
generic vm_page_t's instead.  This will help for PAE down the road.

Obtained from:	jake (MI code, suggestions for MD part)
2002-02-27 02:14:58 +00:00
Matthew Dillon
181df8c9d4 revert last commit temporarily due to whining on the lists. 2002-02-26 20:33:41 +00:00
Matthew Dillon
f96ad4c223 STAGE-1 of 3 commit - allow (but do not require) interrupts to remain
enabled in critical sections and streamline critical_enter() and
critical_exit().

This commit allows an architecture to leave interrupts enabled inside
critical sections if it so wishes.  Architectures that do not wish to do
this are not effected by this change.

This commit implements the feature for the I386 architecture and provides
a sysctl, debug.critical_mode, which defaults to 1 (use the feature).  For
now you can turn the sysctl on and off at any time in order to test the
architectural changes or track down bugs.

This commit is just the first stage.  Some areas of the code, specifically
the MACHINE_CRITICAL_ENTER #ifdef'd code, is strictly temporary and will
be cleaned up in the STAGE-2 commit when the critical_*() functions are
moved entirely into MD files.

The following changes have been made:

	* critical_enter() and critical_exit() for I386 now simply increment
	  and decrement curthread->td_critnest.  They no longer disable
	  hard interrupts.  When critical_exit() decrements the counter to
	  0 it effectively calls a routine to deal with whatever interrupts
	  were deferred during the time the code was operating in a critical
	  section.

	  Other architectures are unaffected.

	* fork_exit() has been conditionalized to remove MD assumptions for
	  the new code.  Old code will still use the old MD assumptions
	  in regards to hard interrupt disablement.  In STAGE-2 this will
	  be turned into a subroutine call into MD code rather then hardcoded
	  in MI code.

	  The new code places the burden of entering the critical section
	  in the trampoline code where it belongs.

	* I386: interrupts are now enabled while we are in a critical section.
	  The interrupt vector code has been adjusted to deal with the fact.
	  If it detects that we are in a critical section it currently defers
	  the interrupt by adding the appropriate bit to an interrupt mask.

	* In order to accomplish the deferral, icu_lock is required.  This
	  is i386-specific.  Thus icu_lock can only be obtained by mainline
	  i386 code while interrupts are hard disabled.  This change has been
	  made.

	* Because interrupts may or may not be hard disabled during a
	  context switch, cpu_switch() can no longer simply assume that
	  PSL_I will be in a consistent state.  Therefore, it now saves and
	  restores eflags.

	* FAST INTERRUPT PROVISION.  Fast interrupts are currently deferred.
	  The intention is to eventually allow them to operate either while
	  we are in a critical section or, if we are able to restrict the
	  use of sched_lock, while we are not holding the sched_lock.

	* ICU and APIC vector assembly for I386 cleaned up.  The ICU code
	  has been cleaned up to match the APIC code in regards to format
	  and macro availability.  Additionally, the code has been adjusted
	  to deal with deferred interrupts.

	* Deferred interrupts use a per-cpu boolean int_pending, and
	  masks ipending, spending, and fpending.  Being per-cpu variables
	  it is not currently necessary to lock; bus cycles modifying them.

	  Note that the same mechanism will enable preemption to be
	  incorporated as a true software interrupt without having to
	  further hack up the critical nesting code.

	* Note: the old critical_enter() code in kern/kern_switch.c is
	  currently #ifdef to be compatible with both the old and new
	  methodology.  In STAGE-2 it will be moved entirely to MD code.

Performance issues:

	One of the purposes of this commit is to enhance critical section
	performance, specifically to greatly reduce bus overhead to allow
	the critical section code to be used to protect per-cpu caches.
	These caches, such as Jeff's slab allocator work, can potentially
	operate very quickly making the effective savings of the new
	critical section code's performance very significant.

	The second purpose of this commit is to allow architectures to
	enable certain interrupts while in a critical section.  Specifically,
	the intention is to eventually allow certain FAST interrupts to
	operate rather then defer.

	The third purpose of this commit is to begin to clean up the
	critical_enter()/critical_exit()/cpu_critical_enter()/
	cpu_critical_exit() API which currently has serious cross pollution
	in MI code (in fork_exit() and ast() for example).

	The fourth purpose of this commit is to provide a framework that
	allows kernel-preempting software interrupts to be implemented
	cleanly.  This is currently used for two forward interrupts in I386.
	Other architectures will have the choice of using this infrastructure
	or building the functionality directly into critical_enter()/
	critical_exit().

	Finally, this commit is designed to greatly improve the flexibility
	of various architectures to manage critical section handling,
	software interrupts, preemption, and other highly integrated
	architecture-specific details.
2002-02-26 17:06:21 +00:00
Bruce Evans
ffe4d2f7c7 Fixed 3 regressions in rev.1.99 (clobbering of the English fix in rev.1.98,
and 2 unformattings).
2002-02-26 16:17:45 +00:00
Søren Schmidt
ed57cfc480 Hide "bla bla exists, skipping it" behind bootverbose. 2002-02-26 10:38:33 +00:00
Poul-Henning Kamp
c91f7a7332 Cast the variable, not the constant to 64 bits. 2002-02-26 09:27:39 +00:00
Poul-Henning Kamp
0f5c7c4b1c Fix warning in !SMP case.
Submitted by:	 Maxime Henrion <mux@mu.org>
2002-02-26 09:21:52 +00:00
Poul-Henning Kamp
1634e90817 Remove unused variable. 2002-02-26 09:16:27 +00:00
Peter Wemm
e2256f43ed Fix warning. s/microuptime()/binuptime()/ for switchtime initial value. 2002-02-26 01:03:39 +00:00
Peter Wemm
bd47bef5aa Fix a warning. Do not assume pointer == long. 2002-02-26 00:55:27 +00:00
Peter Wemm
6bd95d70db Work-in-progress commit syncing up pmap cleanups that I have been working
on for a while:
- fine grained TLB shootdown for SMP on i386
- ranged TLB shootdowns.. eg: specify a range of pages to shoot down with
  a single IPI, since the IPI is very expensive.  Adjust some callers
  that used to trigger this inside tight loops to do a ranged shootdown
  at the end instead.
- PG_G support for SMP on i386 (options ENABLE_PG_G)
- defer PG_G activation till after we decide what we are going to do with
  PSE and the 4MB pages at the start of the kernel.  This should solve
  some rumored strangeness about stale PG_G entries getting stuck
  underneath the 4MB pages.
- add some instrumentation for the fine TLB shootdown
- convert some asm instruction wrappers from functions to inlines.  gcc
  seems to do a fair bit better with this.
- [temporarily!] pessimize the tlb shootdown IPI handlers.  I will fix
  this again shortly.

This has been working fairly well for me for a while, but I have tweaked
it again prior to commit since my last major testing round.  The only
outstanding problem that I know of is PG_G related, which is why there
is an option for it (not on by default for SMP).  I have seen a world
speedups by a few percent (as much as 4 or 5% in one case) but I have
*not* accurately measured this - I am a bit sceptical of these numbers.
2002-02-25 23:49:51 +00:00
Ian Dowse
ddb7d629f1 Sockets passed into uipc_abort() have been allocated by sonewconn()
but never accept'ed, so they must be destroyed. Originally, unp_drop()
detected this situation by checking if so->so_head is non-NULL.
However, since revision 1.54 of uipc_socket.c (Feb 1999), so->so_head
is set to NULL before calling soabort(), so any unix-domain sockets
waiting to be accept'ed are leaked if the server socket is closed.

Resolve this by moving the socket destruction code into uipc_abort()
itself, and making it unconditional (the other caller of unp_drop()
never needs the socket to be destroyed). Use unp_detach() to avoid
the original code duplication when destroying the socket.

PR:		kern/17895
Reviewed by:	dwmalone (an earlier version of the patch)
MFC after:	1 week
2002-02-25 00:03:34 +00:00
Poul-Henning Kamp
5b7d8efa8d Add a generation number to timecounters and spin if it changes under
our feet when we look inside timecounter structures.

Make the "sync_other" code more robust by never overwriting the
tc_next field.

Add counters for the bin[up]time functions.

Call tc_windup() in tc_init() and switch_timecounter() to make sure
we all the fields set right.
2002-02-24 20:04:07 +00:00
Poul-Henning Kamp
e9be968e95 Fix a typo (?) in previous commit told ttyprintf() to print the integer
part of the user-time as a 64bit quantity.  This resulted in weird
output from SIGINFO.
2002-02-24 19:56:41 +00:00
Seigo Tanimura
f591779bb5 Lock struct pgrp, session and sigio.
New locks are:

- pgrpsess_lock which locks the whole pgrps and sessions,
- pg_mtx which protects the pgrp members, and
- s_mtx which protects the session members.

Please refer to sys/proc.h for the coverage of these locks.

Changes on the pgrp/session interface:

- pgfind() needs the pgrpsess_lock held.

- The caller of enterpgrp() is responsible to allocate a new pgrp and
  session.

- Call enterthispgrp() in order to enter an existing pgrp.

- pgsignal() requires a pgrp lock held.

Reviewed by:	jhb, alfred
Tested on:	cvsup.jp.FreeBSD.org
		(which is a quad-CPU machine running -current)
2002-02-23 11:12:57 +00:00
Jake Burkholder
39dda4e363 Make this compile.
Pointy hat to:	julian
2002-02-23 01:42:13 +00:00
Julian Elischer
77c4066424 Add some DIAGNOSTIC code.
While in userland, keep the thread's ucred reference in a shadow
field so that the usual place to store it is NULL.
If DIAGNOSTIC is not set, the thread ucred is kept valid until the next
kernel entry, at which time it is checked against the process cred
and possibly corrected. Produces a BIG speedup in
kernels with INVARIANTS set. (A previous commit corrected it
for the non INVARIANTS case already)

Reviewed by:	dillon@freebsd.org
2002-02-22 23:58:22 +00:00
Andrew R. Reiter
e68baa7073 - Whitespace fixes leftover from previous commit.
Submitted by:	bde
2002-02-22 13:43:56 +00:00
Andrew R. Reiter
54c94c8a35 - Whitespace fixup left over from previous commit.
- Remove bogus cast.

Submitted by:	bde
2002-02-22 13:33:10 +00:00
Poul-Henning Kamp
1cbb9c3b03 Convert p->p_runtime and PCPU(switchtime) to bintime format. 2002-02-22 13:32:01 +00:00
Poul-Henning Kamp
4e2befc031 Use better scaling factor for NTPs correction.
Explain the magic.
2002-02-22 12:59:20 +00:00
Poul-Henning Kamp
57c10583aa GC: BIO_ORDERED, various infrastructure dealing with BIO_ORDERED. 2002-02-22 09:26:35 +00:00
Poul-Henning Kamp
986066d065 Replace bowrite() with BUF_WRITE in ufs.
Remove bowrite(), it is now unused.

This is the first step in getting entirely rid of BIO_ORDERED which is
a generally accepted evil thing.

Approved by:	mckusick
2002-02-22 09:03:00 +00:00
Andrew R. Reiter
8e92b63c6f - Massive style fixup.
Reviewed by: mike
Approved by: dfr
2002-02-22 04:14:49 +00:00
Boris Popov
cebcee2e9e Add support for iovcnt greater than 1. This should resolve problems
with some applications.

Obtained from:	Darwin project
MFC after:	2 weeks
2002-02-21 16:23:38 +00:00
Bruce Evans
19610b66d8 Fixed some style bugs. Added a comment about a bug in PT_SSTEP.
Approved by:	des
2002-02-21 04:47:38 +00:00
Bruce Evans
4b1aa58b5f Recover bits that were lost in transition in rev.1.76:
- P_INMEM checks in all the functions.  P_INMEM must be checked because
  PHOLD() is broken.  The old bits had bogus locking (using sched_lock)
  to lock P_INMEM.  After removing the P_INMEM checks, we were left with
  just the bogus locking.
- large comments.  They were too large, but better than nothing.

Remove obfuscations that were gained in transition in rev.1.76:
- PROC_REG_ACTION() is even more of an obfuscation than PROC_ACTION().

The change copies procfs_machdep.c rev.1.22 of i386/procfs_machdep.c
verbatim except for "fixing" the old-style function headers and adjusting
function names and comments.  It doesn't remove the bogus locking.

Approved by:	des
2002-02-21 04:37:55 +00:00
Julian Elischer
fd21c2b51c Oops, used wrong error value for unimplemented syscalls. 2002-02-20 22:27:09 +00:00
Peter Wemm
114730b0a8 Tidy up some unused variables 2002-02-20 21:25:44 +00:00
Andrew R. Reiter
b65420f968 - Fix style further by adding parentheses around return values so that
they look like:
	return (val);  instead of:  return val;
2002-02-20 16:05:30 +00:00
Andrew R. Reiter
287698b4f1 - Style.9 formatting fix; this commit is mostly white space related with
the next commit actually doing the:
	return val; -> return (val);
  changes.  This commit was done in preparation for getting ``struct
  modules'' locked down.

Reviewed by: bde
Approved by: dfr
2002-02-20 14:30:02 +00:00
Robert Watson
ec20f901a2 More cleanups relating to vm object allocation failure: make sure we
call VOP_CLOSE() with vp unlocked; clean up the return path a little,
in as much as our namei/vnode operation return paths can be cleared
up.  For a return case that was apparently never taken, this sure
is ugly.

Reviewed by:	jeffr
2002-02-20 00:11:57 +00:00
Mike Silbersack
cc6712ea04 A few misc forkbomb defenses:
- Leave 10 processes for root-only use, the previous
  value of 1 was insufficient to run ps ax | more.
- Remove the printing of "proc: table full".  When the table
  really is full, this would flood the screen/logs, making
  the problem tougher to deal with.
- Force any process trying to fork beyond its user's maximum
  number of processes to sleep for .5 seconds before returning
  failure.  This turns 2000 rampaging fork monsters into 2000
  harmlessly snoozing fork monsters.

Reviewed by:	dillon, peter
MFC after:	1 week
2002-02-19 03:15:28 +00:00
Julian Elischer
c28841c1da Add stub syscalls and definitions for KSE calls.
"Book'em Danno"
2002-02-19 02:40:31 +00:00
Julian Elischer
8a2c87e7c7 Add 5 KSE syscalls. Two will be implemented with the next KSE
step and the others are reservations for coming code.
All will be stubbed in this kernel in the next commit.
This will allow people to easily make KSE binaries for userland testing
(the syscalls will be in libc) but they will still need a real KSE kernel
to test it. (libc looks in /sys to decide what it should add stubs for).
2002-02-19 02:19:36 +00:00
Matthew Dillon
3e1ce344ba Load the current timecounter into tc. The timecounter global can change
at any time and we do not want to call one timercounter's function with
another timecounter's structural pointer.

MFC after:	3 days
2002-02-18 19:49:30 +00:00
Matthew Dillon
735da6de88 Add kern_giant_ucred to instrument Giant around ucred related operations
such a getgid(), setgid(), etc...
2002-02-18 17:51:47 +00:00
Poul-Henning Kamp
68edc1b939 Make v_addpollinfo() visible and non-inline.
Have callers only call it as needed.
Add necessary call in ufs_kqfilter().

Test-case found by:	Andrew Gallatin <gallatin@cs.duke.edu>
2002-02-18 16:18:02 +00:00
Robert Watson
b541b65d91 Rehash of 1.43: simply remove the comment, since it's highly redundant
and only partially correct.
2002-02-18 16:02:24 +00:00
Ian Dowse
b01bcf4c74 Add the braces missed by revision 1.131.
Pointy hat to:	rwatson
2002-02-18 12:46:18 +00:00
Poul-Henning Kamp
21dcdb38e1 Take the common case of gettimeofday(&tv, NULL) out from under Giant. 2002-02-18 08:40:28 +00:00
Poul-Henning Kamp
90737495aa Remove yet a redundant VN_KNOTE() macro. 2002-02-18 08:24:48 +00:00
Matthew Dillon
5638baf0c6 The ICANON flag is an lflag, not an iflag.
Submitted by:	Neelkanth Natu <neelnatu@yahoo.com>
MFC after:	3 days
2002-02-18 06:07:11 +00:00
Robert Watson
4729fbd85f When vn_open() is failing because it cannot allocate a vm object, call
VOP_CLOSE() on the vnode, so that VOP_OPEN() and VOP_CLOSE() calls
are symmetric in all failure cases.  This prevents an 'open' reference
from being leaked in that unlikely failure scenario.
2002-02-18 00:26:10 +00:00
Robert Watson
3056874a81 style(9) prefers formatted comments in '/*' ... '*/' as opposed to
#if 0'd.
2002-02-18 00:23:44 +00:00
Robert Watson
eae1306746 Per discussion at BSDCon, note that the vop_getattr locking protocol
should require a shared lock, rather than an exclusive lock, which can
improve performance.  No actual code change here, since a number of
VFS locking fixes are in the works.
2002-02-18 00:22:57 +00:00
Poul-Henning Kamp
4b55dbe36b Move the stuff related to select and poll out of struct vnode.
The use of the zone allocator may or may not be overkill.
There is an XXX: over in ufs/ufs/ufs_vnops.c that jlemon may need
to revisit.

This shaves about 60 bytes of struct vnode which on my laptop means
600k less RAM used for vnodes.
2002-02-17 21:15:36 +00:00
Poul-Henning Kamp
362912ebcc Remove cache_purgeleafdirs(), it has been #if 0 for quite some time. 2002-02-17 20:40:29 +00:00
Daniel Eischen
1e599eee20 Regenerate these files after change to syscalls.master. 2002-02-17 17:42:47 +00:00
Daniel Eischen
bc874287e9 Fix prototype to sigreturn to use struct __ucontext instead of ucontext_t. 2002-02-17 17:41:28 +00:00
Matthew Dillon
e1bca29fae replace the embedded cr_mtx in the ucred structure with cr_mtxp (a mutex
pointer), and use the mutex pool routines.  This greatly reduces the size
of the ucred structure.
2002-02-17 07:30:34 +00:00
Julian Elischer
2eb927e2bb If the credential on an incoming thread is correct, don't bother
reaquiring it. In the same vein, don't bother dropping the thread cred
when goinf ot userland. We are guaranteed to nned it when we come back,
(which we are guaranteed to do).

Reviewed by:	jhb@freebsd.org, bde@freebsd.org (slightly different version)
2002-02-17 01:09:56 +00:00
Brian Feldman
1b56782026 (Doing that whole test-immediately-after-commit-thing like obrien sez:)
Forgot to include lock.h and mutex.h for GIANT_REQUIRED.
2002-02-16 17:44:43 +00:00
Brian Feldman
1fd9f8f438 Add revoke_and_destroy_dev(), to be used by devices which decide when
they choose to destroy themselves without regard to whether or not
they are open.
2002-02-16 17:35:05 +00:00
Bruce Evans
8c3d74f4bf Fixed a typo in rev.1.65 that gave a reference to a nonexistent variable.
This was not detected by LINT because LINT is missing COMPAT_SUNOS.
2002-02-15 03:54:01 +00:00
Luigi Rizzo
e522304423 Make this compile after changes to kse structures.
This escaped because DEVICE_POLLING is disabled in LINT being
not compatible with SMP. In fact, it is only a runtime problem,
so if we could recognize that we are building a LINT kernel
we could as well disable the check for SMP being defined.

Reported-by: Joe Clarke
2002-02-15 02:50:07 +00:00
Alan Cox
9fbd7ccf00 o Clearing p/td_retval[0] after aio_newproc() is unnecessary. (We stopped
calling rfork() to create aio threads in revision 1.46.)
 o Don't recompute the FILE * when it's already stored in the kernel's AIOCB.
2002-02-12 17:40:41 +00:00
Alan Cox
96347d1e6d The previous commit included a change to fill_kinfo_proc() that results
in a NULL pointer dereference.  Repair this mistake.
2002-02-12 04:21:28 +00:00
Luigi Rizzo
daccb6386b MFS: synchronize the code with the version in -stable, specifically:
+ SYSCTL_ULONG -> SYSCTL_UINT
 + some procedure renaming and variable rearrangement
 + fix the 'interface going deaf' problem same as in -stable.
2002-02-11 23:56:18 +00:00
Julian Elischer
2c1007663f In a threaded world, differnt priorirites become properties of
different entities.  Make it so.

Reviewed by:	jhb@freebsd.org (john baldwin)
2002-02-11 20:37:54 +00:00
David E. O'Brien
952539e39a Allow one to specify the AWK used in the environment(commandline).
Gawk is blowing up when run natively on the sparc64 -- leading to totally
bogus kernel values (all "0x0").  Good ole BWK awk works fine however.
2002-02-11 03:54:30 +00:00
Poul-Henning Kamp
d9888e41d5 GC the unused einval()
Obtained from:	~bde/sys.dif.gz
2002-02-10 22:07:41 +00:00
Poul-Henning Kamp
58a24f7938 Style(9) nits.
Obtained from:	~bde/sys.dif.gz
2002-02-10 22:04:44 +00:00
Robert Watson
1745909176 Add a comment indicating that the locking protocol should be updated
to be 'L L L' for vop_getattr().  Don't update it yet, because there
are still many offenders.
2002-02-10 21:46:16 +00:00
Robert Watson
5da271f5a6 Add a comment indicating that VOP_GETATTR() is called without appropriate
locking in the core dump code.  This should be fixed.
2002-02-10 21:45:16 +00:00
Robert Watson
1ea030d8fe Make sure to hold vnode lock when calling into VOP_GETATTR().
Discussed with:	mckusick, phk
2002-02-10 21:44:30 +00:00
Robert Watson
894c9fe04e Add a comment indicating that the vnode locking in this section of the
kernel linker code may be wrong: it fails to hold a lock across the
call to VOP_GETATTR(), and vn_rdwr() with IO_NODELOCKED.
2002-02-10 21:29:02 +00:00
Robert Watson
c0a9dc83c8 Make sure to grab vnode lock on a vnode before calling VOP_GETATTR()
to perform an ownership test in revoke().  This is also required for
MAC hooks so that the vnode lock is held during a call to the MAC
framework.  Release the lock before calling VOP_REVOKE().

Discussed with:	phk, mckusick
2002-02-10 20:45:43 +00:00
Robert Watson
56e04d01c0 Remove a stray 'const' that slept into extattr_set_vp(), and could
result in compiler warnings.
2002-02-10 05:31:55 +00:00
Robert Watson
1aa1d02a98 Part II: Update system calls for extended attributes. Rebuild of
generated files.
2002-02-10 04:44:37 +00:00
Robert Watson
74237f55b0 Part I: Update extended attribute API and ABI:
o Modify the system call syntax for extattr_{get,set}_{fd,file}() so
  as not to use the scatter gather API (which appeared not to be used
  by any consumers, and be less portable), rather, accepts 'data'
  and 'nbytes' in the style of other simple read/write interfaces.
  This changes the API and ABI.

o Modify system call semantics so that extattr_get_{fd,file}() return
  a size_t.  When performing a read, the number of bytes read will
  be returned, unless the data pointer is NULL, in which case the
  number of bytes of data are returned.  This changes the API only.

o Modify the VOP_GETEXTATTR() vnode operation to accept a *size_t
  argument so as to return the size, if desirable.  If set to NULL,
  the size will not be returned.

o Update various filesystems (pseodofs, ufs) to DTRT.

These changes should make extended attributes more useful and more
portable.  More commits to rebuild the system call files, as well
as update userland utilities to follow.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-02-10 04:43:22 +00:00
Julian Elischer
237a8a02da Replace accidentally removed setrunqueue()
solves problem with machines failing to sync in booting.
Submitted by: Tor.Egge@cvsup.no.freebsd.org
2002-02-09 01:38:16 +00:00
John Baldwin
18fc2ba9ff Use the mtx_owner() macro in one spot in _mtx_lock_sleep() to make the
code easier to read.
2002-02-09 00:12:53 +00:00
Thomas Moestl
2333d112fb Fix a bug introduced in r. 1.28: when copy{in,out} would fail for an
iovec that was not the last one in the uio, the error would be ignored
silently.

Bug found and fix proposed by:	jhb
2002-02-08 20:19:44 +00:00
Peter Wemm
1037bbb195 Fix broken Giant locking protocol introduced in rev 1.114. You cannot
unlock Giant if it is not locked in the first place.  This make the
nfstat(2) syscall (#278) a nice panic(2) implementation.
2002-02-08 09:16:57 +00:00
Peter Wemm
fe0d0493ac Bah, I managed to turn cosmetic things into real bugs. Fix shadowed
variable declarations. :-(  Definately not my day today.
2002-02-08 08:56:01 +00:00
Robert Watson
143bb598d0 o Merge various recent fixes from the MAC branch relating to extattrctl():
- Fix null-pointer dereference introduced when snapshotting
	  was introduced.  This occured because unlike the previous code,
	  vn_start_write() doesn't always return a non-NULL mp, as
	  filesystems may not support the VOP_GETWRITEMOUNT() call.  For
	  now, rely on two pointers, so that vn_finished_write() works
	  properly.
	- Fix locking problems on exit, introduced at some past time,
	  some when snapshots came in, where a vnode might not be
	  unlocked before being vrele'd in various error situations.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-02-08 05:58:41 +00:00
Peter Wemm
de9ac44a24 Fix a fatal trap when using ksched_setscheduler() (eg: mozilla, netscape
etc) which use:  td->td_last_kse->ke_flags |= KEF_NEEDRESCHED;
2002-02-08 02:56:10 +00:00
Julian Elischer
045e854101 remove superfluous blank line 2002-02-08 01:38:32 +00:00
Peter Wemm
2b8a08af6b Fix a couple of style bugs introduced (or touched by) previous commit. 2002-02-07 23:06:26 +00:00
Peter Wemm
2d008b444d Fix a whole bunch of long lines introduced by previous commit by using
td = FIRST_THREAD_IN_PROC(p) once, after we have identified the process
that we are operating on.
2002-02-07 23:05:40 +00:00
Poul-Henning Kamp
2028c0cdb9 Revise timercounters to use binary fixed point format internally.
The binary format "bintime" is a 32.64 format, it will go to 64.64
when time_t does.

The bintime format is available to consumers of time in the kernel,
and is preferable where timeintervals needs to be accumulated.

This change simplifies much of the magic math inside the timecounters
and improves the frequency and time precision by a couple of bits.

I have not been able to measure a performance difference which was not
a tiny fraction of the standard deviation on the measurements.
2002-02-07 21:21:55 +00:00
Julian Elischer
079b7badea Pre-KSE/M3 commit.
this is a low-functionality change that changes the kernel to access the main
thread of a process via the linked list of threads rather than
assuming that it is embedded in the process. It IS still embeded there
but remove all teh code that assumes that in preparation for the next commit
which will actually move it out.

Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,
2002-02-07 20:58:47 +00:00
John Baldwin
78a1485fd1 Fixes for alpha pmap on SMP machines:
- Create a private list of active pmaps rather than abusing the list of all
  processes when we need to look up pmaps.  The process list needs a sx lock
  and we can't be getting sx locks in the middle of cpu_switch()
  (pmap_activate() can call pmap_get_asn() from cpu_switch()).  Instead, we
  protect the list with a spinlock.  This also means the list is shorter
  since a pmap can be used by more than one process and we could (at least
  in thoery) dink with pmap's more than once, but now we only touch each
  pmap once when we have to update all of them.
- Wrap pmap_activate()'s code to get a new ASN in an explicit critical section
  so that when it is called while doing an exec() we can't get preempted.
- Replace splhigh() in pmap_growkernel() with a critical section to prevent
  preemption while we are adjusting the kernel page tables.
- Fixes abuse of PCPU_GET(), which doesn't return an L-value.
- Also adds some slight cleanups to the ASN handling by adding some macros
  instead of magic numbers in relation to the ASN and ASN generations.

Reviewed by:	dfr
2002-02-06 04:30:26 +00:00
Matthew Dillon
0b94a0e9f9 Allow the kern.maxusers boot tuneable to be set to 0 (previously only
the kernel config's maxusers could be set to 0 for autosizing to work).
Reviewed by:	rwatson, imp
MFC after:	3 days
2002-02-06 01:19:19 +00:00
Alfred Perlstein
582ec34cd8 Fix a race with free'ing vmspaces at process exit when vmspaces are
shared.

Also introduce vm_endcopy instead of using pointer tricks when
initializing new vmspaces.

The race occured because of how the reference was utilized:
  test vmspace reference,
  possibly block,
  decrement reference

When sharing a vmspace between multiple processes it was possible
for two processes exiting at the same time to test the reference
count, possibly block and neither one free because they wouldn't
see the other's update.

Submitted by: green
2002-02-05 21:23:05 +00:00
Poul-Henning Kamp
a305896436 Let the number of timecounters follow hz, otherwise people with
HZ=BIGNUM will strain the assumptions behind timecounters to the
point where they break.

This may or may not help people seeing microuptime() backwards messages.

Make the global timecounter variable volatile, it makes no difference in
the code GCC generates, but it makes represents the intent correctly.

Thanks to:	jdp
MFC after:	2 weeks
2002-02-05 20:44:56 +00:00
Matthew Dillon
ecde8f7c29 Get rid of the twisted MFREE() macro entirely.
Reviewed by:	dg, bmilekic
MFC after:	3 days
2002-02-05 02:00:56 +00:00
Robert Watson
4e1123c738 o Scatter vn_start_write() and vn_finished_write() through ACL code so
that it interacts properly with snapshotting.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-02-04 17:58:15 +00:00
Robert Watson
eccbb13cb5 Note that Kirk apparently missed adding vn_start_write() and friends
to kern_acl.c when he added snapshotting.  This will need to be added
at some point.
2002-02-04 16:41:59 +00:00
Kirk McKusick
64011154e5 In the routines vrele() and vput(), we must lock the vnode and
call VOP_INACTIVE before placing the vnode back on the free list.
Otherwise there is a race condition on SMP machines between
getnewvnode() locking the vnode to reclaim it and vrele()
locking the vnode to inactivate it. This window of vulnerability
becomes exaggerated in the presence of filesystems that have
been suspended as the inactive routine may need to temporarily
release the lock on the vnode to avoid deadlock with the syncer
process.
2002-02-02 01:49:18 +00:00
Alfred Perlstein
3865fa138b Remove bogus assertion in dup2 that can lead to panics when kernel
threads race for a file slot.

dup2(2) incorrectly assumes that if it needs to grow the ofiles
array that it will get what it wants.  This assertion was valid
before we allowed shared filedescriptor tables but is now incorrect.

The assertion can trigger superfolous panics if the thread doing a
dup2 looses a race with another thread while possibly blocked in
the MALLOC call in fdalloc.  Another thread may grab the slot we
are requesting which makes fdalloc return something other than what
we asked for, this will triggering the bogus assertion.

MFC after: 2 weeks
Reviewed by: phk
2002-02-01 19:25:36 +00:00
Alfred Perlstein
2b39743941 Avoid lock order reversal filedesc/Giant when calling FREE() in fdalloc
by unlocking the filedesc before calling FREE().

Submitted by: bde
2002-02-01 19:19:54 +00:00
Alfred Perlstein
b7184973ed Don't recurse on filedesc lock in chroot_refuse_vdir_fds().
Noticed by: Michael Nottebrock <michaelnottebrock@gmx.net>
2002-02-01 18:27:16 +00:00
Bruce Evans
b6fe6a5d88 Regenerate to make osigreturn standard. 2002-02-01 17:41:45 +00:00
Bruce Evans
860965f144 Made osigreturn(2) standard so that SYS_osigreturn can be used in the
signal trampoline for old signals.  The arches that support old signals
currently abuse sigreturn(2) instead.  This mainly complicates things
and slightly breaks the the new sigreturn(2).

COMPAT is too limited to support the correct configuration of osigreturn,
and this commit doesn't attempt to fix it; it just moves the bogusness:
osigreturn() must now be provided unconditionally even on arches that
don't really need it; previously it had to be provided under the bogus
condition defined(COMPAT_43).
2002-02-01 17:27:14 +00:00
Matthew Dillon
027df6bdd7 GC P_BUFEXHAUST leftovers, we've had a new mechanism to avoid buffer
cache lockups for over a year now.

MFC after:		0 days
2002-01-31 18:39:44 +00:00
Alfred Perlstein
4658f926c0 Remove unused variables in select(2) from previous delta.
Pointed out by: bde
2002-01-30 19:48:25 +00:00
Bruce Evans
9a7f62c577 Oops, fix previous commit to not generate a C comment in syscall.mk. 2002-01-30 15:12:12 +00:00
Bruce Evans
581cad5a9c Regenerate _after_ the commit to syscalls.master. 2002-01-30 10:29:12 +00:00
Bruce Evans
0878983ab4 Escape $FreeBSD$ in a different way to avoid using the bogus escapes \$
and \F.  Awk just started warning about these.
2002-01-30 10:22:05 +00:00
Alfred Perlstein
eb20931127 Attempt to fixup select(2) and poll(2), this should fix some races with
other threads as well as speed up the interfaces.

To fix the race and accomplish the speedup, remove selholddrop and
pollholddrop.  The entire concept is somewhat bogus because holding
the individual struct file pointers offers us no guarantees that
another thread context won't close it on us thereby removing our
access to our own reference.

Selholddrop and pollholddrop also would do multiple locks and unlocks
of mutexes _per-file_ in the fd arrays to be scanned, this needed to
be sped up.

Instead of using selholddrop and pollholddrop, simply hold the
filedesc lock over the selscan and pollscan functions.  This should
protect us against close(2)'s on the files as reduce the multiple
lock/unlock pairs per fd into a single lock over the filedesc.
2002-01-29 22:54:19 +00:00
Alfred Perlstein
5980a85f08 Backout 1.120, EINVAL isn't a proper error return when the passed fd is
negative, the 'pointer' referred to by the manpage is actually the
struct file's f_offset field.

Pointed out by: bde
2002-01-29 17:12:10 +00:00
Poul-Henning Kamp
05a2f79888 Be more conservative about interrupt latency, it aint getting better it seems. 2002-01-25 21:22:34 +00:00
Poul-Henning Kamp
3e72822404 Make st_blksize default to PAGE_SIZE instead of zero. 2002-01-25 16:39:57 +00:00
Matthew Dillon
4fbd563eb8 Make the 'maxusers 0' auto-sizing code slightly more conservative. Change
from 1 megabyte of ram per user to 2 megabytes of ram per user, and
reduce the cap from 512 to 384.  512 leaves around 240 MB of KVM available
while 384 leaves 270 MB of KVM available.  Available KVM is important
in order to deal with zalloc and kernel malloc area growth.

Reviewed by:	mckusick
MFC: either before 4.5 if re's agree, or after 4.5
2002-01-25 01:54:16 +00:00
Poul-Henning Kamp
9118ec5a27 Yet a bug with extensible sbufs being marked as OVERFLOWED. This time
because of a signed/unsigned problem.

Approved by:	DES
2002-01-24 20:57:56 +00:00
Jonathan Lemon
cd75bfa75f Add entry for EVFILT_NETDEV, which was inadverdently omitted back in Sept. 2002-01-24 17:20:55 +00:00
Alfred Perlstein
095f670d4e in fget() return EINVAL when the descriptor requested is negative. 2002-01-23 08:40:35 +00:00
Alfred Perlstein
97fa4397d3 make pread use fget_read instead of holdfp. 2002-01-23 08:22:59 +00:00
David Greenman
7228268aaa Fixed bug in calculation of amount of file to send when nbytes !=0 and
headers or trailers are supplied. Reported by Vladislav Shabanov
<vs@rambler-co.ru>.

PR:		33771
Submitted by:	Maxim Konovalov <maxim@macomnet.ru>
MFC after:	3 days
2002-01-22 17:32:10 +00:00
Poul-Henning Kamp
1a25c86b3b In certain cases sbuf_printf() and sbuf_vprintf() could mistakely
make extendable sbufs as overflowed.

Approved by:	des
2002-01-22 11:22:55 +00:00