Commit Graph

7494 Commits

Author SHA1 Message Date
Poul-Henning Kamp
3dfe213e61 Convert the vfsconf list to a TAILQ.
Introduce vfs_byname() function to find things on it.

Staticize vfs_nmount() function under the name vfs_donmount().

Various cleanups.
2004-07-27 22:32:01 +00:00
Robert Watson
1a8cfbc450 Pass a thread argument into cpu_critical_{enter,exit}() rather than
dereference curthread.  It is called only from critical_{enter,exit}(),
which already dereferences curthread.  This doesn't seem to affect SMP
performance in my benchmarks, but improves MySQL transaction throughput
by about 1% on UP on my Xeon.

Head nodding:	jhb, bmilekic
2004-07-27 16:41:01 +00:00
Robert Watson
a9abdce44a Add "options ADAPTIVE_GIANT" which causes Giant to also be treated in
an adaptive fashion when adaptive mutexes are enabled.  The theory
behind non-adaptive Giant is that Giant will be held for long periods
of time, and therefore spinning waiting on it is wasteful.  However,
in MySQL benchmarks which are relatively Giant-free, running Giant
adaptive makes an observable difference on SMP (5% transaction rate
improvement).  As such, make adaptive behavior on Giant an option so
it can be more widely benchmarked.
2004-07-27 16:34:48 +00:00
Alan Cox
1a276a3f91 - Use atomic ops for updating the vmspace's refcnt and exitingcnt.
- Push down Giant into shmexit().  (Giant is acquired only if the vmspace
   contains shm segments.)
 - Eliminate the acquisition of Giant from proc_rwmem().
 - Reduce the scope of Giant in exit1(), uncovering the destruction of the
   address space.
2004-07-27 03:53:41 +00:00
Bosko Milekic
0047b9a96a Move the schedlock owner state update following the context
switch in fork_exit() to before anything else is done (but keep
schedlock for the deadthread check).  This means one less
nasty bug if ever in the future whatever might have been called
before the update played with schedlock or critical sections.

Discussed with: tjr
2004-07-27 03:46:31 +00:00
Colin Percival
66d5c640fa In revision 1.228, I accidentally broke the "total number of processes in
the system" resource limit code: When checking if the caller has superuser
privileges, we should be checking the *real* user, not the *effective*
user.  (In general, resource limiting is done based on the real user, in
order to avoid resource-exhaustion-by-setuid-program attacks.)

Now that a SUSER_RUID flag to suser_cred exists, use it here to return
this code to its correct behaviour.

Pointed out by:	rwatson
2004-07-26 07:54:39 +00:00
Colin Percival
56f21b9d74 Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is
somewhat clearer, but more importantly allows for a consistent naming
scheme for suser_cred flags.

The old name is still defined, but will be removed in a few days (unless I
hear any complaints...)

Discussed with:	rwatson, scottl
Requested by:	jhb
2004-07-26 07:24:04 +00:00
Robert Watson
feb9bd18c6 Revert modification of subr_turnstile.c accidentally included in the
last commit; this assertion was provided by jhb for local debugging
and not intended for broader consumption.
2004-07-25 23:32:32 +00:00
Robert Watson
fd179ee91d In uipc_connect(), assert that the passed thread is curthread, and pass
td into unp_connect() instead of reading curthread.
2004-07-25 23:30:43 +00:00
Robert Watson
99901d0afb Do some initial locking on accept filter registration and attach. While
here, close some races that existed in the pre-locking world during low
memory conditions.  This locking isn't perfect, but it's closer than
before.
2004-07-25 23:29:47 +00:00
Poul-Henning Kamp
cf95b5c381 Eliminate unused second argument to reassignbuf() and simplify it
accordingly.
2004-07-25 21:24:23 +00:00
Robert Watson
3ed994c6c3 Add netatalk mutexes to hard-coded WITNESS lock order. 2004-07-25 20:16:51 +00:00
Warner Losh
4411688509 Expand the generic, but bogusly formed, copyright notice to include
the license from /usr/src/COPYRIGHT.  Since cvs annotate shows that
this was written by jasone, julian, jhb, peter, bmilekic and obrien.
cvs log shows that many others may have contributed to this file.  As
such, go ahead and use the author of 'FreeBSD Project' for this file.
If this is a problem, please notify me.

# this eliminates the last file in the kernel with an indirect reference
# to /usr/src/COPYRIGHT in the kernel.  A few more in userland remain.
2004-07-25 19:49:01 +00:00
Poul-Henning Kamp
a3d57cfbfd Neuter this warning for now, I think I know the remaining issues. 2004-07-25 08:09:21 +00:00
Julian Elischer
aa3c8c02ae White space fix..
diff reduction for upcoming commit.
2004-07-24 04:57:41 +00:00
Scott Long
e038d35422 Clean up whitespace, increase consistency and correctness.
Submitted by: bde
2004-07-23 23:09:00 +00:00
Robert Watson
ff381670df Don't include a "\n" in KTR output, it confuses automatic parsing. 2004-07-23 20:12:56 +00:00
Scott Long
18f480f8f6 Remove the previous hack since it doesn't make a difference and is getting
in the way of debugging.
2004-07-23 19:59:16 +00:00
Alan Cox
b332cea583 Use kmem_alloc_nofault() rather than kmem_alloc_pageable() for allocating
KVA for explicitly managed mappings, i.e., mappings created with
pmap_qenter().
2004-07-23 19:36:18 +00:00
Robert Watson
4da86f8826 Export KTR_COMPILE as a sysctl so you can easily check from user space
what event mask has been compiled into the kernel.
2004-07-23 17:41:44 +00:00
Robert Watson
46b25cb5f6 Don't perform pipe endpoint locking during pipe_create(), as the pipe
can't yet be referenced by other threads.

In microbenchmarks, this appears to reduce the cost of
pipe();close();close() on UP by 10%, and SMP by 7%.  The vast majority
of the cost of allocating a pipe remains VM magic.

Suggested by:	silby
2004-07-23 14:11:04 +00:00
Robert Watson
71a057bc73 In setpgid(), since td is passed in as a system call argument, use it
in preference to curthread, which costs slightly more.
2004-07-23 04:26:49 +00:00
Robert Watson
a6719c82b1 Push Giant acquisition down into fo_stat() from most callers. Acquire
Giant conditional on debug.mpsafenet in the socket soo_stat() routine,
unconditionally in vn_statfile() for VFS, and otherwise don't acquire
Giant.  Accept an unlocked read in kqueue_stat(), and cryptof_stat() is
a no-op.  Don't acquire Giant in fstat() system call.

Note: in fdescfs, fo_stat() is called while holding Giant due to the VFS
stack sitting on top, and therefore there will still be Giant recursion
in this case.
2004-07-22 20:40:23 +00:00
Robert Watson
1c1ce9253f Push acquisition of Giant from fdrop_closed() into fo_close() so that
individual file object implementations can optionally acquire Giant if
they require it:

- soo_close(): depends on debug.mpsafenet
- pipe_close(): Giant not acquired
- kqueue_close(): Giant required
- vn_close(): Giant required
- cryptof_close(): Giant required (conservative)

Notes:

  Giant is still acquired in close() even when closing MPSAFE objects
  due to kqueue requiring Giant in the calling closef() code.
  Microbenchmarks indicate that this removal of Giant cuts 3%-3% off
  of pipe create/destroy pairs from user space with SMP compiled into
  the kernel.

  The cryptodev and opencrypto code appears MPSAFE, but I'm unable to
  test it extensively and so have left Giant over fo_close().  It can
  probably be removed given some testing and review.
2004-07-22 18:35:43 +00:00
Robert Watson
df04411ac4 suser() accepts a thread argument; as suser() dereferences td_ucred, a
thread-local pointer, in practice that thread needs to be curthread.  If
we're running with INVARIANTS, generate a warning if not.  If we have
KDB compiled in, generate a stack trace.  This doesn't fire at all in my
local test environment, but could be irritating if it fires frequently
for someone, so there will be motivation to fix things quickly when it
does.
2004-07-22 17:05:04 +00:00
Scott Long
9493183e77 Disable the PREEMPTION-enabled code in critical_exit() that encourages
switching to a different thread.  This is just a hack to try to improve
stability some more, but likely points closer to the real culprit.
2004-07-22 14:32:48 +00:00
Bosko Milekic
01e9ccbd9c Back out just a portion of Alfred's last commit. Remove the MBUF_CHECK
(WITNESS) for code paths that always call uma_zalloc_arg() shortly
after where the check was, because uma_zalloc_arg() already does
a similar check.

No objections from Alfred.  Thanks Alfred.
2004-07-21 21:03:01 +00:00
Robert Watson
46e38ce826 Don't sync the file system on panic by default. This seems to basically
work very infrequently, and often results in a compound panic which
confuses debugging; locking/SMP have made the layering violation (and
risks) of this more obvious over time.

Discussed with:	green, bde, et al.
2004-07-21 16:04:46 +00:00
Alfred Perlstein
05656b6e2b put several of the options for DEBUG_VFS_LOCKS under control of sysctls. 2004-07-21 07:13:14 +00:00
Alfred Perlstein
063d811465 Make sure we don't call mbuf allocation functions with mutexes held.
Discussed with: rwatson
2004-07-21 07:12:24 +00:00
Marcel Moolenaar
3d4f313695 Add kdb_thr_from_pid(), which given a PID returns the first thread
in the process. This is useful when working from or with a process.
2004-07-21 04:49:48 +00:00
Mike Silbersack
eb3d2c61b4 Fix a minor error in pipe_stat - st_size was always reported as 0
when direct writes kicked in.  Whether this affected any applications
is unknown.
2004-07-20 07:06:43 +00:00
Peter Wemm
b09cb1027b #ifdef __i386__ -> __i386__ || __amd64__ 2004-07-20 02:15:10 +00:00
Julian Elischer
3a63b92c12 You always spot the typos after you have committed.. Start sentence
with a Cap.
2004-07-19 18:06:12 +00:00
Julian Elischer
f6449d9d31 Allow the user who calls doadump() from the kernel debugger
to not get a page fault if he has not defined a dump device.
Panic can often not do a dump as it can hang forever in some cases.
 The original PR was for amd64 only. This is a generalised version of
that change.

PR:		amd64/67712
Submitted by:	wjw@withagen.nl <Willen Jan Withagen>
2004-07-19 18:03:02 +00:00
Brian Feldman
4362fada8f Reimplement contigmalloc(9) with an algorithm which stands a greatly-
improved chance of working despite pressure from running programs.
Instead of trying to throw a bunch of pages out to swap and hope for
the best, only a range that can potentially fulfill contigmalloc(9)'s
request will have its contents paged out (potentially, not forcibly)
at a time.

The new contigmalloc operation still operates in three passes, but it
could potentially be tuned to more or less.  The first pass only looks
at pages in the cache and free pages, so they would be thrown out
without having to block.  If this is not enough, the subsequent passes
page out any unwired memory.  To combat memory pressure refragmenting
the section of memory being laundered, each page is removed from the
systems' free memory queue once it has been freed so that blocking
later doesn't cause the memory laundered so far to get reallocated.

The page-out operations are now blocking, as it would make little sense
to try to push out a page, then get its status immediately afterward
to remove it from the available free pages queue, if it's unlikely to
have been freed.  Another change is that if KVA allocation fails, the
allocated memory segment will be freed and not leaked.

There is a sysctl/tunable, defaulting to on, which causes the old
contigmalloc() algorithm to be used.  Nonetheless, I have been using
vm.old_contigmalloc=0 for over a month.  It is safe to switch at
run-time to see the difference it makes.

A new interface has been used which does not require mapping the
allocated pages into KVA: vm_page.h functions vm_page_alloc_contig()
and vm_page_release_contig().  These are what vm.old_contigmalloc=0
uses internally, so the sysctl/tunable does not affect their operation.

When using the contigmalloc(9) and contigfree(9) interfaces, memory
is now tracked with malloc(9) stats.  Several functions have been
exported from kern_malloc.c to allow other subsystems to use these
statistics, as well.  This invalidates the BUGS section of the
contigmalloc(9) manpage.
2004-07-19 06:21:27 +00:00
Julian Elischer
55d44f79ea When calling scheduler entrypoints for creating new threads and processes,
specify "us" as the thread not the process/ksegrp/kse.
You can always find the others from the thread but the converse is not true.
Theorotically this would lead to runtime being allocated to the wrong
entity in some cases though it is not clear how often this actually happenned.
(would only affect threaded processes and would probably be pretty benign,
but it WAS a bug..)

Reviewed by: peter
2004-07-18 23:36:13 +00:00
Pawel Jakub Dawidek
ece2d9891e Now we have NO_ADAPTIVE_MUTEXES option, so use it here too.
Missed by:	scottl
2004-07-18 23:27:14 +00:00
Marcel Moolenaar
1f7a1baa37 After maintaining previous behaviour in writing out the core notes, it's
time now to break with the past: do not write the PID in the first note.
Rationale:
1.  [impact of the breakage] Process IDs in core files serve no immediate
    purpose to the debugger itself. They are only useful to relate a core
    file to a process. This can provide context to the person looking at
    the core file, provided one keeps track of this. Overall, not having
    the PID in the core file is only in very rare occasions unfortunate.
2.  [reason of the breakage] Having one PRSTATUS note contain the PID,
    while all others contain the LWPID of the corresponding kernel thread
    creates an irregularity for the debugger that cannot easily be worked
    around. This is caused by libthread_db correlating user thread IDs to
    kernel thread (aka LWP) IDs and thus aware of the actual LWPIDs.

Update comments accordingly.
2004-07-18 20:28:07 +00:00
David Malone
cdb71f7526 The recent changes to control message passing broke some things
that get certain types of control messages (ping6 and rtsol are
examples). This gets the new code closer to working:

	1) Collect control mbufs for processing in the controlp ==
	NULL case, so that they can be freed by externalize.

	2) Loop over the list of control mbufs, as the externalize
	function may not know how to deal with chains.

	3) In the case where there is no externalize function,
	remember to add the control mbuf to the controlp list so
	that it will be returned.

	4) After adding stuff to the controlp list, walk to the
	end of the list of stuff that was added, incase we added
	a chain.

This code can be further improved, but this is enough to get most
things working again.

Reviewed by:	rwatson
2004-07-18 19:10:36 +00:00
Doug Rabson
4c4392e791 Add doxygen doc comments for most of newbus and the BUS interface. 2004-07-18 16:30:31 +00:00
Scott Long
701f140800 Enable ADAPTIVE_MUTEXES by default by changing the sense of the option to
NO_ADAPTIVE_MUTEXES.  This option has been enabled by default on amd64 for
quite some time, and has been extensively tested on i386 and sparc64.  It
shows measurable performance gains in many circumstances, and few negative
effects.  It would be nice in t he future if adaptive mutexes actually went
to sleep after a certain amount of spinning, but that will require quite a
bit more testing.
2004-07-18 15:59:03 +00:00
Alan Cox
d8582da660 Remove GIANT_REQUIRED from vmapbuf(). 2004-07-18 04:57:49 +00:00
Robert Watson
2260c03d77 Drop Giant and acquire the UNIX domain socket subsystem lock a bit
earlier in unp_connect() so that vp->v_socket can't change between
our copying its value to a local variable and later use of that
variable.  This may have been responsible for a panic during
shutdown that I experienced where simultaneous closing of a listen
socket by rpcbind and a new connection being made to rpcbind by
mountd.
2004-07-18 01:29:43 +00:00
David Xu
c3d88cbab8 Fix typo. 2004-07-17 23:15:41 +00:00
David Malone
e140eb430c Add a kern_setsockopt and kern_getsockopt which can read the option
values from either user land or from the kernel. Use them for
[gs]etsockopt and to clean up some calls to [gs]etsockopt in the
Linux emulation code that uses the stackgap.
2004-07-17 21:06:36 +00:00
John Baldwin
52eb84641d - Move TDF_OWEPREEMPT, TDF_OWEUPC, and TDF_USTATCLOCK over to td_pflags
since they are only accessed by curthread and thus do not need any
  locking.
- Move pr_addr and pr_ticks out of struct uprof (which is per-process)
  and directly into struct thread as td_profil_addr and td_profil_ticks
  as these variables are really per-thread.  (They are used to defer an
  addupc_intr() that was too "hard" until ast()).
2004-07-16 21:04:55 +00:00
John Baldwin
d3373e371b Whitespace fix. 2004-07-16 21:01:52 +00:00
John Baldwin
6dbc085016 Improve readability a bit by changing some code at the end of a function
that did:

	if (foo)
		return
	else
		blah

to just do the simpler

	if (!foo)
		blah

instead.
2004-07-16 21:00:50 +00:00
Colin Percival
24283cc01b Add a SUSER_RUID flag to suser_cred. This flag indicates that we want to
check if the *real* user is the superuser (vs. the normal behaviour, which
checks the effective user).

Reviewed by:	rwatson
2004-07-16 15:57:16 +00:00