Commit Graph

8295 Commits

Author SHA1 Message Date
Robert Watson
35a196154f The SO_NOSIGPIPE socket option allows a user process to mark a socket
so that the socket does not generate SIGPIPE, only EPIPE, when a write
is attempted after socket shutdown.  When the option was introduced in
2002, this required the logic for determining whether SIGPIPE was
generated to be pushed down from dofilewrite() to the socket layer so
that the socket options could be considered.  However, the change in
2002 omitted modification to soo_write() required to add that logic,
resulting in SIGPIPE not being generated even without SO_NOSIGPIPE when
the socket was written to using write() or related generic system calls.

This change adds the EPIPE logic to soo_write(), generating a SIGPIPE
signal to the process associated with the passed uio in the event that
the SO_NOSIGPIPE option is not set.

Notes:

- The are upsides and downsides to placing this logic in the socket
  layer as opposed to the file descriptor layer.  This is really fd
  layer logic, but because we need so_options, we have a choice of
  layering violations and pick this one.

- SIGPIPE possibly should be delivered to the thread performing the
  write, not the process performing the write.

- uio->uio_td and the td argument to soo_write() might potentially
  differ; we use the thread in the uio argument.

- The "sigpipe" regression test in src/tools/regression/sockets/sigpipe
  tests for the bug.

Submitted by:		Mikko Tyolajarvi <mbsd at pacbell dot net>
Talked with:		glebius, alfred
PR:			78478
MFC after:		1 week
2005-03-11 15:06:16 +00:00
John-Mark Gurney
74e620476c fix spelling of match in comment...
MFC after:	3 days
2005-03-10 21:23:06 +00:00
Poul-Henning Kamp
b43ab0e378 Try to fix the mess I made of devname, with the minimal subset of the
larger minor/major patch which was posted for testing.
2005-03-10 18:21:34 +00:00
Robert Watson
53358cc907 Document, via WITNESS, that the NFS server mutex falls ahead of the socket
buffer mutexes.
2005-03-09 21:38:53 +00:00
Dag-Erling Smørgrav
628b83cd08 My addled brains didn't realize that since vtp points into value, we
can't freeenv(value) before we're done inspecting vtp[0].

Tested by:	Anish Mistry <mistry.7@osu.edu>
2005-03-09 12:16:45 +00:00
Stefan Farfeleder
b26244446b Fix typo in comment. 2005-03-09 11:50:55 +00:00
Sam Leffler
a4e714295a allow the destination of m_move_pkthdr to have external
storage (e.g. a cluster)

Glanced at by:	rwatson, silby
2005-03-08 17:52:01 +00:00
Giorgos Keramidas
0a11e99990 Remove redundant initialization that is repeated in the for() loop
right below it.

Approved by:	jhb
2005-03-08 16:57:20 +00:00
Maxim Sobolev
8d6e40c3f1 Add kernel-only flag MSG_NOSIGNAL to be used in emulation layers to surpress
SIGPIPE signal for the duration of the sento-family syscalls. Use it to
replace previously added hack in Linux layer based on temporarily setting
SO_NOSIGPIPE flag.

Suggested by:	alfred
2005-03-08 16:11:41 +00:00
Poul-Henning Kamp
d9a54d5c23 Reengineer subr_unit
Add support for passing in a mutex.  If NULL is passed a global
	subr_unit mutex is used.

	Add alloc_unrl() which expects the mutex to be held.

	Allocating a unit will never sleep as it does not need to allocate
	memory.

	Cut possible range in half so we can use -1 to mean "out of number".

	Collapse first and last runs into the head by means of counters.
	This saves memory in the common case(s).
2005-03-08 10:40:48 +00:00
Poul-Henning Kamp
3238ec33e1 Fix signedness of minor2unit(). 2005-03-08 10:40:03 +00:00
Jeff Roberson
ec346d1040 - Lock access to the buffer_map with the vm_map lock. In 4.x this was
done with splbio, in 5.x this was done with Giant.

Discussed with:		alc
Reported by:		julian, pho
2005-03-08 09:34:54 +00:00
Giorgos Keramidas
46da8bf8fb Typo & grammar fixes in comments. 2005-03-08 00:58:50 +00:00
Robert Watson
9bfb7389bc When upcalling from a socket in soisconnected() for an accept filter,
call with flag M_DONTWAIT rather than M_TRYWAIT, as we don't want to
do blocking memory allocation (etc) in the netisr.

MFC after:	3 days
2005-03-07 13:50:16 +00:00
Poul-Henning Kamp
3b3f38ed7d Add placeholder mutex argument to new_unrhdr(). 2005-03-07 11:05:47 +00:00
Bill Paul
58a6edd121 When you call MiniportInitialize() for an 802.11 driver, it will
at some point result in a status event being triggered (it should
be a link down event: the Microsoft driver design guide says you
should generate one when the NIC is initialized). Some drivers
generate the event during MiniportInitialize(), such that by the
time MiniportInitialize() completes, the NIC is ready to go. But
some drivers, in particular the ones for Atheros wireless NICs,
don't generate the event until after a device interrupt occurs
at some point after MiniportInitialize() has completed.

The gotcha is that you have to wait until the link status event
occurs one way or the other before you try to fiddle with any
settings (ssid, channel, etc...). For the drivers that set the
event sycnhronously this isn't a problem, but for the others
we have to pause after calling ndis_init_nic() and wait for the event
to arrive before continuing. Failing to wait can cause big trouble:
on my SMP system, calling ndis_setstate_80211() after ndis_init_nic()
completes, but _before_ the link event arrives, will lock up or
reset the system.

What we do now is check to see if a link event arrived while
ndis_init_nic() was running, and if it didn't we msleep() until
it does.

Along the way, I discovered a few other problems:

- Defered procedure calls run at PASSIVE_LEVEL, not DISPATCH_LEVEL.
  ntoskrnl_run_dpc() has been fixed accordingly. (I read the documentation
  wrong.)

- Similarly, the NDIS interrupt handler, which is essentially a
  DPC, also doesn't need to run at DISPATCH_LEVEL. ndis_intrtask()
  has been fixed accordingly.

- MiniportQueryInformation() and MiniportSetInformation() run at
  DISPATCH_LEVEL, and each request must complete before another
  can be submitted. ndis_get_info() and ndis_set_info() have been
  fixed accordingly.

- Turned the sleep lock that guards the NDIS thread job list into
  a spin lock. We never do anything with this lock held except manage
  the job list (no other locks are held), so it's safe to do this,
  and it's possible that ndis_sched() and ndis_unsched() can be
  called from DISPATCH_LEVEL, so using a sleep lock here is
  semantically incorrect. Also updated subr_witness.c to add the
  lock to the order list.
2005-03-07 03:05:31 +00:00
Alan Cox
2b2c7a6b40 The m_ext reference counts are potentially shared and modified
asynchronously by different threads.  Thus, declare as volatile the
reference count that is accessed through m_ext's pointer, ref_cnt.
Revert the previous change, revision 1.144, that casts as volatile a
single dereference of ref_cnt.

Reviewed by: bmilekic, dwhite
Problem reported by: kris
MFC after: 3 days
2005-03-06 20:09:00 +00:00
Dag-Erling Smørgrav
f3301d15f1 Teach getenv_quad() to recognize k/m/g/t suffixes in both lower- and
upper-case.  This means (almost) all tunables now support those suffixes.
2005-03-05 15:52:12 +00:00
David Xu
bc8e6d817d Allocate umtx_q from heap instead of stack, this avoids
page fault panic in kernel under heavy swapping.
2005-03-05 09:15:03 +00:00
David Xu
627451c1d9 The td_waitset is pointing to a stack address when thread is waiting
for a signal, because kernel stack is swappable, this causes page fault
in kernel under heavy swapping case. Fix this bug by eliminating unneeded
code.
2005-03-04 22:46:31 +00:00
Maxim Sobolev
4b1783363f In linux emulation layer try to detect attempt to use linux_clone() to
create kernel threads and call rfork(2) with RFTHREAD flag set in this case,
which puts parent and child into the same threading group. As a result
all threads that belong to the same program end up in the same threading
group.

This is similar to what linuxthreads port does, though in this case we don't
have a luxury of having access to the source code and there is no definite
way to differentiate linux_clone() called for threading purposes from other
uses, so that we have to resort to heuristics.

Allow SIGTHR to be delivered between all processes in the same threading
group previously it has been blocked for s[ug]id processes.

This also should improve locking of the same file descriptor from different
threads in programs running under linux compat layer.

PR:			kern/72922
Reported by:		Andriy Gapon <avg@icyb.net.ua>
Idea suggested by:	rwatson
2005-03-03 16:57:55 +00:00
Doug White
a1d0c3f203 Insert volatile cast to discourage gcc from optimizing the read outside
of the while loop.

Suggested by:	alc
MFC after:	1 day
2005-03-03 02:41:37 +00:00
Joerg Wunsch
a5f50ef9e4 netchild's mega-patch to isolate compiler dependencies into a central
place.

This moves the dependency on GCC's and other compiler's features into
the central sys/cdefs.h file, while the individual source files can
then refer to #ifdef __COMPILER_FEATURE_FOO where they by now used to
refer to #if __GNUC__ > 3.1415 && __BARC__ <= 42.

By now, GCC and ICC (the Intel compiler) have been actively tested on
IA32 platforms by netchild.  Extension to other compilers is supposed
to be possible, of course.

Submitted by:	netchild
Reviewed by:	various developers on arch@, some time ago
2005-03-02 21:33:29 +00:00
David Xu
6675b36ec5 In kern_sigtimedwait, remove waitset bits for td_sigmask before
sleeping, so in do_tdsignal, we no longer need to test td_waitset.
now td_waitset is only used to give a thread higher priority when
delivering signal to multithreads process.
This also fixes a bug:
when a thread in sigwait states was suspended and later resumed
by SIGCONT, it can no longer receive signals belong to waitset.
2005-03-02 13:43:51 +00:00
Paul Saab
b8a4edc17e Use kern_kevent instead of the stackgap for 32bit syscall wrapping.
Submitted by:	jhb
Tested on:	amd64
2005-03-01 17:45:55 +00:00
Paul Saab
c1aa81b6d9 regen 2005-03-01 17:44:34 +00:00
Paul Saab
96d31285fe Change the prototype of kevent to remove the const from the changelist.
Reviewed by:	jhb
2005-03-01 17:43:08 +00:00
Robert Watson
081322613b When mac_check_system_acct() fails, make sure to unlock as well as close
the vnode.

Pointed out by:	jeff
2005-03-01 08:56:13 +00:00
Wes Peters
a09150446d Add a sysctl that records the amount of physical memory in the machine.
Submitted by:	Nicko Dehaine <nicko@stbernard.com>
MFC after:	1 day
2005-02-28 21:42:56 +00:00
Poul-Henning Kamp
8045ce213d Also handle d_maj hints from cloning drivers correctly. 2005-02-27 22:57:32 +00:00
Poul-Henning Kamp
84f580a093 Whine about any drivers which hardcode the device major number. 2005-02-27 22:41:07 +00:00
Poul-Henning Kamp
acd102e64b Use dynamic major number allocation. 2005-02-27 22:02:03 +00:00
Poul-Henning Kamp
78e253c8d5 Use dynamic major number allocation. 2005-02-27 22:00:45 +00:00
Poul-Henning Kamp
89685e2269 Use dynamic major number allocation for /dev/console, there is no
longer any benefit from hard wiring it.

Remove special hack used to wire major to zero despite zero having a
different magic meaning as well.
2005-02-27 21:52:42 +00:00
Nate Lawson
789f03ceb4 Add locking to handle multiple threads getting/setting frequencies at the
same time.  We use an sx lock and serialize the cpufreq device's
get/set/levels methods.
2005-02-27 01:34:08 +00:00
Nate Lawson
b070969b48 Allow users to reject levels below a given frequency (in MHz) via the
debug.cpufreq.lowest tunable and sysctl.  Some systems seem to have problems
with the lowest frequencies so setting this prevents them from being
available or used.
2005-02-26 22:37:49 +00:00
Tom Rhodes
183a16a3ec Remove recently added note about DEVICE_POLLING not working with SMP.
Remove warning from kern_poll.c to allow DEVICE_POLLING to be built with SMP.

Discussed with:	ru, glebius
2005-02-25 22:07:51 +00:00
Robert Watson
fa6fc5b819 Insert missing increment of (i) when walking the temporary semaphore
vector during fork.

Fix assertion which contained an off-by-one error.

Submitted by:	Antoine Brodin < antoine dot brodin at laposte dot net >
2005-02-25 21:00:14 +00:00
Robert Watson
590f242cc0 Add an exit hook, sem_forkhook(), which walks the list of POSIX semaphores
owned by a process when it forks, and creates a matching set of references
for the child process, as prescribed by POSIX.

In order to avoid races with other threads in the parent process during
fork(), it is necessary to allocate a temporary reference list while
holding the sem_lock, then transfer those references to the new process
once the sem_lock is released.  The implementation is inefficient but
appears functional; in order to improve the efficiency, it will be
necessary to modify the existing structures and logic, which generally
rely on O(n) operations over the global set of semaphores.
2005-02-25 19:10:51 +00:00
Robert Watson
955ec4156c Assert sem_lock in id_to_sem() and sem_lookup_byname(), since these
functions iterate over the global POSIX semaphore lists.

MFC after:	3 days
2005-02-25 17:01:35 +00:00
Maxim Sobolev
90dc539be0 Welcome to the 21st century: increase MAXSHELLCMDLEN from 128 bytes to
PAGE_SIZE.

Unlike originator of the PR suggests retain MAXSHELLCMDLEN definition
(he has been proposing to replace it with PAGE_SIZE everywhere), not only
this reduced the diff significantly, but prevents code obfuscation and also
allows to increase/decrease this parameter easily if needed.

PR:		kern/64196
Submitted by:	Magnus Bäckström <b@etek.chalmers.se>
2005-02-25 11:49:42 +00:00
Maxim Sobolev
6916a1da50 o Replace two while {} do loops with more appropriate do {} while loops. This
doesn't change functionality, but makes code more logical.

Obtained from:	DrafonFlyBSD

o Use VOP_GETATTR() to obtain actual size of file and parse no more than that.
  Previously, we parsed MAXSHELLCMDLEN characters regardless of the actual file
  size. This makes the following working:

$ printf '#!/bin/echo' > /tmp/test.sh
$ chmod 755 /tmp/test.sh
$ /tmp/test.sh

Previously, attempts to execve() that shell script has been failing with bogus
ENAMETOOLONG.

PR:		kern/64196
Submitted by:	Magnus B.ckstr.m <b@etek.chalmers.se>
2005-02-25 10:17:53 +00:00
Maxim Sobolev
b4305f8d91 Try harder to not exceed MAXSHELLCMDLEN when parsing first line of shell
script. Otherwise it's possible to panic kernel by constructing a shell
script with first line not ending in '\n'.

Also, treat '\0' as line terminating character, which may me useful in
some situations.

Submitted by:	gad
2005-02-25 08:42:04 +00:00
Nate Lawson
d269386a24 Bump the maximum number of levels to 64 and add warning messages about
what to do to fix reduced functionality if the number of levels is too low.
2005-02-24 20:21:41 +00:00
Sam Leffler
59d8b31002 change m_adj to reclaim unused mbufs instead of zero'ing m_len
when trim'ing space off the back of a chain; this is indirect
solution to a potential null ptr deref

Noticed by:	Coverity Prevent analysis tool (null ptr deref)
Reviewed by:	dg, rwatson
2005-02-24 00:40:33 +00:00
Christian S.J. Peron
cd13819433 Add locking assertions into vn_extattr_set, vn_extattr_get and
vn_extattr_rm. This is meant to catch conditions where IO_NODELOCKED
has been specified without the vnode being locked.

Discussed with:	rwatson
MFC after:	1 week
2005-02-24 00:13:16 +00:00
Christian S.J. Peron
df579737e5 Drop bzero and shove the responsibility of zeroing the kse upcall
object on to the zone allocator. It should be noted that uma_zalloc(9)
uses bzero to zero out the object so there probably wont be any
real performance benefit. If UMA grows the ability to supply
zeroed zones more efficiently in the future, we will not have to
modify all the existing consumers.

Discussed with:	rwatson,julian
MFC after:	1 week
2005-02-24 00:05:50 +00:00
Sam Leffler
9d8993bbc5 remove dead code
Noticed by:	Coverity Prevent analysis tool
Reviewed by:	silby
2005-02-23 19:34:44 +00:00
Sam Leffler
15ecf3968d eliminate potential null deref
Noticed by:	Coverity Prevent analysis tool
Reviewed by:	jhb
2005-02-23 19:32:29 +00:00
Jeff Roberson
d9a9c2c22c - Enable SMP VFS by default on current. More users are needed to turn up
any remaining bugs.  Anyone inconvenienced by this can still disable it
   in the loader.

Sponsored by:	Isilon Systems, Inc.
2005-02-23 10:05:43 +00:00
Jeff Roberson
7a9507b60e - A test in sched_switch() is no longer necessary and it is incorrect
when td0 is preempted before it voluntarily switches.

Discovered by:	Arjan Van Leeuwen <avleeuwen@gmail.com>
2005-02-23 00:50:26 +00:00
Sam Leffler
3e55226c46 kill dead code
Noticed by:	Coverity Prevent analysis tool
2005-02-23 00:43:00 +00:00
Jeff Roberson
d8a7c99a1c - Only the xlock holder should be calling VOP_LOCK on a vp once VI_XLOCK
has been set.  Assert that this is the case so that we catch filesystems
   who are using naked VOP_LOCKs in illegal cases.

Sponsored by:	Isilon Systems, Inc.
2005-02-23 00:11:14 +00:00
Jeff Roberson
4c11620bb9 - Add a check for xlock in vop_lock_assert. Presently the xlock is
considered to be as good as an exclusive lock, although there is still a
   possibility of someone acquiring a VOP LOCK while xlock is held.

Sponsored by:	Isilon Systems, Inc.
2005-02-22 23:59:11 +00:00
Poul-Henning Kamp
767056c0e8 Zero the v_un container field to make sure everything is gone. 2005-02-22 18:56:18 +00:00
Poul-Henning Kamp
aa2f6ddc3f Reap more benefits from DEVFS:
List devfs_dirents rather than vnodes off their shared struct cdev, this
saves a pointer field in the vnode at the expense of a field in the
devfs_dirent.  There are often 100 times more vnodes so this is bargain.
In addition it makes it harder for people to try to do stypid things like
"finding the vnode from cdev".

Since DEVFS handles all VCHR nodes now, we can do the vnode related
cleanup in devfs_reclaim() instead of in dev_rel() and vgonel().
Similarly, we can do the struct cdev related cleanup in dev_rel()
instead of devfs_reclaim().

	rename idestroy_dev() to destroy_devl() for consistency.

	Add LIST_ENTRY de_alias to struct devfs_dirent.
	Remove v_specnext from struct vnode.
	Change si_hlist to si_alist in struct cdev.
	String new devfs vnodes' devfs_dirent on si_alist when
	we create them and take them off in devfs_reclaim().

	Fix devfs_revoke() accordingly.  Also don't clear fields
	devfs_reclaim() will clear when called from vgone();

	Let devfs_reclaim() call dev_rel() instead of vgonel().

	Move the usecount tracking from dev_rel() to devfs_reclaim(),
	and let dev_rel() take a struct cdev argument instead of vnode.

	Destroy SI_CHEAPCLONE devices in dev_rel() (instead of
	devfs_reclaim()) when they are no longer used.   (This
	should maybe happen in devfs_close() instead.)
2005-02-22 15:51:07 +00:00
Poul-Henning Kamp
1a1457d427 Make dev_ref() require the dev_lock() to be held and use it from
devfs instead of directly frobbing the si_refcount.
2005-02-22 14:41:04 +00:00
Poul-Henning Kamp
7fc940b266 Remove vfinddev(), it is generally bogus when faced with jails and
chroot and has no legitimate use(r)s in the tree.
2005-02-22 14:11:47 +00:00
Robert Watson
4f7fd28ee1 When invoking callout_init(), spell '1' as "CALLOUT_MPSAFE".
MFC after:	3 days
2005-02-22 13:11:33 +00:00
Robert Watson
0daccb9c94 In the current world order, solisten() implements the state transition of
a socket from a regular socket to a listening socket able to accept new
connections.  As part of this state transition, solisten() calls into the
protocol to update protocol-layer state.  There were several bugs in this
implementation that could result in a race wherein a TCP SYN received
in the interval between the protocol state transition and the shortly
following socket layer transition would result in a panic in the TCP code,
as the socket would be in the TCPS_LISTEN state, but the socket would not
have the SO_ACCEPTCONN flag set.

This change does the following:

- Pushes the socket state transition from the socket layer solisten() to
  to socket "library" routines called from the protocol.  This permits
  the socket routines to be called while holding the protocol mutexes,
  preventing a race exposing the incomplete socket state transition to TCP
  after the TCP state transition has completed.  The check for a socket
  layer state transition is performed by solisten_proto_check(), and the
  actual transition is performed by solisten_proto().

- Holds the socket lock for the duration of the socket state test and set,
  and over the protocol layer state transition, which is now possible as
  the socket lock is acquired by the protocol layer, rather than vice
  versa.  This prevents additional state related races in the socket
  layer.

This permits the dual transition of socket layer and protocol layer state
to occur while holding locks for both layers, making the two changes
atomic with respect to one another.  Similar changes are likely require
elsewhere in the socket/protocol code.

Reported by:		Peter Holm <peter@holm.cc>
Review and fixes from:	emax, Antoine Brodin <antoine.brodin@laposte.net>
Philosophical head nod:	gnn
2005-02-21 21:58:17 +00:00
Robert Watson
c364c823d0 When aborting a UNIX domain socket bind() because VOP_CREATE() failed,
make sure to call vn_finished_write(mp) before returning.

MFC after:	3 days
2005-02-21 14:21:50 +00:00
Robert Watson
892af6b930 style(9)-ize function headers, remove use of 'register'.
MFC after:	3 days
2005-02-20 23:22:13 +00:00
David Schultz
e8ed933099 Remove VFS_START(). Its original purpose involved the mfs filesystem,
which is long gone.

Discussed with:	mckusick
Reviewed by:	phk
2005-02-20 23:02:20 +00:00
Robert Watson
d664e4fa50 In unp_attach(), allow uma_zalloc to zero the new unpcb rather than
explicitly using bzero().

Update copyright.

MFC after:	3 days
2005-02-20 20:05:11 +00:00
Robert Watson
2b85a170d1 Prefer NULL to returning 0 cast to a pointer type.
MFC after:	3 days
2005-02-20 15:56:13 +00:00
Robert Watson
a00428ef92 In soreceive(), when considering delivery to a socket in SS_ISCONFIRMING,
only call the protocol's pru_rcvd() if the protocol has the flag
PR_WANTRCVD set.  This brings that instance of pru_rcvd() into line with
the rest, which do check the flag.

MFC after:	3 days
2005-02-20 15:54:44 +00:00
Robert Watson
7301cf23ef Move assignment of UNIX domain socket pcb during unp_attach() outside
of the global UNIX domain socket mutex: no protection is needed that
early in the setup of the UNIX domain socket and socket structures.

MFC after:	3 days
2005-02-20 04:18:22 +00:00
Nate Lawson
e959a70bad Add the "freq_settings" sysctl to each device that registers with cpufreq
so their individual settings can be seen separately for debugging.
2005-02-20 00:59:15 +00:00
Poul-Henning Kamp
dfd4be14bd Try to unbreak the vnode locking around vop_reclaim() (based mostly on
patch from kan@).

Pull bufobj_invalbuf() out of vinvalbuf() and make g_vfs call it on
close.  This is not yet a generally safe function, but for this very
specific use it is safe.  This solves the problem with buffers not
being flushed by unmount or after failed mount attempts.
2005-02-19 11:44:57 +00:00
David Xu
1089f0319b Don't restart a timeout wait in kern_sigtimedwait, also allow it
to wait longer than a single integer can represent.
2005-02-19 06:05:49 +00:00
Paul Saab
b7820945ac Swap the arguments for CP so we copy the correct source and
destination.
2005-02-18 22:14:40 +00:00
Robert Watson
29bdd01910 Remove now unused 'int s' from spl().
MFC after:	3 days
2005-02-18 21:39:55 +00:00
Robert Watson
d8d716bef5 De-spl kern_connect().
MFC after:	3 days
2005-02-18 19:37:36 +00:00
Robert Watson
a7ae36bc45 Correct a typo in the comment describing soreceive_rcvoob().
MFC after:	3 days
2005-02-18 19:15:22 +00:00
Robert Watson
1b5c4b15b4 In soconnect(), when resetting so->so_error, the socket lock is not
required due to a straight integer write in which minor races are not
a problem.
2005-02-18 19:13:51 +00:00
Robert Watson
11d06c4b78 Re-style do_setopt_accept_filter() to match uipc_accf.c style, and fix
one other style nit in the file.

MFC after:	3 days
2005-02-18 19:01:22 +00:00
Robert Watson
78e436448f Move do_setopt_accept_filter() from uipc_socket.c to uipc_accf.c, where
the rest of the accept filter code currently lives.

MFC after:	3 days
2005-02-18 18:54:42 +00:00
Robert Watson
1ed716a149 Minor style tweaks: line wrap comments and lines more consistently.
MFC after:	3 days
2005-02-18 18:49:44 +00:00
Robert Watson
627de7fa2c Re-order checks in socheckuid() so that we check all deny cases before
returning accept.

MFC after:	3 days
2005-02-18 18:43:33 +00:00
Poul-Henning Kamp
900b7e2648 Make sure to drop the VI_LOCK in vgonel();
Spotted by: Taku YAMAMOTO <taku@tackymt.homeip.net>
2005-02-18 11:13:56 +00:00
Robert Watson
0d89301c51 In solisten(), unconditionally set the SO_ACCEPTCONN option in
so->so_options when solisten() will succeed, rather than setting it
conditionally based on there not being queued sockets in the completed
socket queue.  Otherwise, if the protocol exposes new sockets via the
completed queue before solisten() completes, the listen() system call
will succeed, but the socket and protocol state will be out of sync.
For TCP, this didn't happen in practice, as the TCP code will panic if
a new connection comes in after the tcpcb has been transitioned to a
listening state but the socket doesn't have SO_ACCEPTCONN set.

This is historical behavior resulting from bitrot since 4.3BSD, in which
that line of code was associated with the conditional NULL'ing of the
connection queue pointers (one-time initialization to be performed
during the transition to a listening socket), which are now initialized
separately.

Discussed with:	fenner, gnn
MFC after:	3 days
2005-02-18 00:52:17 +00:00
Nate Lawson
e94a0c1a18 Introduce a new method, cpufreq_drv_type(), that returns the type of the
driver.  This used to be handled by cpufreq_drv_settings() but it's
useful to get the type/flags separately from getting the settings.
(For example, you don't have to pass an array of cf_setting just to find
the driver type.)

Use this new method in our in-tree drivers to detect reliably if acpi_perf
is present and owns the hardware.  This simplifies logic in drivers as well
as fixing a bug introduced in my last commit where too many drivers attached.
2005-02-18 00:23:36 +00:00
Robert Watson
1e8f89541e In accept1(), extend coverage of the socket lock from just covering
soref() to also covering the update of so_state.  While no other user
threads can update the socket state here as it's not yet hooked up to
the file descriptor array yet, the protocol could also frob the
socket state here, leading to a lost update to the so_state field.
No reported instances of this bug (as yet).

MFC after:      3 days
2005-02-17 13:00:23 +00:00
Robert Watson
280249a66a In sonewconn(), set the new socket's state to show the protocol-provided
connection status before inserting the new socket into the listen
socket's accept queue, or there might be a race in which another thread
wakes up when the accept lock is released, and sees the socket before its
state is set correctly.  The wakeup still occurs after the accept lock is
released.  There have been no diagnoses of this bug in real-world systems
(as yet).

MFC after:	3 days
2005-02-17 12:53:45 +00:00
Poul-Henning Kamp
4d8ac58b05 Introduce vx_wait{l}() and use it instead of home-rolled versions. 2005-02-17 10:49:51 +00:00
Poul-Henning Kamp
58aac12894 Convert KASSERTS to VNASSERTS 2005-02-17 10:28:58 +00:00
Dag-Erling Smørgrav
f3f4baf099 Add /rescue/init to the default init_path, before /stand/sysinstall.
MFC after:	2 weeks
2005-02-17 10:00:10 +00:00
Bosko Milekic
8076cb5289 Well, it seems that I pre-maturely removed the "All rights reserved"
statement from some files, so re-add it for the moment, until the
related legalese is sorted out.  This change affects:

sys/kern/kern_mbuf.c
sys/vm/memguard.c
sys/vm/memguard.h
sys/vm/uma.h
sys/vm/uma_core.c
sys/vm/uma_dbg.c
sys/vm/uma_dbg.h
sys/vm/uma_int.h
2005-02-16 21:45:59 +00:00
Nate Lawson
67c8649f7f When dealing with systems with no absolute drivers attached, only calibrate
the rate for the 100% state once.  Afterwards, use that value for deriving
states.  This should fix the problem where the calibrated frequency was
different once a switch was done, giving a different set of levels each
time.  Also, properly search for the right cpufreqX device when detaching.
2005-02-15 07:43:48 +00:00
Nate Lawson
1196826af5 Bind to the driver's parent cpu before switching, for both absolute and
relative drivers.  Remove some extraneous KASSERTs since NULL pointers
will be found when they're used right afterwards.
2005-02-15 07:22:42 +00:00
Nate Lawson
5f0afa0415 Implement priorities. This allows a driver (say, for cooling purposes) to
override the current freq level temporarily and restore it when the
higher priority condition is past.  Note that only the first overridden
value is saved.  Callers pass NULL to CPUFREQ_SET to restore the saved
level.  Priorities are not yet used so this commit should have no effect.
2005-02-14 18:16:35 +00:00
Nate Lawson
e22cd41c01 Add support for the CPUFREQ_FLAG_INFO_ONLY flag. Devices that report this
are not added to the list(s) of available settings.  However, other drivers
can call the CPUFREQ_DRV_SETTINGS() method on those devices directly to
get info about available settings.

Update the acpi_perf(4) driver to use this flag in the presence of
"functional fixed hardware."  Thus, future drivers like Powernow can
query acpi_perf for platform info but perform frequency transitions
themselves.
2005-02-13 18:49:48 +00:00
Maxim Sobolev
f460d05699 Backout addition of SIGTHR into the list of signals allowed to be delivered
to the suid/sugid process, since apparently it has security implications.

Suggested by:   rwatson
2005-02-13 17:51:47 +00:00
Maxim Sobolev
1a88a252fd Backout previous change (disabling of security checks for signals delivered
in emulation layers), since it appears to be too broad.

Requested by:   rwatson
2005-02-13 17:37:20 +00:00
Nate Lawson
0325089dad Set levels on all CPUs and attach a cpufreq device to each one. Sysctl
on dev.cpu.0 will affect all of the CPUs together.  In the future,
independent control will be supported but this is good enough for now.
Check that the timecounter isn't TSC before switching (from Colin Percival.)
2005-02-13 17:31:56 +00:00
Maxim Sobolev
d8ff44b79f Split out kill(2) syscall service routine into user-level and kernel part, the
former is callable from user space and the latter from the kernel one. Make
kernel version take additional argument which tells if the respective call
should check for additional restrictions for sending signals to suid/sugid
applications or not.

Make all emulation layers using non-checked version, since signal numbers in
emulation layers can have different meaning that in native mode and such
protection can cause misbehaviour.

As a result remove LIBTHR from the signals allowed to be delivered to a
suid/sugid application.

Requested (sorta) by:	rwatson
MFC after:	2 weeks
2005-02-13 16:42:08 +00:00
Christian S.J. Peron
84f85aedef Add much needed descriptions for a number of the IPC related sysctl OIDs.
This information will be very useful for people who are tuning applications
which have a dependence on IPC mechanisms.

The following OIDs were documented:

Message queues:
 kern.ipc.msgmax
 kern.ipc.msgmni
 kern.ipc.msgmnb
 kern.ipc.msgtlq
 kern.ipc.msgssz
 kern.ipc.msgseg

Semaphores:
 kern.ipc.semmap
 kern.ipc.semmni
 kern.ipc.semmns
 kern.ipc.semmnu
 kern.ipc.semmsl
 kern.ipc.semopm
 kern.ipc.semume
 kern.ipc.semusz
 kern.ipc.semvmx
 kern.ipc.semaem

Shared memory:
 kern.ipc.shmmax
 kern.ipc.shmmin
 kern.ipc.shmmni
 kern.ipc.shmseg
 kern.ipc.shmall
 kern.ipc.shm_use_phys
 kern.ipc.shm_allow_removed
 kern.ipc.shmsegs

These new descriptions can be viewed using sysctl -d

PR:		kern/65219
Submitted by:	Dan Nelson <dnelson at allantgroup dot com> (modified)
No objections:	developers@
Descriptions reviewed by: gnn
MFC after:	1 week
2005-02-12 01:22:39 +00:00
Maxim Sobolev
ac16ff40c5 Add SIGTHR (32) into list of signals permitted to be delivered to the
suid application. The problem is that Linux applications using old Linux
threads (pre-NPTL) use signal 32 (linux SIGRTMIN) for communication between
thread-processes. If such an linux application is installed suid or sgid
and security.bsd.conservative_signals=1 (default), then permission will be
denied to send such a signal and the application will freeze.

I believe the same will be true for native applications that use libthr,
since libthr uses SIGTHR for implementing conditional variables.

PR:		72922
Submitted by:	Andriy Gapon <avg@icyb.net.ua>
MFC after:	2 weeks
2005-02-11 14:02:42 +00:00
Ian Dowse
57c037be1c When processing a timeout() callout and returning it to the free
list, set `curr_callout' to NULL. This ensures that we won't attempt
to cancel the current callout if the original callout structure
gets recycled while we wait to acquire Giant.

This is reported to fix an intermittent syscons problem that was
introduced by revision 1.96.
2005-02-11 00:14:00 +00:00
Bosko Milekic
3d2a3ff25e Optimize the way reference counting is performed with Mbufs. We
do not need to perform an extra memory fetch in the Packet (Mbuf+Cluster)
constructor to initialize the reference counter anymore.  The reference
counts are located in a separate memory region (in the slab header,
because this zone is UMA_ZONE_REFCNT), so the memory fetch resulted very
often in a cache miss.  Additionally, and perhaps more significantly,
optimize the free mbuf+cluster (packet) case, which is very common, to
no longer require an atomic operation on free (to verify the reference
counter) if the reference on the cluster has never been increased (also
very common).  Reduces an atomic on mbuf free on average.

Original patch submitted by: Gerrit Nagelhout <gnagelhout@sandvine.com>
2005-02-10 22:23:02 +00:00