130542 Commits

Author SHA1 Message Date
mckusick
325fee4370 Fix an error in dumping large sparse files containing extended attributes. 2007-02-27 07:28:17 +00:00
kmacy
b7672bad26 Further improvements to LOCK_PROFILING:
 - Fix missing initialization in kern_rwlock.c causing bogus times to be collected
 - Move updates to the lock hash to after the lock is released for spin mutexes,
   sleep mutexes, and sx locks
 - Add new kernel build option LOCK_PROFILE_FAST - only update lock profiling
   statistics when an acquisition is contended. This reduces the overhead of
   LOCK_PROFILING to a 20%-25% increase in system time, which for
   "make -j8 kernel-toolchain" on a dual Woodcrest is unmeasurable in terms
   of wall-clock time. Contrast this with enabling lock profiling without
   LOCK_PROFILE_FAST, where I see a 5x-6x slowdown in wall-clock time.
2007-02-27 06:42:05 +00:00
grog
128ac1d595 Update HISTORY.
Reviewed by:	dmr
2007-02-27 05:39:22 +00:00
bde
a94865a5ec Fixed some style bugs (whitespace lossage for removal of __P(()), and
lots of naming and typing errors involving `interval').
2007-02-27 05:10:36 +00:00
bde
3ee0d09a44 Use a periodic itimer instead of repeated calls to alarm() in
sidewaysintpr().  This increases the accuracy of the per-interval
counts when they are interpreted as rates.  Repeated calls to alarm(n)
give an average interval that is about 2 ticks larger than n and has
a large variance.  Periodic itimers normally get the average almost
right but have similarly large variance (due to scheduling delays).

Statistics utilities should use clock_gettime() to determine the
actual interval, but it is still useful to maximize the accuracy of
the interval, especially for cases like netstat -w where counts are
displayed so the program cannot hide the inaccuracy in a rate
conversion.
2007-02-27 04:54:33 +00:00
mjacob
05b92097cb First cut at GEOM based multipath. This is an active/passive{/passive...}
arrangement that has no intrinsic internal knowledge of whether devices
it is given are truly multipath devices. As such, this is a simplistic
approach, but still a useful one.

The basic approach is to (at present; this will change soon) use camcontrol
to find likely identical devices and label the trailing sector of the
first one. This label contains both a full UUID and a name. The name is
what is presented in /dev/multipath, but the UUID is used as a true
distinguisher at g_taste time, thus making sure we don't have chaos
on a shared SAN where everyone names their data multipath as "Fred".

The first of N identical devices (and N *may* be 1!) becomes the active
path until a BIO request is failed with EIO or ENXIO. When this occurs,
the active disk is ripped away and the next in a list is picked to
(retry and) continue with.

During g_taste events new disks that meet the match criteria for existing
multipath geoms get added to the tail end of the list.

Thus, this active/passive setup actually does work for devices which
go away and come back, as do (now) mpt(4) and isp(4) SAN based disks.

There is still a lot to do to improve this, such as about 5 of the 12
recommendations I've received about it, but it's been functional enough
for a while that it deserves a broader test base.

Reviewed by: pjd
Sponsored by: IronPort Systems
MFC: 2 months
2007-02-27 04:01:58 +00:00
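
The active/passive failover described in the commit above can be sketched as follows. This is a simplified userland model, not the GEOM class itself: struct mp_path, struct mp_geom, mp_add, mp_io and the simulated path_io are all hypothetical names, and a real BIO request is replaced by a stored errno value. The first path in the list is the active one; a request that fails with EIO or ENXIO drops it and retries on the next.

```c
#include <errno.h>
#include <string.h>

#define	MP_MAX_PATHS	8	/* arbitrary limit for this sketch */

struct mp_path {
	const char *name;	/* e.g. "da0" */
	int fail_errno;		/* simulated I/O result, 0 = healthy */
};

struct mp_geom {
	struct mp_path paths[MP_MAX_PATHS];	/* paths[0] is active */
	int npaths;
};

/* Stand-in for issuing a BIO request down one path. */
static int
path_io(const struct mp_path *p)
{
	return (p->fail_errno);
}

/* New paths are appended to the tail of the list, as at g_taste time. */
static int
mp_add(struct mp_geom *g, const char *name, int fail_errno)
{
	if (g->npaths >= MP_MAX_PATHS)
		return (ENOSPC);
	g->paths[g->npaths].name = name;
	g->paths[g->npaths].fail_errno = fail_errno;
	g->npaths++;
	return (0);
}

/*
 * Issue a request on the active path.  On EIO or ENXIO the active
 * path is removed and the request is retried on the next one.
 * Returns 0 on success or an errno value.
 */
static int
mp_io(struct mp_geom *g)
{
	int error;

	while (g->npaths > 0) {
		error = path_io(&g->paths[0]);
		if (error != EIO && error != ENXIO)
			return (error);
		/* Rip away the failed active path; the next becomes active. */
		memmove(&g->paths[0], &g->paths[1],
		    (size_t)(g->npaths - 1) * sizeof(g->paths[0]));
		g->npaths--;
	}
	return (ENXIO);		/* no paths left */
}
```

Because a returning device is simply appended again through mp_add(), a path that goes away and comes back ends up at the tail of the list, which matches the behaviour the commit message describes.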
grog
502eb2ec0e Add warning about deadlocks created by use of wakeup_one. 2007-02-27 02:51:41 +00:00
jkim
2620bd06da MFP4: 115094
Linux does not check file descriptor when MAP_ANONYMOUS is set.
This should fix recent LTP test regressions.

Reported by:	Scot Hetzel (swhetzel at gmail dot com)
		netchild
2007-02-27 02:08:01 +00:00
pjd
287d98b314 Replace spaces with tabs in some places. 2007-02-27 01:48:58 +00:00
njl
7eb821d885 Rework EC I/O approach. Implement burst mode, including proper handling of
case where it asynchronously exits burst mode on its own.  Handle different
values of hz in sleep loop.  Provide more debugging options to tune EC
behavior.  These tunables/sysctls may be temporary and are not for user
access if the EC is working properly.  Burst mode is now on by default for
testing and the poll interval has been increased from 100 to 500 us and
total timeout from 100 to 500 ms.

Hopefully this should be the first step of addressing reports of timeout
errors during battery or thermal access, especially on HP/Compaq laptops.
It is reasonably stable and should not cause a loss of functionality or
performance on systems that were previously working.  Testing shows an
increase of responsiveness by ~75% on one system.

PR:		kern/98171
2007-02-27 00:14:20 +00:00
mohans
384aeb29f6 Reap FIN_WAIT_2 connections marked SOCANTRCVMORE faster. This mitigates
potential issues where the peer does not close, potentially leaving
thousands of connections in FIN_WAIT_2. This is controlled by a new sysctl
fast_finwait2_recycle, which is disabled by default.

Reviewed by: gnn, silby.
2007-02-26 22:25:21 +00:00
jkim
2bd7382fdc Add three new ioctl(2) commands for bpf(4).
- BIOCGDIRECTION and BIOCSDIRECTION get or set the setting determining
whether incoming, outgoing, or all packets on the interface should be
returned by BPF.  Set to BPF_D_IN to see only incoming packets on the
interface.  Set to BPF_D_INOUT to see packets originating locally and
remotely on the interface.  Set to BPF_D_OUT to see only outgoing
packets on the interface.  This setting is initialized to BPF_D_INOUT
by default.  BIOCGSEESENT and BIOCSSEESENT are obsoleted by these but
kept for backward compatibility.

- BIOCFEEDBACK sets packet feedback mode.  This allows injected packets
to be fed back as input to the interface when output via the interface is
successful.  When BPF_D_INOUT direction is set, an injected outgoing packet
is not returned by BPF, to avoid duplication.  This flag is initialized to
zero by default.

Note that libpcap has been modified to support BPF_D_OUT direction for
pcap_setdirection(3) and PCAP_D_OUT direction is functional now.

Reviewed by:	rwatson
2007-02-26 22:24:14 +00:00
rwatson
fdcdf27f80 Revise locking strategy used for UNIX domain sockets in order to improve
concurrency:

- Add per-unpcb mutexes protecting unpcb connection state, fields, etc.

- Replace global UNP mutex with a global UNP rwlock, which will protect the
  UNIX domain socket connection topology, v_socket, and be acquired
  exclusively before acquiring more than per-unpcb at a time in order to
  avoid lock order issues.

In performance measurements involving MySQL, this change has little or no
overhead on UP (+/- 1%), but leads to a significant (5%-30%) improvement in
multi-processor measurements using the sysbench and supersmack benchmarks.

Much testing by:	kris
Approved by:		re (kensmith)
2007-02-26 20:47:52 +00:00
jhb
dcb274ad6e Use NULL rather than 0 for various pointer constants. 2007-02-26 19:28:18 +00:00
rwatson
34647636ba Add rw_wowned(9) symlink. 2007-02-26 19:09:36 +00:00
rwatson
35d54d13a9 Update rwlock(9) for rw_wowned(). 2007-02-26 19:07:41 +00:00
rwatson
03d72b2dff Add rw_wowned() interface to rwlock(9), allowing a kernel thread to
determine if it holds an exclusive rwlock reference or not.  This is
non-ideal, but recursion scenarios in the network stack currently
require it.

Approved by:	jhb
2007-02-26 19:05:13 +00:00
jhb
7816149d47 Mark the kernel linker file as linked so that it is visible to the various
kld*() syscalls.

Tested by:	piso
2007-02-26 16:48:14 +00:00
jhb
c40e03e025 Fix a comment. 2007-02-26 16:36:48 +00:00
bms
b0363fd079 Document m_pulldown().
Obtained from:	MBUF issues in 4.4BSD IPv6/IPsec support (itojun)
2007-02-26 15:17:19 +00:00
rrs
dfdc42503b Fix include declaration: it was sys/sctp.h but should be netinet/sctp.h.
Reported by pluknet@gmail.com.
2007-02-26 12:23:32 +00:00
rwatson
4afabb70d3 Mark data structures used on the wire with IPX SAP as __packed so that
they are not inappropriately padded as a result of compiler changes.

PR:		kern/74105
Submitted by:	Bob Johnson <bob89 at eng dot ufl dot edu>
2007-02-26 12:07:08 +00:00
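
To illustrate why the commit above matters, here is a small sketch of how padding changes an on-the-wire layout. The structures below are made up for the example and are not the IPX SAP definitions; WIRE_PACKED is a local stand-in that expands to the same GCC/Clang attribute that FreeBSD's __packed uses.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in for __packed (assumes a GCC-compatible compiler). */
#define	WIRE_PACKED	__attribute__((__packed__))

/* Hypothetical record: the compiler may pad after `hops'. */
struct wire_rec_natural {
	uint8_t hops;
	uint32_t network;
	uint16_t socket;
};

/* Same fields, but laid out byte for byte as they appear on the wire. */
struct wire_rec_packed {
	uint8_t hops;
	uint32_t network;
	uint16_t socket;
} WIRE_PACKED;

/* 1 + 4 + 2 bytes, with no padding anywhere. */
_Static_assert(sizeof(struct wire_rec_packed) == 7,
    "packed wire record must be exactly 7 bytes");

/* Copy a packed record into a packet buffer; returns the bytes written. */
static size_t
wire_rec_serialize(unsigned char *buf, const struct wire_rec_packed *r)
{
	memcpy(buf, r, sizeof(*r));
	return (sizeof(*r));
}
```

On common ABIs the unpacked variant is 12 bytes with `network' at offset 4, so copying it straight into a packet would shift every field after the first.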
rwatson
5961533dd0 Build ipx_ip.c only if options IPXIP is defined. No functional change. 2007-02-26 11:55:34 +00:00
ru
56c88fa187 Don't block on the socket zone limit during the socket()
call which can easily lock up a system otherwise; instead,
return ENOBUFS as documented in a manpage, thus reverting
us to the FreeBSD 4.x behavior.

Reviewed by:	rwatson
MFC after:	2 weeks
2007-02-26 10:45:21 +00:00
rwatson
5023c1b4ee Fix a likely bug by adding what appears to be a missing break statement
in the IPX over IP configuration ioctl: when changing the flags on a
tunnel interface, return the generated error rather than always EINVAL.
2007-02-26 10:16:53 +00:00
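
The class of bug fixed in the commit above looks roughly like this. The sketch is a reduced illustration and not the ipx_ip.c code: the command names, set_flags() and both handler functions are hypothetical.

```c
#include <errno.h>

enum tun_cmd { TUN_SETFLAGS, TUN_UNKNOWN };	/* hypothetical commands */

/* Hypothetical helper: reject negative flags, accept everything else. */
static int
set_flags(int flags)
{
	return (flags < 0 ? EPERM : 0);
}

/* Buggy shape: the result of set_flags() is overwritten by EINVAL. */
static int
tun_ioctl_buggy(enum tun_cmd cmd, int flags)
{
	int error;

	switch (cmd) {
	case TUN_SETFLAGS:
		error = set_flags(flags);
		/* FALLTHROUGH (unintended: the break is missing) */
	default:
		error = EINVAL;
		break;
	}
	return (error);
}

/* Fixed shape: the break lets the generated error reach the caller. */
static int
tun_ioctl_fixed(enum tun_cmd cmd, int flags)
{
	int error;

	switch (cmd) {
	case TUN_SETFLAGS:
		error = set_flags(flags);
		break;
	default:
		error = EINVAL;
		break;
	}
	return (error);
}
```

With the buggy version even a successful flag change reports EINVAL; with the fix the caller sees 0 on success and the specific error otherwise.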
kmacy
6508c4f27b general LOCK_PROFILING cleanup
- only collect timestamps when a lock is contested - this reduces the overhead
  of collecting profiles from 20x to 5x

- remove unused function from subr_lock.c

- generalize cnt_hold and cnt_lock statistics to be kept for all locks

- NOTE: rwlock profiling generates invalid statistics (and most likely always has);
  someone familiar with that code should review it
2007-02-26 08:26:44 +00:00
mckusick
01ee9020b3 Update the dump program to save extended attributes. Update
the restore program to restore all dumped extended attributes.

If the restore is running as root, it will always be able
to restore all extended attributes. If it is not running
as root, it makes a best effort to set them. Using the -v
command line flag or the `verbose' command in interactive
mode will display all the extended attributes being set on
files (and at the end on directories) that are being restored.
It will note any extended attributes that could not be set.

The extended attributes are placed on the dump image immediately
following each file's data. Older versions of restore can work
with the newer dump images. Old versions of restore will
correctly restore the file data and then (silently) skip
over the extended attribute data and proceed to the next file.

This resolves PR 93085 which will be closed once the code
has been MFC'ed.

Note that this code will not compile until these header
files have been updated: <protocols/dumprestore.h> and
<sys/extattr.h>.

PR:		bin/93085
Comments from:	Poul-Henning Kamp and Robert Watson
MFC after:	3 weeks
2007-02-26 08:15:56 +00:00
mckusick
22aa654f0b Declare a `struct extattr' that defines the format of an extended
attribute. Also define some macros to manipulate one of these
structures. Explain their use in the extattr.9 manual page.

The next step will be to make a sweep through the kernel replacing
the old pointer manipulation code. To get an idea of how they would
be used, the ffs_findextattr() function in ufs/ffs/ffs_vnops.c is
currently written as follows:

/*
 * Vnode operating to retrieve a named extended attribute.
 *
 * Locate a particular EA (nspace:name) in the area (ptr:length), and return
 * the length of the EA, and possibly the pointer to the entry and to the data.
 */
static int
ffs_findextattr(u_char *ptr, u_int length, int nspace, const char *name,
    u_char **eap, u_char **eac)
{
	u_char *p, *pe, *pn, *p0;
	int eapad1, eapad2, ealength, ealen, nlen;
	uint32_t ul;

	pe = ptr + length;
	nlen = strlen(name);

	for (p = ptr; p < pe; p = pn) {
		p0 = p;
		bcopy(p, &ul, sizeof(ul));
		pn = p + ul;
		/* make sure this entry is complete */
		if (pn > pe)
			break;
		p += sizeof(uint32_t);
		if (*p != nspace)
			continue;
		p++;
		eapad2 = *p++;
		if (*p != nlen)
			continue;
		p++;
		if (bcmp(p, name, nlen))
			continue;
		ealength = sizeof(uint32_t) + 3 + nlen;
		eapad1 = 8 - (ealength % 8);
		if (eapad1 == 8)
			eapad1 = 0;
		ealength += eapad1;
		ealen = ul - ealength - eapad2;
		p += nlen + eapad1;
		if (eap != NULL)
			*eap = p0;
		if (eac != NULL)
			*eac = p;
		return (ealen);
	}
	return(-1);
}

After applying the structure and macros, it would look like this:

/*
 * Vnode operating to retrieve a named extended attribute.
 *
 * Locate a particular EA (nspace:name) in the area (ptr:length), and return
 * the length of the EA, and possibly the pointer to the entry and to the data.
 */
static int
ffs_findextattr(u_char *ptr, u_int length, int nspace, const char *name,
    u_char **eapp, u_char **eac)
{
	struct extattr *eap, *eaend;
	int nlen;

	nlen = strlen(name);
	eaend = (struct extattr *)(ptr + length);
	for (eap = (struct extattr *)ptr; eap < eaend; eap = EXTATTR_NEXT(eap)) {
		/* make sure this entry is complete */
		if (EXTATTR_NEXT(eap) > eaend)
			break;
		/* compare against the length of the name, not of the area */
		if (eap->ea_namespace != nspace ||
		    eap->ea_namelength != nlen ||
		    bcmp(eap->ea_name, name, nlen))
			continue;
		if (eapp != NULL)
			*eapp = (u_char *)eap;
		if (eac != NULL)
			*eac = EXTATTR_CONTENT(eap);
		return (EXTATTR_CONTENT_SIZE(eap));
	}
	return(-1);
}

Not only is it considerably shorter, but it is hopefully more readable :-)
2007-02-26 06:18:53 +00:00
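
The size arithmetic done by hand in the older ffs_findextattr() quoted above can be isolated as follows. This is a sketch derived only from that quoted code; the function names ea_header_size and ea_content_size are hypothetical and are not the macros added by the commit. Each record starts with a 4-byte record length, a namespace byte, a content-padding byte and a name-length byte, followed by the name, and the header is padded to a multiple of 8 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Bytes from the start of a record to the start of its content:
 * 4-byte length + 3 single-byte fields + name, rounded up to 8.
 */
static size_t
ea_header_size(size_t nlen)
{
	size_t len, pad;

	len = sizeof(uint32_t) + 3 + nlen;
	pad = 8 - (len % 8);
	if (pad == 8)
		pad = 0;
	return (len + pad);
}

/*
 * Content length of a record, given its total length, its name
 * length and the padding that follows the content.
 */
static long
ea_content_size(uint32_t reclen, size_t nlen, unsigned int contentpad)
{
	return ((long)reclen - (long)ea_header_size(nlen) - (long)contentpad);
}
```

For a 4-byte name the header is 4 + 3 + 4 = 11 bytes, padded to 16; a 24-byte record with 3 bytes of content padding therefore carries 5 bytes of content.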
kevlo
37471781c4 Remove unused header file <machine/katelib.h> 2007-02-26 05:17:47 +00:00
imp
35717fc9c0 mii_phy_dev_probe returns its third argument on match, not 0, so pass 0
in if we're going to test against 0.

Noticed by: marius@
2007-02-26 04:48:24 +00:00
delphij
8cc8bccf58 Close race conditions between fork() and [sg]etpriority()'s
PRIO_USER case, and possibly other places that dereference
p_ucred.

In the past, we inserted a new process into the allproc list right
after PID allocation, and released the allproc_lock sx.  Because
most content in the new proc's structure is not yet initialized,
this could lead to undefined results if we do not handle PRS_NEW
with care.

The problem with the PRS_NEW state is that it does not provide fine
grained information about how much initialization is done for a
new process.  By definition, after PRIO_USER setpriority(), all
processes that belong to the given user should have their nice value
set to the specified value.  Therefore, if the p_{start,end}copy
section was done for a PRS_NEW process, we can not safely ignore
it because p_nice is in this area.  On the other hand, we should
be careful with PRS_NEW processes because we do not allow non-root
users to lower their nice values, and without a successful copy
of the copy section, we can get stale values that are inherited
from the uninitialized area of the process structure.

This commit tries to close the race condition by grabbing the proc
mutex *before* we release the allproc_lock xlock, and doing the copy
as well as the zeroing immediately after the allproc_lock xunlock.  This
guarantees that the new process has its p_copy and p_zero
sections, as well as user credential information, initialized.  In
the getpriority() case, instead of grabbing PROC_LOCK for a PRS_NEW
process, we just skip the process in question, because it does
not affect the final result of the call, as the p_nice value
would be copied from its parent, and we will see it during
the allproc traversal.

Other potential solutions are still under evaluation.

Discussed with:	davidxu, jhb, rwatson
PR:		kern/108071
MFC after:	2 weeks
2007-02-26 03:38:09 +00:00
kientzle
3de91924d2 Move _posix1e_acl_name_to_id out of acl_support.c and into
acl_from_text.c.  Since acl_from_text.c is the only place it
is used, we can now make this internal utility function "static."

As a bonus, acl_set_fd() no longer pulls in getpwuid() for no reason.

MFC after: 7 days
2007-02-26 02:07:02 +00:00
cognet
89842b0829 Define FLASHADDR and LOADERRAMADDR for the Avila, so that we can boot a
kernel from the onboard flash.
2007-02-26 02:04:24 +00:00
cognet
14a40fa621 Erm we can't change the value of arm_memcpy if we're running from flash.
Instead, make memcpy() check if we're running from flash, and avoid
using arm_memcpy if we're doing so.
2007-02-26 02:03:48 +00:00
mckusick
271ce58544 Implement the -h flag (set an ACL on a symbolic link).
Before this fix the -h flag was ignored (i.e. setfacl
always set the ACL on the file pointed to by the symbolic
link even when the -h flag requested that the ACL be set
on the symbolic link itself).
2007-02-26 00:42:17 +00:00
cognet
f4f08cb037 Update for the new prototype of bus_setup_intr(). 2007-02-25 22:17:54 +00:00
kientzle
76658c2a7d Don't assert() the TLS allocation requested is big enough; just
fix the argument.

In particular, this is a step towards breaking crt1's dependence on stdio.
2007-02-25 21:23:50 +00:00
ru
99b9ebd3eb Mark the vm_page_unmanage(9) manpage as obsolete.
Reminded by:	maxim
2007-02-25 17:34:16 +00:00
piso
363e58ec6b Catch up with bus_setup_intr() modification and garbage collect a
reference to INTR_FAST.
2007-02-25 15:04:08 +00:00
piso
97a9197f4f Catch up with bus_setup_intr() modification and garbage collect two
references to INTR_FAST.
2007-02-25 15:02:03 +00:00
piso
23a6e704c9 Garbage collect a reference to INTR_FAST. 2007-02-25 14:53:55 +00:00
piso
27f8ccc0d6 Fix attach of at91_pio() after bus_setup_intr() modification.
Reported and tested by: Krassimir Slavchev
2007-02-25 14:34:59 +00:00
bms
865396057e Update multicast(4) manual page to reflect the new reality of the code. 2007-02-25 14:31:41 +00:00
bms
2f11af3e83 Unlock a mutex which should be unlocked before returning.
MFC after:	1 week
2007-02-25 14:22:03 +00:00
netchild
b325c96ab1 semi-automatic style(9) 2007-02-25 13:51:52 +00:00
netchild
249ecc9078 MFp4 (110541):
Sync with rev 1.7 in NetBSD.

	Obtained from:	NetBSD
2007-02-25 12:43:07 +00:00
netchild
40358f3b01 MFp4 (110523, parts which apply cleanly):
semi-automatic style(9)

The futex stuff already differs a lot (only a small part does not differ)
from NetBSD, so we are already way off and can't apply changes from NetBSD
automatically. As we need to merge everything by hand already, we can even
make the files comply to our world order.
2007-02-25 12:40:35 +00:00
marius
1950dd41df Use uma_set_align(). 2007-02-25 10:52:47 +00:00
ru
1652cacd15 Remove the traces of vm_page_unmanage(). 2007-02-25 06:51:11 +00:00
ariff
eb8824b9ca Fix ALC883 microphone / recording issues. Setting high(er) VRef on
the (external) microphone pin tends to break it. Internal microphones (found
on several laptops) still need high VRef.

Tested by:	Pietro Cerutti <pietro.cerutti@gmail.com>
          	lenix <irc.freenode.net>
2007-02-25 06:17:56 +00:00