130381 Commits

Author SHA1 Message Date
Matt Jacob
a5e9bfc6f9 Add a man page. 2007-02-27 07:29:15 +00:00
Kirk McKusick
3ec818266f Fix an error in dumping large sparse files containing extended attributes. 2007-02-27 07:28:17 +00:00
Kip Macy
f183910b97 Further improvements to LOCK_PROFILING:
- Fix missing initialization in kern_rwlock.c causing bogus times to be collected
 - Move updates to the lock hash to after the lock is released for spin mutexes,
   sleep mutexes, and sx locks
 - Add new kernel build option LOCK_PROFILE_FAST - only update lock profiling
   statistics when an acquisition is contended. This reduces the overhead of
   LOCK_PROFILING to increasing system time by 20%-25% which on
   "make -j8 kernel-toolchain" on a dual woodcrest is unmeasurable in terms
   of wall-clock time. Contrast this to enabling lock profiling without
   LOCK_PROFILE_FAST and I see a 5x-6x slowdown in wall-clock time.
2007-02-27 06:42:05 +00:00
Greg Lehey
fc5fe41fe9 Update HISTORY.
Reviewed by:	dmr
2007-02-27 05:39:22 +00:00
Bruce Evans
b6c86f4b1e Fixed some style bugs (whitespace lossage for removal of __P(()), and
lots of naming and typing errors involving `interval').
2007-02-27 05:10:36 +00:00
Bruce Evans
93547b07b9 Use a periodic itimer instead of repeated calls to alarm() in
sidewaysintpr().  This increases the accuracy of the per-interval
counts when they are interpreted as rates.  Repeated calls to alarm(n)
give an average interval that is about 2 ticks larger than n and has
a large variance.  Periodic itimers normally get the average almost
right but have similarly large variance (due to scheduling delays).

Statistics utilities should use clock_gettime() to determine the
actual interval, but it is still useful to maximize the accuracy of
the interval, especially for cases like netstat -w where counts are
displayed so the program cannot hide the inaccuracy in a rate
conversion.
2007-02-27 04:54:33 +00:00
Matt Jacob
e770bc6bf5 First cut at GEOM based multipath. This is an active/passive{/passive...}
arrangement that has no intrinsic internal knowledge of whether devices
it is given are truly multipath devices. As such, this is a simplistic
approach, but still a useful one.

The basic approach is to (at present- this will change soon) use camcontrol
to find likely identical devices and and label the trailing sector of the
first one. This label contains both a full UUID and a name. The name is
what is presented in /dev/multipath, but the UUID is used as a true
distinguishor at g_taste time, thus making sure we don't have chaos
on a shared SAN where everyone names their data multipath as "Fred".

The first of N identical devices (and N *may* be 1!) becomes the active
path until a BIO request is failed with EIO or ENXIO. When this occurs,
the active disk is ripped away and the next in a list is picked to
(retry and) continue with.

During g_taste events new disks that meet the match criteria for existing
multipath geoms get added to the tail end of the list.

Thus, this active/passive setup actually does work for devices which
go away and come back, as do (now) mpt(4) and isp(4) SAN based disks.

There is still a lot to do to improve this- like about 5 of the 12
recommendations I've received about it,  but it's been functional enough
for a while that it deserves a broader test base.

Reviewed by: pjd
Sponsored by: IronPort Systems
MFC: 2 months
2007-02-27 04:01:58 +00:00
Greg Lehey
5af924bf6c Add warning about deadlocks created by use of wakeup_one. 2007-02-27 02:51:41 +00:00
Jung-uk Kim
6a5964d385 MFP4: 115094
Linux does not check file descriptor when MAP_ANONYMOUS is set.
This should fix recent LTP test regressions.

Reported by:	Scot Hetzel (swhetzel at gmail dot com)
		netchild
2007-02-27 02:08:01 +00:00
Pawel Jakub Dawidek
3ad48efa4a Replace spaces with tabs in some places. 2007-02-27 01:48:58 +00:00
Nate Lawson
ef2374f700 Rework EC I/O approach. Implement burst mode, including proper handling of
case where it asynchronously exits burst mode on its own.  Handle different
values of hz in sleep loop.  Provide more debugging options to tune EC
behavior.  These tunables/sysctls may be temporary and are not for user
access if the EC is working properly.  Burst mode is now on by default for
testing and the poll interval has been increased from 100 to 500 us and
total timeout from 100 to 500 ms.

Hopefully this should be the first step of addressing reports of timeout
errors during battery or thermal access, especially on HP/Compaq laptops.
It is reasonably stable and should not cause a loss of functionality or
performance on systems that were previously working.  Testing shows an
increase of responsiveness by ~75% on one system.

PR:		kern/98171
2007-02-27 00:14:20 +00:00
Mohan Srinivasan
7c72af8770 Reap FIN_WAIT_2 connections marked SOCANTRCVMORE faster. This mitigate
potential issues where the peer does not close, potentially leaving
thousands of connections in FIN_WAIT_2. This is controlled by a new sysctl
fast_finwait2_recycle, which is disabled by default.

Reviewed by: gnn, silby.
2007-02-26 22:25:21 +00:00
Jung-uk Kim
560a54e10c Add three new ioctl(2) commands for bpf(4).
- BIOCGDIRECTION and BIOCSDIRECTION get or set the setting determining
whether incoming, outgoing, or all packets on the interface should be
returned by BPF.  Set to BPF_D_IN to see only incoming packets on the
interface.  Set to BPF_D_INOUT to see packets originating locally and
remotely on the interface.  Set to BPF_D_OUT to see only outgoing
packets on the interface.  This setting is initialized to BPF_D_INOUT
by default.  BIOCGSEESENT and BIOCSSEESENT are obsoleted by these but
kept for backward compatibility.

- BIOCFEEDBACK sets packet feedback mode.  This allows injected packets
to be fed back as input to the interface when output via the interface is
successful.  When BPF_D_INOUT direction is set, injected outgoing packet
is not returned by BPF to avoid duplication.  This flag is initialized to
zero by default.

Note that libpcap has been modified to support BPF_D_OUT direction for
pcap_setdirection(3) and PCAP_D_OUT direction is functional now.

Reviewed by:	rwatson
2007-02-26 22:24:14 +00:00
Robert Watson
e7c33e29ed Revise locking strategy used for UNIX domain sockets in order to improve
concurrency:

- Add per-unpcb mutexes protecting unpcb connection state, fields, etc.

- Replace global UNP mutex with a global UNP rwlock, which will protect the
  UNIX domain socket connection topology, v_socket, and be acquired
  exclusively before acquiring more than per-unpcb at a time in order to
  avoid lock order issues.

In performance measurements involving MySQL, this change has little or no
overhead on UP (+/- 1%), but leads to a significant (5%-30%) improvement in
multi-processor measurements using the sysbench and supersmack benchmarks.

Much testing by:	kris
Approved by:		re (kensmith)
2007-02-26 20:47:52 +00:00
John Baldwin
c0e767f9dd Use NULL rather than 0 for various pointer constants. 2007-02-26 19:28:18 +00:00
Robert Watson
149b6b0b12 Add rw_wowned(9) symlink. 2007-02-26 19:09:36 +00:00
Robert Watson
dd348217d7 Update rwlock(9) for rw_wowned(). 2007-02-26 19:07:41 +00:00
Robert Watson
8525230afd Add rw_wowned() interface to rwlock(9), allowing a kernel thread to
determine if it holds an exclusive rwlock reference or not.  This is
non-ideal, but recursion scenarios in the network stack currently
require it.

Approved by:	jhb
2007-02-26 19:05:13 +00:00
John Baldwin
59800afcb5 Mark the kernel linker file as linked so that it is visible to the various
kld*() syscalls.

Tested by:	piso
2007-02-26 16:48:14 +00:00
John Baldwin
4a0f58d25b Fix a comment. 2007-02-26 16:36:48 +00:00
Bruce M Simpson
d305f4b99e Document m_pulldown().
Obtained from:	MBUF issues in 4.4BSD IPv6/IPsec support (itojun)
2007-02-26 15:17:19 +00:00
Randall Stewart
7c3768006d Fix include declaration it was sys/sctp.h should be netinet/sctp.h,
reported by pluknet@gmail.com.
2007-02-26 12:23:32 +00:00
Robert Watson
158ae6cdf2 Mark data structures used on the wire with IPX SAP as __packed so that
they are not inappropriately padded as a result of compiler changes.

PR:		kern/74105
Submitted by:	Bob Johnson <bob89 at eng dot ufl dot edu>
2007-02-26 12:07:08 +00:00
Robert Watson
c724ad6648 Build ipx_ip.c only if options IPXIP is defined. No functional change. 2007-02-26 11:55:34 +00:00
Ruslan Ermilov
fac61393b9 Don't block on the socket zone limit during the socket()
call which can easily lock up a system otherwise; instead,
return ENOBUFS as documented in a manpage, thus reverting
us to the FreeBSD 4.x behavior.

Reviewed by:	rwatson
MFC after:	2 weeks
2007-02-26 10:45:21 +00:00
Robert Watson
e89beb720d Fix a likely bug by adding what appears to be a missing break statement
in the IPX over IP configuration ioctl: when changing the flags on a
tunnel interface, return the generated error rather than always EINVAL.
2007-02-26 10:16:53 +00:00
Kip Macy
fe68a91631 general LOCK_PROFILING cleanup
- only collect timestamps when a lock is contested - this reduces the overhead
  of collecting profiles from 20x to 5x

- remove unused function from subr_lock.c

- generalize cnt_hold and cnt_lock statistics to be kept for all locks

- NOTE: rwlock profiling generates invalid statistics (and most likely always has)
  someone familiar with that should review
2007-02-26 08:26:44 +00:00
Kirk McKusick
772ad651bf Update the dump program to save extended attributes. Update
the restore program to restore all dumped extended attributes.

If the restore is running as root, it will always be able
to restore all extended attributes. If it is not running
as root, it makes a best effort to set them. Using the -v
command line flag or the `verbose' command in interactive
mode will display all the extended attributes being set on
files (and at the end on directories) that are being restored.
It will note any extended attributes that could not be set.

The extended attributes are placed on the dump image immediately
following each file's data. Older versions of restore can work
with the newer dump images. Old versions of restore will
correctly restore the file data and then (silently) skip
over the extended attribute data and proceed to the next file.

This resolves PR 93085 which will be closed once the code
has been MFC'ed.

Note that this code will not compile until these header
files have been updated: <protocols/dumprestore.h> and
<sys/extattr.h>.

PR:		bin/93085
Comments from:	Poul-Henning Kamp and Robert Watson
MFC after:	3 weeks
2007-02-26 08:15:56 +00:00
Kirk McKusick
fdece2b8f7 Declare a `struct extattr' that defines the format of an extended
attribute. Also define some macros to manipulate one of these
structures. Explain their use in the extattr.9 manual page.

The next step will be to make a sweep through the kernel replacing
the old pointer manipulation code. To get an idea of how they would
be used, the ffs_findextattr() function in ufs/ffs/ffs_vnops.c is
currently written as follows:

/*
 * Vnode operating to retrieve a named extended attribute.
 *
 * Locate a particular EA (nspace:name) in the area (ptr:length), and return
 * the length of the EA, and possibly the pointer to the entry and to the data.
 */
static int
ffs_findextattr(u_char *ptr, u_int length, int nspace, const char *name,
    u_char **eap, u_char **eac)
{
	u_char *p, *pe, *pn, *p0;
	int eapad1, eapad2, ealength, ealen, nlen;
	uint32_t ul;

	pe = ptr + length;
	nlen = strlen(name);

	for (p = ptr; p < pe; p = pn) {
		p0 = p;
		bcopy(p, &ul, sizeof(ul));
		pn = p + ul;
		/* make sure this entry is complete */
		if (pn > pe)
			break;
		p += sizeof(uint32_t);
		if (*p != nspace)
			continue;
		p++;
		eapad2 = *p++;
		if (*p != nlen)
			continue;
		p++;
		if (bcmp(p, name, nlen))
			continue;
		ealength = sizeof(uint32_t) + 3 + nlen;
		eapad1 = 8 - (ealength % 8);
		if (eapad1 == 8)
			eapad1 = 0;
		ealength += eapad1;
		ealen = ul - ealength - eapad2;
		p += nlen + eapad1;
		if (eap != NULL)
			*eap = p0;
		if (eac != NULL)
			*eac = p;
		return (ealen);
	}
	return(-1);
}

After applying the structure and macros, it would look like this:

/*
 * Vnode operating to retrieve a named extended attribute.
 *
 * Locate a particular EA (nspace:name) in the area (ptr:length), and return
 * the length of the EA, and possibly the pointer to the entry and to the data.
 */
static int
ffs_findextattr(u_char *ptr, u_int length, int nspace, const char *name,
    u_char **eapp, u_char **eac)
{
	struct extattr *eap, *eaend;

	eaend = (struct extattr *)(ptr + length);
	for (eap = (struct extattr *)ptr; eap < eaend; eap = EXTATTR_NEXT(eap)){
		/* make sure this entry is complete */
		if (EXTATTR_NEXT(eap) > eaend)
			break;
		if (eap->ea_namespace != nspace ||
		    eap->ea_namelength != length ||
		    bcmp(eap->ea_name, name, length))
			continue;
		if (eapp != NULL)
			*eapp = eap;
		if (eac != NULL)
			*eac = EXTATTR_CONTENT(eap);
		return (EXTATTR_CONTENT_SIZE(eap));
	}
	return(-1);
}

Not only is it considerably shorter, but it hopefully more readable :-)
2007-02-26 06:18:53 +00:00
Kevin Lo
82003b276a Remove unused header file <machine/katelib.h> 2007-02-26 05:17:47 +00:00
Warner Losh
3a5e8cefd3 mii_phy_dev_probe returns its third argument on match, not 0, so pass 0
in if we're going to test against 0.

Noticed by: marius@
2007-02-26 04:48:24 +00:00
Xin LI
1ad9ee8603 Close race conditions between fork() and [sg]etpriority()'s
PRIO_USER case, possibly also other places that deferences
p_ucred.

In the past, we insert a new process into the allproc list right
after PID allocation, and release the allproc_lock sx.  Because
most content in new proc's structure is not yet initialized,
this could lead to undefined result if we do not handle PRS_NEW
with care.

The problem with PRS_NEW state is that it does not provide fine
grained information about how much initialization is done for a
new process.  By defination, after PRIO_USER setpriority(), all
processes that belongs to given user should have their nice value
set to the specified value.  Therefore, if p_{start,end}copy
section was done for a PRS_NEW process, we can not safely ignore
it because p_nice is in this area.  On the other hand, we should
be careful on PRS_NEW processes because we do not allow non-root
users to lower their nice values, and without a successful copy
of the copy section, we can get stale values that is inherted
from the uninitialized area of the process structure.

This commit tries to close the race condition by grabbing proc
mutex *before* we release allproc_lock xlock, and do copy as
well as zero immediately after the allproc_lock xunlock.  This
guarantees that the new process would have its p_copy and p_zero
sections, as well as user credential informaion initialized.  In
getpriority() case, instead of grabbing PROC_LOCK for a PRS_NEW
process, we just skip the process in question, because it does
not affect the final result of the call, as the p_nice value
would be copied from its parent, and we will see it during
allproc traverse.

Other potential solutions are still under evaluation.

Discussed with:	davidxu, jhb, rwatson
PR:		kern/108071
MFC after:	2 weeks
2007-02-26 03:38:09 +00:00
Tim Kientzle
4813511138 Move _posix1e_acl_name_to_id out of acl_support.c and into
acl_from_text.c.  Since acl_from_text.c is the only place it
is used, we can now make this internal utility function "static."

As a bonus, acl_set_fd() no longer pulls in getpwuid() for no reason.

MFC after: 7 days
2007-02-26 02:07:02 +00:00
Olivier Houchard
0dcd646d88 Define FLASHADDR and LOADERRAMADDR for the Avila, so that we can boot a
kernel from the onboard flash.
2007-02-26 02:04:24 +00:00
Olivier Houchard
f96b290333 Erm we can't change the value of arm_memcpy if we're running from flash.
Instead, make memcpy() check if we're running from flash, and avoid
using arm_memcpy if we're doing so.
2007-02-26 02:03:48 +00:00
Kirk McKusick
b5ea8f4cbc Implement the -h flag (set an ACL on a symbolic link).
Before this fix the -h flag was ignored (i.e. setfacl
always set the ACL on the file pointed to by the symbolic
link even when the -h flag requested that the ACL be set
on the symbolic link itself).
2007-02-26 00:42:17 +00:00
Olivier Houchard
902222e77c Update for the new prototype of bus_setup_intr(). 2007-02-25 22:17:54 +00:00
Tim Kientzle
0031cdf4d7 Don't assert() the TLS allocation requested is big enough; just
fix the argument.

In particular, this is a step towards breaking crt1's dependence on stdio.
2007-02-25 21:23:50 +00:00
Ruslan Ermilov
b141b9655f Mark the vm_page_unmanage(9) manpage as obsolete.
Reminded by:	maxim
2007-02-25 17:34:16 +00:00
Paolo Pisati
83496acdc6 Catch up with bus_setup_intr() modification and garbage collect a
reference to INTR_FAST.
2007-02-25 15:04:08 +00:00
Paolo Pisati
3b1afd0f92 Catch up with bus_setup_intr() modification and garbage collect two
references to INTR_FAST.
2007-02-25 15:02:03 +00:00
Paolo Pisati
20c24107ca Garbage collect a reference to INTR_FAST. 2007-02-25 14:53:55 +00:00
Paolo Pisati
23b5f6dc07 Fix attach of at91_pio() after bus_setup_intr() modification.
Reported and tested by: Krassimir Slavchev
2007-02-25 14:34:59 +00:00
Bruce M Simpson
0770db8953 Update multicast(4) manual page to reflect the new reality of the code. 2007-02-25 14:31:41 +00:00
Bruce M Simpson
410052125e Unlock a mutex which should be unlocked before returning.
MFC after:	1 week
2007-02-25 14:22:03 +00:00
Alexander Leidinger
8f981688dd semi-automatic style(9) 2007-02-25 13:51:52 +00:00
Alexander Leidinger
8cf5ee2e2a MFp4 (110541):
Sync with rev 1.7 in NetBSD.

	Obtained from:	NetBSD
2007-02-25 12:43:07 +00:00
Alexander Leidinger
f9dac96185 MFp4 (110523, parts which apply cleanly):
semi-automatic style(9)

The futex stuff already differs a lot (only a small part does not differ)
from NetBSD, so we are already way off and can't apply changes from NetBSD
automatically. As we need to merge everything by hand already, we can even
make the files comply to our world order.
2007-02-25 12:40:35 +00:00
Marius Strobl
669a4d96e4 Use uma_set_align(). 2007-02-25 10:52:47 +00:00
Ruslan Ermilov
066eef7a1d Remove the traces of vm_page_unmanage(). 2007-02-25 06:51:11 +00:00