Commit Graph

62512 Commits

Author SHA1 Message Date
Matt Jacob
e770bc6bf5 First cut at GEOM based multipath. This is an active/passive{/passive...}
arrangement that has no intrinsic internal knowledge of whether devices
it is given are truly multipath devices. As such, this is a simplistic
approach, but still a useful one.

The basic approach is to (at present- this will change soon) use camcontrol
to find likely identical devices and and label the trailing sector of the
first one. This label contains both a full UUID and a name. The name is
what is presented in /dev/multipath, but the UUID is used as a true
distinguishor at g_taste time, thus making sure we don't have chaos
on a shared SAN where everyone names their data multipath as "Fred".

The first of N identical devices (and N *may* be 1!) becomes the active
path until a BIO request is failed with EIO or ENXIO. When this occurs,
the active disk is ripped away and the next in a list is picked to
(retry and) continue with.

During g_taste events new disks that meet the match criteria for existing
multipath geoms get added to the tail end of the list.

Thus, this active/passive setup actually does work for devices which
go away and come back, as do (now) mpt(4) and isp(4) SAN based disks.

There is still a lot to do to improve this- like about 5 of the 12
recommendations I've received about it,  but it's been functional enough
for a while that it deserves a broader test base.

Reviewed by: pjd
Sponsored by: IronPort Systems
MFC: 2 months
2007-02-27 04:01:58 +00:00
Jung-uk Kim
6a5964d385 MFP4: 115094
Linux does not check file descriptor when MAP_ANONYMOUS is set.
This should fix recent LTP test regressions.

Reported by:	Scot Hetzel (swhetzel at gmail dot com)
		netchild
2007-02-27 02:08:01 +00:00
Pawel Jakub Dawidek
3ad48efa4a Replace spaces with tabs in some places. 2007-02-27 01:48:58 +00:00
Nate Lawson
ef2374f700 Rework EC I/O approach. Implement burst mode, including proper handling of
case where it asynchronously exits burst mode on its own.  Handle different
values of hz in sleep loop.  Provide more debugging options to tune EC
behavior.  These tunables/sysctls may be temporary and are not for user
access if the EC is working properly.  Burst mode is now on by default for
testing and the poll interval has been increased from 100 to 500 us and
total timeout from 100 to 500 ms.

Hopefully this should be the first step of addressing reports of timeout
errors during battery or thermal access, especially on HP/Compaq laptops.
It is reasonably stable and should not cause a loss of functionality or
performance on systems that were previously working.  Testing shows an
increase of responsiveness by ~75% on one system.

PR:		kern/98171
2007-02-27 00:14:20 +00:00
Mohan Srinivasan
7c72af8770 Reap FIN_WAIT_2 connections marked SOCANTRCVMORE faster. This mitigate
potential issues where the peer does not close, potentially leaving
thousands of connections in FIN_WAIT_2. This is controlled by a new sysctl
fast_finwait2_recycle, which is disabled by default.

Reviewed by: gnn, silby.
2007-02-26 22:25:21 +00:00
Jung-uk Kim
560a54e10c Add three new ioctl(2) commands for bpf(4).
- BIOCGDIRECTION and BIOCSDIRECTION get or set the setting determining
whether incoming, outgoing, or all packets on the interface should be
returned by BPF.  Set to BPF_D_IN to see only incoming packets on the
interface.  Set to BPF_D_INOUT to see packets originating locally and
remotely on the interface.  Set to BPF_D_OUT to see only outgoing
packets on the interface.  This setting is initialized to BPF_D_INOUT
by default.  BIOCGSEESENT and BIOCSSEESENT are obsoleted by these but
kept for backward compatibility.

- BIOCFEEDBACK sets packet feedback mode.  This allows injected packets
to be fed back as input to the interface when output via the interface is
successful.  When BPF_D_INOUT direction is set, injected outgoing packet
is not returned by BPF to avoid duplication.  This flag is initialized to
zero by default.

Note that libpcap has been modified to support BPF_D_OUT direction for
pcap_setdirection(3) and PCAP_D_OUT direction is functional now.

Reviewed by:	rwatson
2007-02-26 22:24:14 +00:00
Robert Watson
e7c33e29ed Revise locking strategy used for UNIX domain sockets in order to improve
concurrency:

- Add per-unpcb mutexes protecting unpcb connection state, fields, etc.

- Replace global UNP mutex with a global UNP rwlock, which will protect the
  UNIX domain socket connection topology, v_socket, and be acquired
  exclusively before acquiring more than per-unpcb at a time in order to
  avoid lock order issues.

In performance measurements involving MySQL, this change has little or no
overhead on UP (+/- 1%), but leads to a significant (5%-30%) improvement in
multi-processor measurements using the sysbench and supersmack benchmarks.

Much testing by:	kris
Approved by:		re (kensmith)
2007-02-26 20:47:52 +00:00
John Baldwin
c0e767f9dd Use NULL rather than 0 for various pointer constants. 2007-02-26 19:28:18 +00:00
Robert Watson
8525230afd Add rw_wowned() interface to rwlock(9), allowing a kernel thread to
determine if it holds an exclusive rwlock reference or not.  This is
non-ideal, but recursion scenarios in the network stack currently
require it.

Approved by:	jhb
2007-02-26 19:05:13 +00:00
John Baldwin
59800afcb5 Mark the kernel linker file as linked so that it is visible to the various
kld*() syscalls.

Tested by:	piso
2007-02-26 16:48:14 +00:00
John Baldwin
4a0f58d25b Fix a comment. 2007-02-26 16:36:48 +00:00
Robert Watson
c724ad6648 Build ipx_ip.c only if options IPXIP is defined. No functional change. 2007-02-26 11:55:34 +00:00
Ruslan Ermilov
fac61393b9 Don't block on the socket zone limit during the socket()
call which can easily lock up a system otherwise; instead,
return ENOBUFS as documented in a manpage, thus reverting
us to the FreeBSD 4.x behavior.

Reviewed by:	rwatson
MFC after:	2 weeks
2007-02-26 10:45:21 +00:00
Robert Watson
e89beb720d Fix a likely bug by adding what appears to be a missing break statement
in the IPX over IP configuration ioctl: when changing the flags on a
tunnel interface, return the generated error rather than always EINVAL.
2007-02-26 10:16:53 +00:00
Kip Macy
fe68a91631 general LOCK_PROFILING cleanup
- only collect timestamps when a lock is contested - this reduces the overhead
  of collecting profiles from 20x to 5x

- remove unused function from subr_lock.c

- generalize cnt_hold and cnt_lock statistics to be kept for all locks

- NOTE: rwlock profiling generates invalid statistics (and most likely always has)
  someone familiar with that should review
2007-02-26 08:26:44 +00:00
Kirk McKusick
fdece2b8f7 Declare a `struct extattr' that defines the format of an extended
attribute. Also define some macros to manipulate one of these
structures. Explain their use in the extattr.9 manual page.

The next step will be to make a sweep through the kernel replacing
the old pointer manipulation code. To get an idea of how they would
be used, the ffs_findextattr() function in ufs/ffs/ffs_vnops.c is
currently written as follows:

/*
 * Vnode operating to retrieve a named extended attribute.
 *
 * Locate a particular EA (nspace:name) in the area (ptr:length), and return
 * the length of the EA, and possibly the pointer to the entry and to the data.
 */
static int
ffs_findextattr(u_char *ptr, u_int length, int nspace, const char *name,
    u_char **eap, u_char **eac)
{
	u_char *p, *pe, *pn, *p0;
	int eapad1, eapad2, ealength, ealen, nlen;
	uint32_t ul;

	pe = ptr + length;
	nlen = strlen(name);

	for (p = ptr; p < pe; p = pn) {
		p0 = p;
		bcopy(p, &ul, sizeof(ul));
		pn = p + ul;
		/* make sure this entry is complete */
		if (pn > pe)
			break;
		p += sizeof(uint32_t);
		if (*p != nspace)
			continue;
		p++;
		eapad2 = *p++;
		if (*p != nlen)
			continue;
		p++;
		if (bcmp(p, name, nlen))
			continue;
		ealength = sizeof(uint32_t) + 3 + nlen;
		eapad1 = 8 - (ealength % 8);
		if (eapad1 == 8)
			eapad1 = 0;
		ealength += eapad1;
		ealen = ul - ealength - eapad2;
		p += nlen + eapad1;
		if (eap != NULL)
			*eap = p0;
		if (eac != NULL)
			*eac = p;
		return (ealen);
	}
	return(-1);
}

After applying the structure and macros, it would look like this:

/*
 * Vnode operating to retrieve a named extended attribute.
 *
 * Locate a particular EA (nspace:name) in the area (ptr:length), and return
 * the length of the EA, and possibly the pointer to the entry and to the data.
 */
static int
ffs_findextattr(u_char *ptr, u_int length, int nspace, const char *name,
    u_char **eapp, u_char **eac)
{
	struct extattr *eap, *eaend;

	eaend = (struct extattr *)(ptr + length);
	for (eap = (struct extattr *)ptr; eap < eaend; eap = EXTATTR_NEXT(eap)){
		/* make sure this entry is complete */
		if (EXTATTR_NEXT(eap) > eaend)
			break;
		if (eap->ea_namespace != nspace ||
		    eap->ea_namelength != length ||
		    bcmp(eap->ea_name, name, length))
			continue;
		if (eapp != NULL)
			*eapp = eap;
		if (eac != NULL)
			*eac = EXTATTR_CONTENT(eap);
		return (EXTATTR_CONTENT_SIZE(eap));
	}
	return(-1);
}

Not only is it considerably shorter, but it hopefully more readable :-)
2007-02-26 06:18:53 +00:00
Kevin Lo
82003b276a Remove unused header file <machine/katelib.h> 2007-02-26 05:17:47 +00:00
Warner Losh
3a5e8cefd3 mii_phy_dev_probe returns its third argument on match, not 0, so pass 0
in if we're going to test against 0.

Noticed by: marius@
2007-02-26 04:48:24 +00:00
Xin LI
1ad9ee8603 Close race conditions between fork() and [sg]etpriority()'s
PRIO_USER case, possibly also other places that deferences
p_ucred.

In the past, we insert a new process into the allproc list right
after PID allocation, and release the allproc_lock sx.  Because
most content in new proc's structure is not yet initialized,
this could lead to undefined result if we do not handle PRS_NEW
with care.

The problem with PRS_NEW state is that it does not provide fine
grained information about how much initialization is done for a
new process.  By defination, after PRIO_USER setpriority(), all
processes that belongs to given user should have their nice value
set to the specified value.  Therefore, if p_{start,end}copy
section was done for a PRS_NEW process, we can not safely ignore
it because p_nice is in this area.  On the other hand, we should
be careful on PRS_NEW processes because we do not allow non-root
users to lower their nice values, and without a successful copy
of the copy section, we can get stale values that is inherted
from the uninitialized area of the process structure.

This commit tries to close the race condition by grabbing proc
mutex *before* we release allproc_lock xlock, and do copy as
well as zero immediately after the allproc_lock xunlock.  This
guarantees that the new process would have its p_copy and p_zero
sections, as well as user credential informaion initialized.  In
getpriority() case, instead of grabbing PROC_LOCK for a PRS_NEW
process, we just skip the process in question, because it does
not affect the final result of the call, as the p_nice value
would be copied from its parent, and we will see it during
allproc traverse.

Other potential solutions are still under evaluation.

Discussed with:	davidxu, jhb, rwatson
PR:		kern/108071
MFC after:	2 weeks
2007-02-26 03:38:09 +00:00
Olivier Houchard
0dcd646d88 Define FLASHADDR and LOADERRAMADDR for the Avila, so that we can boot a
kernel from the onboard flash.
2007-02-26 02:04:24 +00:00
Olivier Houchard
f96b290333 Erm we can't change the value of arm_memcpy if we're running from flash.
Instead, make memcpy() check if we're running from flash, and avoid
using arm_memcpy if we're doing so.
2007-02-26 02:03:48 +00:00
Olivier Houchard
902222e77c Update for the new prototype of bus_setup_intr(). 2007-02-25 22:17:54 +00:00
Paolo Pisati
83496acdc6 Catch up with bus_setup_intr() modification and garbage collect a
reference to INTR_FAST.
2007-02-25 15:04:08 +00:00
Paolo Pisati
3b1afd0f92 Catch up with bus_setup_intr() modification and garbage collect two
references to INTR_FAST.
2007-02-25 15:02:03 +00:00
Paolo Pisati
20c24107ca Garbage collect a reference to INTR_FAST. 2007-02-25 14:53:55 +00:00
Paolo Pisati
23b5f6dc07 Fix attach of at91_pio() after bus_setup_intr() modification.
Reported and tested by: Krassimir Slavchev
2007-02-25 14:34:59 +00:00
Bruce M Simpson
410052125e Unlock a mutex which should be unlocked before returning.
MFC after:	1 week
2007-02-25 14:22:03 +00:00
Alexander Leidinger
8f981688dd semi-automatic style(9) 2007-02-25 13:51:52 +00:00
Alexander Leidinger
8cf5ee2e2a MFp4 (110541):
Sync with rev 1.7 in NetBSD.

	Obtained from:	NetBSD
2007-02-25 12:43:07 +00:00
Alexander Leidinger
f9dac96185 MFp4 (110523, parts which apply cleanly):
semi-automatic style(9)

The futex stuff already differs a lot (only a small part does not differ)
from NetBSD, so we are already way off and can't apply changes from NetBSD
automatically. As we need to merge everything by hand already, we can even
make the files comply to our world order.
2007-02-25 12:40:35 +00:00
Marius Strobl
669a4d96e4 Use uma_set_align(). 2007-02-25 10:52:47 +00:00
Ariff Abdullah
b88d63a8ea Fix ALC883 microphone / recording issues. Setting high(er) VRef on
(external) microphone pin tend to screw it. Internal microphone (found
on several laptops) still need high VRef.

Tested by:	Pietro Cerutti <pietro.cerutti@gmail.com>
          	lenix <irc.freenode.net>
2007-02-25 06:17:56 +00:00
Alan Cox
9f5c801b94 Change the way that unmanaged pages are created. Specifically,
immediately flag any page that is allocated to a OBJT_PHYS object as
unmanaged in vm_page_alloc() rather than waiting for a later call to
vm_page_unmanage().  This allows for the elimination of some uses of
the page queues lock.

Change the type of the kernel and kmem objects from OBJT_DEFAULT to
OBJT_PHYS.  This allows us to take advantage of the above change to
simplify the allocation of unmanaged pages in kmem_alloc() and
kmem_malloc().

Remove vm_page_unmanage().  It is no longer used.
2007-02-25 06:14:58 +00:00
Robert Watson
dd0cb30f89 Further style(9) for ipx_ip. 2007-02-25 00:17:56 +00:00
Robert Watson
ca642fe976 Improve ipx_ip.c's approximation of style(9). 2007-02-25 00:10:59 +00:00
Sam Leffler
16d84e0140 don't call ath_reset when processing sysctl's before the device
is marked running; we don't have all the needed state in place

Noticed by:	Hugo Silva <hugo@barafranca.com>
MFC after:	1 week
2007-02-24 23:23:29 +00:00
Sam Leffler
8debcae44f set the antenna switch when fixing the tx antenna using the
dev.ath.X.txantenna sysctl; this is typically what folks
want but beware this has the side effect of disabling rx
diversity

MFC after:	2 weeks
2007-02-24 23:12:58 +00:00
Bruce M Simpson
1291e2a0eb Fix tinderbox. ip6_mrouter should be defined in raw_ip6.c as it is
tested to determine if the userland socket is open; this, in turn, is
used to determine if the module has been loaded.

Tested with:	LINT
2007-02-24 21:09:35 +00:00
Paolo Pisati
531db8b139 Updated ia64 isa support with the new bus_setup_intr() syntax.
Approved by: re (implicit?)
2007-02-24 16:56:22 +00:00
Alexander Leidinger
802e08a360 Partial MFp4 of 114977:
Whitespace commit: Fix grammar, spelling and punctuation.

Submitted by:	"Scot Hetzel" <swhetzel@gmail.com>
2007-02-24 16:49:25 +00:00
Warner Losh
e2ae5821fe exca->pccarddev should always be non-null now. Only call it when the
device is actually attached.
2007-02-24 15:56:06 +00:00
Xin LI
c55e033cca Convert sis(4) to use its own watchdog procedure.
Submitted by:	Florian C. Smeets <flo kasimir com>
2007-02-24 14:27:36 +00:00
Bruce M Simpson
6be2e366d6 Make IPv6 multicast forwarding dynamically loadable from a GENERIC kernel.
It is built in the same module as IPv4 multicast forwarding, i.e. ip_mroute.ko,
if and only if IPv6 support is enabled for loadable modules.
Export IPv6 forwarding structs to userland netstat(1) via sysctl(9).
2007-02-24 11:38:47 +00:00
Paolo Pisati
ceca3a3873 o break newbus api: add a new argument of type driver_filter_t to
bus_setup_intr()

o add an int return code to all fast handlers

o retire INTR_FAST/IH_FAST

For more info: http://docs.freebsd.org/cgi/getmsg.cgi?fetch=465712+0+current/freebsd-current

Approved by: re (implicit?)
2007-02-24 02:28:07 +00:00
Matt Jacob
5f53837698 Redo previous newbus related change to be kinder to
multi-release support.
2007-02-23 23:13:46 +00:00
John Baldwin
6e50e38fcc Use tsleep() rather than msleep() with a NULL mtx parameter. 2007-02-23 23:06:10 +00:00
John Baldwin
4a50bbc03e Whitespace fix. 2007-02-23 23:05:31 +00:00
Scott Long
04f0ce213f Fix a case in rman_manage_region() where the resource list would get missorted.
This would in turn confuse rman_reserve_resource().  This was only seen for
MSI resources that can get allocated and deallocated after boot.
2007-02-23 22:53:56 +00:00
Alexander Leidinger
1a26db0a3a MFp4 (114193 (i386 part), 114194, 114195, 114200):
- Dont "return" in linux_clone() after we forked the new process in a case
   of problems.
 - Move the copyout of p2->p_pid outside the emul_lock coverage in
   linux_clone().
 - Cache the em->pdeath_signal in a local variable and move the copyout
   out of the emul_lock coverage.
 - Move the free() out of the emul_shared_lock coverage in a preparation
   to switch emul_lock to non-sleepable lock (mutex).

Submitted by:	rdivacky
2007-02-23 22:39:26 +00:00
Alexander Leidinger
e8b8b834b4 MFp4 (part of 114132):
- Fix a LOR caused by holding emul_lock and proctree_lock at once.

Submitted by:	rdivacky
2007-02-23 22:29:24 +00:00