Commit Graph

66727 Commits

Author SHA1 Message Date
kmacy
fb74f62b24 Insulate inpcb consumers outside the stack from the lock type and offset within the pcb by adding accessor functions.
Reviewed by: rwatson
MFC after: 3 weeks
2008-03-23 22:34:16 +00:00
alc
e702727e2c To date, we have assumed that the TLB will only set the PG_M bit in a
PTE if that PTE has the PG_RW bit set.  However, this assumption does
not hold on recent processors from Intel.  For example, consider a PTE
that has the PG_RW bit set but the PG_M bit clear.  Suppose this PTE
is cached in the TLB and later the PG_RW bit is cleared in the PTE,
but the corresponding TLB entry is not (yet) invalidated.
Historically, upon a write access using this (stale) TLB entry, the
TLB would observe that the PG_RW bit had been cleared and initiate a
page fault, aborting the setting of the PG_M bit in the PTE.  Now,
however, P4- and Core2-family processors will set the PG_M bit before
observing that the PG_RW bit is clear and initiating a page fault.  In
other words, the write does not occur but the PG_M bit is still set.

The real impact of this difference is not that great.  Specifically,
we should no longer assert that any PTE with the PG_M bit set must
also have the PG_RW bit set, and we should ignore the state of the
PG_M bit unless the PG_RW bit is set.  However, these changes enable
me to remove a work-around from pmap_promote_pde(), the superpage
promotion procedure.

(Note: The AMD processors that we have tested, including the latest,
the Phenom, still exhibit the historical behavior.)

Acknowledgments: After I observed the problem, Stephan (ups) was
instrumental in characterizing the exact behavior of Intel's recent
TLBs.

Tested by: Peter Holm
2008-03-23 20:38:01 +00:00
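The rule stated in e702727e2c (ignore PG_M unless PG_RW is also set) can be sketched in a few lines of C. This is an illustrative sketch only, not the committed pmap code: the helper name pte_is_dirty() is hypothetical, and the bit values are assumed to follow the standard x86 page-table entry layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86 PTE bit positions (architectural layout). */
#define	PG_RW	0x002u	/* page is writable */
#define	PG_M	0x040u	/* page has been modified (dirty) */

/*
 * Hypothetical helper: treat a PTE as dirty only when both PG_M and
 * PG_RW are set.  A PTE with PG_M set but PG_RW clear may simply be
 * the result of a stale TLB entry on the processors described above,
 * so its PG_M bit is ignored.
 */
static bool
pte_is_dirty(uint32_t pte)
{
	return ((pte & (PG_M | PG_RW)) == (PG_M | PG_RW));
}
```

With this predicate, a read-only PTE that happens to carry PG_M is never reported as dirty, which is also why an assertion of the form "PG_M implies PG_RW" is no longer valid.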
kib
5ddf5664cc Yield the cpu in the kernel while iterating the list of the
vnodes belonging to the mountpoint. Also, yield when in the
softdep_process_worklist() even when we are not going to sleep due to
buffer drain.

It is believed that the ULE fixed the problem [1], but the yielding
seems to be needed at least for the 4BSD case.

Discussed:	on stable@, with bde
Reviewed by:	tegge, jeff [1]
MFC after:	2 weeks
2008-03-23 13:45:24 +00:00
kib
53a15ee1ea Prevent the overflow in the calculation of the next page directory.
The overflow causes the wraparound with consequent corruption of the
(almost) whole address space mapping.

As Alan noted, pmap_copy() does not require the wrap-around checks
because it cannot be applied to the kernel's pmap. The checks there are
included for consistency.

Reported and tested by:	kris (i386/pmap.c:pmap_remove() part)
Reviewed by:	alc
MFC after:	1 week
2008-03-23 07:07:27 +00:00
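The overflow described in 53a15ee1ea arises when the "next page directory boundary" is computed for an address near the top of the address space. A minimal sketch of a wrap-around check follows. It assumes a 32-bit address space with 4 MB page directory entries (i386 without PAE); the function name next_pde_boundary() is hypothetical and the code is not taken from the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed: one page directory entry maps 4 MB (i386, non-PAE). */
#define	NBPDR	(1u << 22)
#define	PDRMASK	(NBPDR - 1)

/*
 * Hypothetical helper: return the start of the next page directory
 * region after sva, clamped to eva if the addition wraps past the
 * top of the 32-bit address space.
 */
static uint32_t
next_pde_boundary(uint32_t sva, uint32_t eva)
{
	uint32_t va_next;

	assert(sva < eva);
	va_next = (sva + NBPDR) & ~PDRMASK;
	if (va_next < sva)	/* unsigned addition wrapped around */
		va_next = eva;
	return (va_next);
}
```

Without the clamp, a loop of the form "for (; sva < eva; sva = va_next)" would restart from address 0 when the range ends in the last page directory region, which matches the corruption of almost the whole address space mapping described above.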
yongari
fcd39263e4 MSI handling on some RealTek chips is broken, so disable it by
default.

Reported by:	Giulio Ferro ( auryn AT zirakzigil DOT org )
Tested by:	Giulio Ferro ( auryn AT zirakzigil DOT org )
2008-03-23 05:35:18 +00:00
yongari
031ecde733 For MSI-capable hardware, set the MSI enable bit in the RL_CFG2
register.  If MSI was disabled by the hw.re.msi_disable tunable,
explicitly clear the MSI enable bit.
2008-03-23 05:31:35 +00:00
yongari
7fea7ba914 Some RealTek chips are known to be buggy on DAC handling, so
disable DAC by default.
2008-03-23 05:13:45 +00:00
yongari
00b0cf0b1a VLAN hardware tag information should be set for all descriptors of a
multi-descriptor transmission attempt. The datasheet says nothing about
this requirement. This should fix long-standing VLAN hardware
tagging issues with re(4).

Reported by:	Giulio Ferro ( auryn AT zirakzigil DOT org )
Tested by:	Giulio Ferro ( auryn AT zirakzigil DOT org )
2008-03-23 05:06:16 +00:00
yongari
fd413d352f Always honor configured VLAN/checksum offload capabilities.
Previously re(4) used to blindly enable VLAN hardware tag stripping
and Rx checksum offload regardless of enabled optional features of
interface.
2008-03-23 04:59:13 +00:00
davidxu
c32a483ae9 Remove commented out code, thread suspension is done in thread library.
2008-03-23 02:03:06 +00:00
jeff
8103d042fb - Only return 1 from sync_vnode() in cases where the vnode is still
   at the head of the sync list.  This prevents sched_sync() from
   re-queueing a vnode which may have been freed already.

Discussed with:	kib
2008-03-23 01:44:28 +00:00
marcel
124e0025d3 Instead of making a single geom_part.ko module, make a module
for each partitioning scheme. The gpart code is currently non-
optional.
2008-03-23 01:42:47 +00:00
jeff
73b6a5597c - Pass BO_MTX(bo) to lockmgr in vtruncbuf, we don't own the vnode
   interlock here anymore.

Reported by:	kris
2008-03-23 01:42:19 +00:00
marcel
c184f6ced2 Redefine G_PART_SCHEME_DECLARE() from populating a private linker set
to declaring a proper module. The module event handler is part of the
gpart core and will add the scheme to an internal list on module load
and will remove the scheme from the internal list on module unload.
This makes it possible to dynamically load and unload partitioning
schemes.
2008-03-23 01:31:59 +00:00
marcel
31a163ef06 Add g_retaste(), which given a class will present all non-open providers
to it for tasting. This is useful when the class, through means outside
the scope of GEOM, can claim providers previously unclaimed.

The g_retaste() function posts an event which is handled by the
g_retaste_event().

Event suggested by: phk
2008-03-23 01:23:35 +00:00
cognet
4d5f668fc2 We need to prototype _start() as well, as we use it to test if we're running
from flash or from RAM.

Reported by:	imp
MFC After:	3 days
2008-03-22 20:34:07 +00:00
qingli
4471734ac4 Reuse the mbuf that was just retrieved from the receive ring if mbuf
exhaustion is encountered. There was a fix made previously for this
problem but the solution (breaking out of the receive loop) does not
seem to work. mbuf reuse strategy is already adopted by other drivers
such as if_bge.  The problem was recreated and the patch is also
verified in the same test environment.
2008-03-22 18:13:39 +00:00
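The mbuf reuse strategy mentioned in 4471734ac4 can be illustrated with a small, driver-independent sketch. The types and names below (struct rx_slot, rx_replace_or_reuse()) are hypothetical stand-ins and do not come from the driver or from if_bge; plain pointers take the place of mbufs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical receive ring slot holding one buffer. */
struct rx_slot {
	void	*buf;
};

/*
 * Hypothetical helper: newbuf is the result of an attempt to allocate
 * a replacement buffer (NULL on exhaustion).  On success the slot is
 * refilled and the received buffer is returned for delivery up the
 * stack.  On failure the received buffer stays in the ring for reuse
 * and NULL is returned, meaning the frame is dropped.
 */
static void *
rx_replace_or_reuse(struct rx_slot *slot, void *newbuf)
{
	void *received;

	assert(slot != NULL && slot->buf != NULL);
	if (newbuf == NULL)
		return (NULL);	/* reuse: ring slot keeps its buffer */
	received = slot->buf;
	slot->buf = newbuf;
	return (received);
}
```

The point of the design is that the ring never ends up with an empty slot: under memory pressure a frame is lost, but the receive loop can keep running instead of breaking out.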
sam
5996854133 add hints to specify how NPE ports are mapped to MAC+PHY; these
could be commented out as they just duplicate the defaults that
are built into the code

Reviewed by:	imp
MFC after:	1 week
2008-03-22 16:55:51 +00:00
sam
85a6e3f5ef Improve mac+phy configuration so that hints can be used to describe
layouts different than the defaults:
o hint.npe.0.mac="A", "B", etc. specifies the window for MAC register accesses
o hint.npe.0.mii="A", "B", etc. specifies PHY registers
o hint.npe.1.phy=%d specifies the PHY to map to a port

This allows devices like NSLU to be setup w/o code changes and will
also be used for forthcoming support for more Avila boards.

Reviewed by:	imp
MFC after:	1 week
2008-03-22 16:53:28 +00:00
phk
5a1f4173f5 In abort2(2): Accept a NULL arg pointer if nargs == 0
2008-03-22 16:32:52 +00:00
sam
9eb2a09a7e (finally) add the hal status to the diagnostic generated after
a failed ath_hal_reset call

MFC after:	3 days
2008-03-22 16:27:47 +00:00
jeff
a9d123c3ab - Complete part of the unfinished bufobj work by consistently using
   BO_LOCK/UNLOCK/MTX when manipulating the bufobj.
 - Create a new lock in the bufobj to lock bufobj fields independently.
   This leaves the vnode interlock as an 'identity' lock while the bufobj
   is an io lock.  The bufobj lock is ordered before the vnode interlock
   and also before the mnt ilock.
 - Exploit this new lock order to simplify softdep_check_suspend().
 - A few sync related functions are marked with a new XXX to note that
   we may not properly interlock against a non-zero bv_cnt when
   attempting to sync all vnodes on a mountlist.  I do not believe this
   race is important.  If I'm wrong this will make these locations easier
   to find.

Reviewed by:	kib (earlier diff)
Tested by:	kris, pho (earlier diff)
2008-03-22 09:15:16 +00:00
alfred
b283b3e59a Fix a race where timeout/untimeout could cause crashes for Giant locked
code.

The bug:

There exists a race condition for timeout/untimeout(9) due to the
way that the softclock thread dequeues timeouts.

The softclock thread sets the c_func and c_arg of the callout to
NULL while holding the callout lock but not Giant.  It then drops
the callout lock and acquires Giant.

It is at this point where untimeout(9) on another cpu/thread could
be called.

Since c_arg and c_func are cleared, untimeout(9) does not touch the
callout and returns as if the callout is canceled.

The softclock then tries to acquire Giant and likely blocks due to
the other cpu/thread holding it.

The other cpu/thread then likely deallocates the backing store that
c_arg points to and finishes working and hence drops Giant.

Softclock resumes, acquires Giant, and calls the function with
the now free'd c_arg, and we have corruption/crash.

The fix:

We need to track curr_callout even for timeout(9) (LOCAL_ALLOC)
callouts.  We need to free the callout after the softclock processes
it to deal with the race here.

Obtained from: Juniper Networks, iedowse
Reviewed by: jhb, iedowse
MFC After: 2 weeks.
2008-03-22 07:29:45 +00:00
ambrisko
085cbdfe5d Add in a compat. mode so you can either open the card's device
node or directly open mfi0 and specify the card you want to talk to
in the ioctl.
2008-03-22 02:57:49 +00:00
bz
418e4a564c Add ';' missed with the SYSINIT changes.
Not noticed by tb as TCP_SIGNATURE is not in LINT.

MFC after:	1 month
2008-03-21 18:31:42 +00:00
remko
29b5baab7c Add the i915 GME device to DRM.
PR:		kern/121808
Submitted by:	Volker Werth <volker at vwsoft dot com>
Approved by:	imp (mentor, implicit for trivial changes)
MFC after:	3 days
2008-03-21 16:38:42 +00:00
kib
bc4bc893dd Reduce contention on the vnode interlock by not acquiring the BO_LOCK
around the check for the BV_BKGRDINPROG in the brelse() and bqrelse().
See the comment for the explanation why it is safe.

Tested by:	pho
Submitted by:	jeff
2008-03-21 12:38:44 +00:00
kib
04661caa35 Reduce the acquisition of the vnode interlock in the ffs_read() and
ffs_extread() when setting the IN_ACCESS flag by checking whether the
IN_ACCESS is already set. The possible race there is admissible.

Tested by:	pho
Submitted by:	jeff
2008-03-21 12:33:00 +00:00
jeff
72142b2fae - Reduce contention on the global bdonelock and bpinlock by using
   a pool mutex to protect these sleep/wakeup/counter races.  This
   still is preferable to bloating each bio with a mtx.
2008-03-21 10:00:05 +00:00
jeff
ba540b27d6 - Add a new td flag TDF_NEEDSUSPCHK that is set whenever a thread needs
   to enter thread_suspend_check().
 - Set TDF_ASTPENDING along with TDF_NEEDSUSPCHK so we can move the
   thread_suspend_check() to ast() rather than userret().
 - Check TDF_NEEDSUSPCHK in the sleepq_catch_signals() optimization so
   that we don't miss a suspend request.  If this is set use the
   expensive signal path.
 - Set NEEDSUSPCHK when creating a new thread in thr in case the
   creating thread is due to be suspended as well but has not yet been.

Reviewed by:	davidxu (Authored original patch)
2008-03-21 08:23:25 +00:00
jhb
fbea3b6403 Explicitly use spinlock_enter/exit rather than locking the icu_lock spin
lock in the 8259A drivers as these drivers are only used on UP systems.
This slightly reduces the penalty of an SMP kernel (such as GENERIC) on
a UP x86 machine.
2008-03-20 21:53:27 +00:00
jhb
6cf6d7b22b Implement a BUS_BIND_INTR() method in the bus interface to bind an IRQ
resource to a CPU.  The default method is to pass the request up to the
parent similar to BUS_CONFIG_INTR() so that all busses don't have to
explicitly implement bus_bind_intr.  A bus_bind_intr(9) wrapper routine
similar to bus_setup/teardown_intr() is added for device drivers to use.
Unbinding an interrupt is done by binding it to NOCPU.  The IRQ resource
must be allocated, but it can happen in any order with respect to
bus_setup_intr().  Currently it is only supported on amd64 and i386 via
nexus(4) methods that simply call the intr_bind() routine.

Tested by:	gallatin
2008-03-20 21:24:32 +00:00
sos
f997b9d36a Unbreak the last commit.
Changes from the PM WIP sneaked in and caused compile errors.
2008-03-20 21:21:31 +00:00
kmacy
db590514fa pay attention to default cluster limits when sizing receive queues
2008-03-20 20:52:37 +00:00
emaste
ae058e4be5 Restore creation of passthrough devices with newer controller firmware by
putting the correct size in the fib header.  Presumably the older firmware
silently ignored a bad size field.

(This change tested with a 3805 controller.  Passthrough devices were
created when running firmware build 12814, but not 15323 or later.  With
this change they're created for both old and new firmware versions.)

Submitted by:	Adaptec
2008-03-20 20:33:48 +00:00
emaste
2d11776afc Add ioctls FSACTL_SEND_LARGE_FIB, FSACTL_SEND_RAW_SRB,
FSACTL_LNX_SEND_LARGE_FIB, and FSACTL_LNX_SEND_RAW_SRB, and correct size
checks on FIBs passed in from userspace.  Both changes were obtained from
Adaptec's driver build 15317.  Adaptec's commandline RAID tool arcconf uses
these ioctls when creating a RAID-10 array (and probably other operations
too).
2008-03-20 17:59:19 +00:00
sam
0b0672cdd0 add usb devices and more wlan stuff now that usb is functional
MFC after:	1 month
2008-03-20 17:44:58 +00:00
rdivacky
4a8a8b1c08 o Add stub support for some new futex operations,
so the annoying message is not printed.

	o	Don't warn about FUTEX_FD not being implemented
		and return ENOSYS instead of 0 (i.e. success).

	o	Clear FUTEX_PRIVATE_FLAG as we actually implement
		only private futexes so there is no reason to
		return ENOSYS when app asks for a private futex.
		We don't reject shared futexes because they worked
		just fine with our implementation so far.

Approved by:	kib (mentor)
Tested by:	bsam
MFC after:	1 week
2008-03-20 17:03:55 +00:00
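The flag handling described in 4a8a8b1c08 can be sketched as follows. This is a hedged illustration rather than the committed Linuxulator code: futex_dispatch() is a hypothetical name, the constants are assumed to match the Linux futex ABI values, and returning ENOSYS for every other operation is a simplification made for the sketch.

```c
#include <assert.h>
#include <errno.h>

/* Assumed Linux futex ABI values. */
#define	FUTEX_WAIT		0
#define	FUTEX_WAKE		1
#define	FUTEX_FD		2
#define	FUTEX_PRIVATE_FLAG	128

/*
 * Hypothetical dispatcher: strip FUTEX_PRIVATE_FLAG before decoding
 * the operation, so a request for a private futex is handled like the
 * plain operation.  FUTEX_FD is not implemented and reports ENOSYS
 * instead of 0 (success).
 */
static int
futex_dispatch(int op)
{
	op &= ~FUTEX_PRIVATE_FLAG;
	switch (op) {
	case FUTEX_WAIT:
	case FUTEX_WAKE:
		return (0);		/* placeholder for the real handler */
	case FUTEX_FD:
		return (ENOSYS);	/* not implemented */
	default:
		return (ENOSYS);	/* simplification for this sketch */
	}
}
```

Masking the flag first means futex_dispatch(FUTEX_WAIT | FUTEX_PRIVATE_FLAG) takes the same path as futex_dispatch(FUTEX_WAIT), which is the behaviour the third bullet above argues for.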
sam
1906c8de60 Workaround design botch in usb: blindly mixing bus_dma with PIO does not
work on architectures with a write-back cache as the PIO writes end up
in the cache which the sync(BUS_DMASYNC_POSTREAD) in usb_transfer_complete
then discards; compensate in the xfer methods that do PIO by pushing the
writes out of the cache before usb_transfer_complete is called.

This fixes USB on xscale and likely other places.

Sponsored by:	hobnob
Reviewed by:	cognet, imp
MFC after:	1 month
2008-03-20 16:19:25 +00:00
kib
de73f6b678 Do not dereference cdev->si_cdevsw, use the dev_refthread() to properly
obtain the reference. In particular, this fixes the panic reported in
the PR. Remove the comments stating that this needs to be done.

PR:	kern/119422
MFC after:	1 week
2008-03-20 16:08:42 +00:00
sam
c8f58c14c0 Correct cache handling for xfer requests marked URQ_REQUEST: many (if not
all) uses involve a read but usbd_start_transfer only does a PREWRITE; change
this to BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE as I'm not sure if any
users do write+read.

Reviewed by:	cognet, imp
MFC after:	1 month
2008-03-20 16:04:13 +00:00
sam
70faf3fbc1 map device 5; the optional USB controller on Gateworks 2348 boards
shows up here instead of the minipci slot at J4

Reviewed by:	cognet, imp
MFC after:	1 week
2008-03-20 15:54:19 +00:00
kib
28174d9ffb Fix the leak of the vmspace on the fork when the process limits
are exceeded.

Pointy hat to:	me
MFC after:	3 days
2008-03-20 15:24:49 +00:00
sos
c69ad0290e Fix problem with Intel Matrix RAID.
Fix from PR/121899.
2008-03-20 11:54:26 +00:00
kmacy
56b72c6a35 back out last change as Sam believes that it breaks multicast - need to revisit after following up with pyun
2008-03-20 06:19:34 +00:00
jeff
a3f8e0c20d - Restore runq to manipulating threads directly by putting runq links and
   rqindex back in struct thread.
 - Compile kern_switch.c independently again and stop #include'ing it from
   schedulers.
 - Remove the ts_thread backpointers and convert most code to go from
   struct thread to struct td_sched.
 - Cleanup the ts_flags #define garbage that was causing us to sometimes
   do things that expanded to td->td_sched->ts_thread->td_flags in 4BSD.
 - Export the kern.sched sysctl node in sysctl.h
2008-03-20 05:51:16 +00:00
kmacy
8b4fc7299f Don't re-initialize the interface if it is already running.
This one-line change makes the following code, found in many Ethernet device drivers
(at least em, igb, ixgbe, and cxgb), gratuitous:

	case SIOCSIFADDR:
		if (ifa->ifa_addr->sa_family == AF_INET) {
			/*
			 * XXX
			 * Since resetting hardware takes a very long time
			 * and results in link renegotiation we only
			 * initialize the hardware only when it is absolutely
			 * required.
			 */
			ifp->if_flags |= IFF_UP;
			if (!(ifp->if_drv_flags & IFF_DRV_RUNNING)) {
				EM_CORE_LOCK(adapter);
				em_init_locked(adapter);
				EM_CORE_UNLOCK(adapter);
			}
			arp_ifinit(ifp, ifa);
		} else
			error = ether_ioctl(ifp, command, data);
		break;
2008-03-20 05:35:02 +00:00
kevlo
931fe00266 - Add the Corega CG-WLUSB2GL from NetBSD
- Add the Corega CG-WLUSB2GPX
2008-03-20 05:05:37 +00:00
bland
5671837ad2 Improve VT_WAITACTIVE semantics.
- Wait for requested vty activation regardless of its open state.
- Remove redundant console cleanup.

Approved by:	kib
MFC after:	1 week
2008-03-20 04:10:52 +00:00
sam
dc8118c259 add some debug msgs for tracking xfers
2008-03-20 03:11:07 +00:00