130087 Commits

Author SHA1 Message Date
wpaul
901783bd82 The TCP checksum offload handling in the 8111B/8168B and 8101E PCIe can
apparently be confused by short TCP segments that have been manually
padded to the minimum ethernet frame size. The driver does short frame
padding in software as a workaround for a bug in the 8169 PCI devices
that causes short IP fragments to be corrupted due to an apparent
conflict between the hardware autopadding and hardware IP checksumming.

To fix this, we avoid software padding for short TCP segments, since
the hardware seems to autopad and checksum these correctly (even the
older 8169 NICs get these right). Short UDP packets appear to be
handled correctly in all cases. This should work around the IP header
checksum bug in the 8169 while not tripping the TCP checksum bug in
the 8111B/8168B and 8101E.
2007-01-25 17:30:30 +00:00
bde
abf353c8bd Rename some functions and variables from nfs_* to nfs4_* to avoid
collisions with nfsclient's names.  Even static names should have a
unique prefix so that they can be debugged easily.

Hide the unused colliding variable nfsv3_commit_on_close in "#if 0"
together with other unused sysctl variables.  Duplicating the nfs sysctl
under nfs4 is probably just a bug.

Fix some nearby style bugs.

Remove duplicate $FreeBSD$.
2007-01-25 14:33:13 +00:00
bde
2bc6f7d071 Rename some functions and variables (mainly vfsops entry points) from
nfs_* to nfs4_* to avoid collisions with nfsclient's names.   Even
static names should have a unique prefix so that they can be debugged
easily.

Most of the renamed functions can probably be shared.  nfs4_cmount()
and nfs4_sync() are identical to the nfs_* versions, and all the others
except nfs4_vfsops() seem to be idendentical except for style bugs,
missing support for mountroot, and bugs.

Fix some nearby style bugs.

Remove duplicate $FreeBSD$.
2007-01-25 14:18:40 +00:00
bde
5a952c0766 Unstaticize nfs_iosize() in nfsclient and use it in nfs4client instead
of duplicating it except for larger style bugs in the copy.

Fix some nearby style bugs (including a harmless type mismatch)
in and near the remaining copy.

This is part of fixing collisions of the 2 nfs*client's names.  Even
static names should have a unique prefixes so that they can be debugged
easily.
2007-01-25 13:07:25 +00:00
mpp
938276b258 Add a BUGS section that shows that ids that appear to be
negative are now ignored by the quota system and that extremely
large ids may make quotacheck run for a very long time.

Also mention that "options QUOTA" is required for the kernel
to provide quota support.
2007-01-25 12:42:18 +00:00
pjd
dc2987aa01 When the following conditions are meet:
- First configured key is based only on keyfile (no passphrase).
- Device is attached.
- User changes first key (setkey) from keyfile to passphrase and doesn't
  specify number of iterations (with -i option).
...geli(8) won't store calculated number of iterations in metadata.
This result in device beeing unaccesable after detach.

One can recover from this situation by guessing number of iterations
generated, storing it in metadata and trying to attach device.
Recovery procedure isn't nice, but one's data is not lost.

Reported by:	Thomas Nickl <T.Nickl@gmx.net>
MFC after:	1 week
2007-01-25 11:44:03 +00:00
pjd
3cd8e7b357 Implement gctl_change_param() function, which changes value of existing
parameter.

MFC after:	1 week
2007-01-25 11:35:27 +00:00
rodrigc
5ecf1e9826 Try to avoid a possible infinite loop when parsing an invalid kernel dump file.
PR:		108229
Submitted by:	Jessica Han <jessicah juniper net>
Reviewed by:	marcel
MFC after:	1 week
2007-01-25 06:39:25 +00:00
mohans
83064ec323 Fix for problems that occur when all mbuf clusters migrate to the mbuf packet
zone. Cluster allocations fail when this happens. Also processes that may have
blocked on cluster allocations will never be woken up. Thanks to rwatson for
an overview of the issue and pointers to the mbuma paper and his tool to dump
out UMA zones.

Reviewed by: andre@
2007-01-25 01:05:23 +00:00
mpp
a6abba5b5e Display the name of the quota data file in verbose mode. 2007-01-24 22:52:32 +00:00
mohans
9799fcf93c Fix for a bug where only one process (of multiple) blocked on
maxpages on a zone is woken up, with the rest never being woken up as
a result of the ZFLAG_FULL flag being cleared. Wakeup all such blocked
procsses instead. This change introduces a thundering herd, but since
this should be relatively infrequent, optimizing this (by introducing
a count of blocked processes, for example) may be premature.

Reviewd by: ups@
2007-01-24 22:49:11 +00:00
dougb
802f16fd6b Update birth entry for Warren Zevon with his birthplace, and add an
entry for his death. Both per Wikipedia.
2007-01-24 21:21:38 +00:00
jeff
04228656e7 - Add a horrible bit of code to detect tsc differences between processors.
This only works if there is no significant drift and all processors are
   running at the same frequency.  Fortunately, schedgraph traces on MP
   machines tend to cover less than a second so drift shouldn't be an issue.
 - KTRFile::synchstamp() iterates once over the whole list to determine the
   lowest tsc value and syncs adjusts all other values to match.  We assume
   that the first tick recorded on all cpus happened at the same instant to
   start with.
 - KTRFile::monostamp() iterates again over the whole file and checks for
   a cpu agnostic monotonically increasing clock.  If the time ever goes
   backwards the cpu responsible is adjusted further to fit.  This will
   make the possible incorrect delta between cpus as small as the shortest
   time between two events.  This time can be fairly large due to sched_lock
   essentially protecting all events.
 - KTRFile::checkstamp() now returns an adjusted timestamp.
 - StateEvent::draw() detects states that occur out of order in time and
   draws them as 0 pixels after printing a warning.
2007-01-24 21:19:56 +00:00
jeff
743ea48fbc - With a sleep time over 2097 seconds hzticks and slptime could end up
negative.  Use unsigned integers for sleep and run time so this doesn't
   disturb sched_interact_score().  This should fix the invalid interactive
   priority panics reported by several users.
2007-01-24 18:18:43 +00:00
rrs
ba4b733a7c Fixes the MSG_PEEK for sctp_generic_recvmsg() the msg_flags
were not being copied in properly so PEEK and any other
msg_flags input operation were not being performed right.
Approved by:	gnn
2007-01-24 12:59:56 +00:00
ceri
eaeb42b080 Bump .Dd for r1.313. 2007-01-24 09:22:56 +00:00
jhb
dcee465690 Document LD_UTRACE.
MFC after:	3 days
2007-01-23 22:38:39 +00:00
jeff
9ea1e03533 - Print clock information so we know if something is not reported correctly
from the tsc.
 - Set skipnext = 1 for yielding and preempted events so we don't show the
   event that adds us back to the run queue.  It used to be 2 so we would
   skip the ksegrp run queue addition and the system run queue addition
   but the ksegrp run queue has gone away.
 - Don't display down to nanosecond resolution for scheduling events right
   now.  This can sometimes cause a division by zero.
2007-01-23 22:19:27 +00:00
mpp
974010dbb6 Document new quota knobs. 2007-01-23 21:27:07 +00:00
bruno
dcb4be2750 o introduce a flags 'errata' for HW bugs onto the softc.
o remove errata_a0 and introduce the corresponding flags into 'errata'.
o introduce a new errata for K8, namely some platform might set the
  PENDING_BIT but aren't able to unset it, also don't loop forever
  waiting PENDING_BIT being cleared.
o try to introduce a workaround for the PENDING_BIT stuck problem,
o support now half multipliers for K8.

Tested by:	Abdullah Al-Marrie

Approved by:	njl
2007-01-23 19:20:30 +00:00
imp
941d9aa2cc Use the more specific 'EM732X' designation rather than * to disable sync
cache commands, per request from njl@.
2007-01-23 17:29:31 +00:00
kib
fdd50404d1 Cylinder group bitmaps and blocks containing inode for a snapshot
file are after snaplock, while other ffs device buffers are before
snaplock in global lock order. By itself, this could cause deadlock
when bdwrite() tries to flush dirty buffers on snapshotted ffs. If,
during the flush, COW activity for snapshot needs to allocate block
and ffs_alloccg() selects the cylinder group that is being written
by bdwrite(), then kernel would panic due to recursive buffer lock
acquision.

Avoid dealing with buffers in bdwrite() that are from other side of
snaplock divisor in the lock order then the buffer being written. Add
new BOP, bop_bdwrite(), to do dirty buffer flushing for same vnode in
the bdwrite(). Default implementation, bufbdflush(), refactors the code
from bdwrite(). For ffs device buffers, specialized implementation is
used.

Reviewed by:	tegge, jeff, Russell Cattelan (cattelan xfs org, xfs changes)
Tested by:	Peter Holm
X-MFC after:	3 weeks (if ever: it changes ABI)
2007-01-23 10:01:19 +00:00
rodrigc
7ebb7b7367 Remove mount_nfs4 from SUBDIR list. The mount_nfs Makefile
links mount_nfs to mount_nfs4 now.
2007-01-23 09:18:25 +00:00
rodrigc
b831772f64 Link mount_nfs -> mount_nfs4, and mount_nfs.8 -> mount_nfs4.8.
Suggested by:	rees
2007-01-23 09:14:33 +00:00
jeff
8fd8265087 - Catch up to setrunqueue/choosethread/etc. api changes.
- Define our own maybe_preempt() as sched_preempt().  We want to be able
   to preempt idlethread in all cases.
 - Define our idlethread to require preemption to exit.
 - Get the cpu estimation tick from sched_tick() so we don't have to worry
   about errors from a sampling interval that differs from the time
   domain.  This was the source of sched_priority prints/panics and
   inaccurate pctcpu display in top.
2007-01-23 08:50:34 +00:00
bde
3f550f19b8 Oops, pc98 is independent of i386 for clock.c and machdep.c but not
for clock.h, so changing th i386 clock.h broke it.  MFi386 (not tested):

Cleaned up declaration and initialization of clock_lock.  It is only
used by clock code, so don't export it to the world for machdep.c to
initialize.  There is a minor problem initializing it before it is
used, since although clock initialization is split up so that parts
of it can be done early, the first part was never done early enough
to actually work.  Split it up a bit more and do the first part as
late as possible to document the necessary order.  The functions that
implement the split are still bogusly exported.

Cleaned up initialization of the i8254 clock hardware using the new
split.  Actually initialize it early enough, and don't work around it
not being initialized in DELAY() when DELAY() is called early for
initialization of some console drivers.

This unfortunately moves a little more code before the early debugger
breakpoint so that it is harder to debug.  The ordering of console and
related initialization is delicate because we want to do as little as
possible before the breakpoint, but must initialize a console.
2007-01-23 08:48:26 +00:00
jeff
474b917526 - Remove setrunqueue and replace it with direct calls to sched_add().
setrunqueue() was mostly empty.  The few asserts and thread state
   setting were moved to the individual schedulers.  sched_add() was
   chosen to displace it for naming consistency reasons.
 - Remove adjustrunqueue, it was 4 lines of code that was ifdef'd to be
   different on all three schedulers where it was only called in one place
   each.
 - Remove the long ifdef'd out remrunqueue code.
 - Remove the now redundant ts_state.  Inspect the thread state directly.
 - Don't set TSF_* flags from kern_switch.c, we were only doing this to
   support a feature in one scheduler.
 - Change sched_choose() to return a thread rather than a td_sched.  Also,
   rely on the schedulers to return the idlethread.  This simplifies the
   logic in choosethread().  Aside from the run queue links kern_switch.c
   mostly does not care about the contents of td_sched.

Discussed with:	julian

 - Move the idle thread loop into the per scheduler area.  ULE wants to
   do something different from the other schedulers.

Suggested by:	jhb

Tested on:	x86/amd64 sched_{4BSD, ULE, CORE}.
2007-01-23 08:46:51 +00:00
jeff
f53a7830f7 - Allow the schedulers to IPI_PREEMPT idlethread. This puts the decision
for this behavior on the initiator side.
2007-01-23 08:38:39 +00:00
bde
b12ed0640c Cleaned up declaration and initialization of clock_lock. It is only
used by clock code, so don't export it to the world for machdep.c to
initialize.  There is a minor problem initializing it before it is
used, since although clock initialization is split up so that parts
of it can be done early, the first part was never done early enough
to actually work.  Split it up a bit more and do the first part as
late as possible to document the necessary order.  The functions that
implement the split are still bogusly exported.

Cleaned up initialization of the i8254 clock hardware using the new
split.  Actually initialize it early enough, and don't work around it
not being initialized in DELAY() when DELAY() is called early for
initialization of some console drivers.

This unfortunately moves a little more code before the early debugger
breakpoint so that it is harder to debug.  The ordering of console and
related initialization is delicate because we want to do as little as
possible before the breakpoint, but must initialize a console.
2007-01-23 08:01:20 +00:00
njl
ae205e3573 Add missing function trace for debug prints. 2007-01-23 07:20:44 +00:00
rodrigc
375699e57b Merge mount_nfs4.c and mount_nfs.c into one program.
If argv[0] == "mount_nfs4", then default to mounting NFSv4,
otherwise if argv[0] == "mount_nfs", default to the old mount_nfs behavior.

- Add a -4 option.
- Add the University of Michigan copyright from mount_nfs4.c, for the
  code merged from mount_nfs4.c.

Reviewed by:	rees
2007-01-23 07:17:10 +00:00
rodrigc
4c351de443 When exiting vfs_export(), delete the "export" option from
the mount options list with vfs_deleteopt().  At this point, the export
information is saved in mp->mnt_export, so we can delete
the "export" mount option from mp->mnt_optnew and mp->mnt_opt.

This fixes read-write/read-only update mounts (mount -u -o rw, mount -u -o ro)
of NFS exported directories.

For some reason, I could only reproduce the problem with a configuration
supplied by Andre:
- "options QUOTA" enabled in kernel config
- "/ -maproot=root 10.0.1.105" in /etc/exports

Reported by:	kris, Andre Guibert de Bruet <andy siliconlandmark com>,
            	Andrzej Tobola <ato iem pw edu pl>
Tested by:	Andre Guibert de Bruet
2007-01-23 06:19:16 +00:00
scottl
2685a7063b Remove a PCI ID entry that conflicts with the AMR driver. 2007-01-23 02:47:33 +00:00
mpp
fff1ced6bf Use fseeko to seek in the file, instead of fseek to prevent seek
errors for extremely large uids (e.g. in the billions range).
2007-01-23 02:13:00 +00:00
mpp
d0c16759d9 Make sure that unknown uids/gids that now have non-zero usage and
had a previously recorded usage of zero are correctly displayed in
verbose mode.  Generalize the print routine a little too.
2007-01-23 02:10:19 +00:00
yongari
b4c0dd68e0 It seems that enabling Tx and Rx before setting descriptor DMA
addresses shall access invalid descriptor DMA addresses on PCIe
hardwares and then panicked the system.
To fix it set descriptor DMA addresses before enabling Tx and Rx
such that hardware can see valid descriptor DMA addresses. Also
set RL_EARLY_TX_THRESH before starting Tx and Rx.

Reported by:	steve.tell AT crashmail DOT de
Tested by:	steve.tell AT crashmail DOT de
Obtained from:	NetBSD
MFC after:	1 week
2007-01-23 00:44:12 +00:00
mjacob
2df6044b61 Clean up some of the various platform and release specific dma tag
stuff so it is centralized in isp_freebsd.h.

Take out PCI posting flushed in qla2100/2200 register reads except for
2100s.
2007-01-23 00:02:29 +00:00
jhb
3624354c54 Expand the MSI/MSI-X API to address some deficiencies in the MSI-X support.
- First off, device drivers really do need to know if they are allocating
  MSI or MSI-X messages.  MSI requires allocating powerof2() messages for
  example where MSI-X does not.  To address this, split out the MSI-X
  support from pci_msi_count() and pci_alloc_msi() into new driver-visible
  functions pci_msix_count() and pci_alloc_msix().  As a result,
  pci_msi_count() now just returns a count of the max supported MSI
  messages for the device, and pci_alloc_msi() only tries to allocate MSI
  messages.  To get a count of the max supported MSI-X messages, use
  pci_msix_count().  To allocate MSI-X messages, use pci_alloc_msix().
  pci_release_msi() still handles both MSI and MSI-X messages, however.
  As a result of this change, drivers using the existing API will only
  use MSI messages and will no longer try to use MSI-X messages.
- Because MSI-X allows for each message to have its own data and address
  values (and thus does not require all of the messages to have their
  MD vectors allocated as a group), some devices allow for "sparse" use
  of MSI-X message slots.  For example, if a device supports 8 messages
  but the OS is only able to allocate 2 messages, the device may make the
  best use of 2 IRQs if it enables the messages at slots 1 and 4 rather
  than default of using the first N slots (or indicies) at 1 and 2.  To
  support this, add a new pci_remap_msix() function that a driver may call
  after a successful pci_alloc_msix() (but before allocating any of the
  SYS_RES_IRQ resources) to allow the allocated IRQ resources to be
  assigned to different message indices.  For example, from the earlier
  example, after pci_alloc_msix() returned a value of 2, the driver would
  call pci_remap_msix() passing in array of integers { 1, 4 } as the
  new message indices to use.  The rid's for the SYS_RES_IRQ resources
  will always match the message indices.  Thus, after the call to
  pci_remap_msix() the driver would be able to access the first message
  in slot 1 at SYS_RES_IRQ rid 1, and the second message at slot 4 at
  SYS_RES_IRQ rid 4.  Note that the message slots/indices are 1-based
  rather than 0-based so that they will always correspond to the rid
  values (SYS_RES_IRQ rid 0 is reserved for the legacy INTx interrupt).
  To support this API, a new PCIB_REMAP_MSIX() method was added to the
  pcib interface to change the message index for a single IRQ.

Tested by:	scottl
2007-01-22 21:48:44 +00:00
andre
4a22f82e6c Unbreak writes of 0 bytes. Zero byte writes happen when only ancillary
control data but no payload data is passed.

Change m_uiotombuf() to return at least one empty mbuf if the requested
length was zero.  Add comment to sosend_dgram and sosend_generic().

Diagnoses by:		jhb
Regression test by:	rwatson
Pointy hat to.		andre
2007-01-22 14:50:28 +00:00
bms
3530b41545 Document the existence of the TCP_INFO socket option.
Approved by:	rwatson
2007-01-22 14:16:47 +00:00
marius
710b7025d1 Actually fully emulate NetBSD and print the media instance number
only for non-zero instances so the typical output for IFM_IEEE80211
type media doesn't overflow 80 columns.

Requested by:	sam
2007-01-22 13:42:07 +00:00
bms
df992b84fc Docuemnt exactly which functions access which NSS databases.
Point out that FreeBSD libc has compat stubs for GNU glibc NSS
modules which access NSDB_PASSWD/NSDB_GROUP, but not NSDB_HOSTS;
based on painful experience porting nss_mdns.

Reviewed by:	ru
2007-01-22 11:45:25 +00:00
kib
79752b63e1 Below is slightly edited description of the LOR by Tor Egge:
--------------------------
[Deadlock] is caused by a lock order reversal in vfs_lookup(), where
[some] process is trying to lock a directory vnode, that is the parent
directory of covered vnode) while holding an exclusive vnode lock on
covering vnode.

A simplified scenario:

root fs					var fs
/    		A			/    (/var)	D
/var		B			/log (/var/log) E
vfs lock	C			vfs lock	F

Within each file system, the lock order is clear: C->A->B and F->D->E

When traversing across mounts, the system can choose between two lock orders,
but everything must then follow that lock order:

      L1: C->A->B
		|
	        +->F->D->E

      L2: F->D->E
	     |
             +->C->A->B

The lookup() process for namei("/var") mixes those two lock orders:

    VOP_LOOKUP() obtains B while A is held
    vfs_busy() obtains a shared lock on F while A and B are held (follows L1,
    violates L2)
    vput() releases lock on B
    VOP_UNLOCK() releases lock on A
    VFS_ROOT() obtains lock on D while shared lock on F is held
    vfs_unbusy() releases shared lock on F
    vn_lock() obtains lock on A while D is held (violates L1, follows L2)

dounmount() follows L1 (B is locked while F is drained).

Without unmount activity, vfs_busy() will always succeed without blocking
and the deadlock isn't triggered (the system behaves as if L2 is followed).

With unmount, you can get 4 processes in a deadlock:

     p1: holds D, want A (in lookup())
     p2: holds shared lock on F, want D (in VFS_ROOT())
     p3: holds B, want drain lock on F (in dounmount())
     p4: holds A, want B (in VOP_LOOKUP())

You can have more than one instance of p2.

The reversal was introduced in revision 1.81 of src/sys/kern/vfs_lookup.c and
MFCed to revision 1.80.2.1, probably to avoid a cascade of vnode locks when nfs
servers are dead (VFS_ROOT() just hangs) spreading to the root fs root vnode.

- Tor Egge

To fix the LOR, ups@ noted that when crossing the mount point, ni_dvp
is actually not used by the callers of namei. Thus, placeholder deadfs
vnode vp_crossmp is introduced that is filled into ni_dvp.

Idea by:	ups
Reviewed by:	tegge, ups, jeff, rwatson (mac interaction)
Tested by:	Peter Holm
MFC after:	2 weeks
2007-01-22 11:25:22 +00:00
imp
bbae4f9949 Add quirk for EasyMP3 EM732X usb 2.0 flash mp3 player.
(It appears that the quirk proceedures link has disappeared and that
this PR complied with it, if there's a problem, please contact me).

PR: usb/96546
2007-01-22 04:34:03 +00:00
marius
95a9b2142a Change the remainder of the drivers for DMA'ing devices enabled in the
sparc64 GENERIC and the sound device drivers known working on sparc64
to use bus_get_dma_tag() to obtain the parent DMA tag so we can get rid
of the sparc64_root_dma_tag kludge eventually. Except for ath(4), sk(4),
stge(4) and ti(4) these changes are runtime tested (unless I booted up
the wrong kernels again...).
2007-01-21 19:32:51 +00:00
marius
32ccb0b969 Correct a logic bug in the previous change. 2007-01-21 19:28:00 +00:00
netchild
1542e0642c Use a printf-modifier which doesn't need a cast.
Submitted by:	scottl
2007-01-21 13:18:52 +00:00
rodrigc
66215c66ac Decrease to WARNS=3. 2007-01-20 23:24:11 +00:00
rodrigc
1308f85b86 Clean up compilation warnings. Set WARNS=6 in Makefile.
PR:	71659
Submitted by:	Dan Lukes <dan obluda cz>
2007-01-20 21:35:11 +00:00
jeff
5fd995e14a - Disable the long-term load balancer. I believe that steal_busy works
better and gives more predictable results.
2007-01-20 21:24:05 +00:00