Commit Graph

65051 Commits

Author SHA1 Message Date
Ken Smith
a258946554 Make sure that either inp is NULL or we have obtained a lock on it before
jumping to dropunlock to avoid a panic.  While here move the calls to
ipsec4_in_reject() and ipsec6_in_reject() so they are after we obtain
the lock on inp.

Original patch to avoid panic:	pjd
Review of locking adjustments:	gnn, sam
Approved by:			re (rwatson)
2007-09-10 14:49:32 +00:00
Robert Watson
f5514f084e Further UDPv4 cleanup:
- Resort includes a bit.
- Correct typos and wording problems in comments.
- Rename udpcksum to udp_cksum to be consistent with other UDP-related
  configuration variables.
- Remove indirection of udp_notify through local notify variable in
  udp_ctlinput(), which is presumably due to copying and pasting from TCP,
  where multiple notify routines exist.

Approved by:	re (kensmith)
2007-09-10 14:22:15 +00:00
Bjoern A. Zeeb
7fd627f00f Fix a DIV0 in case a large value for fs_avgfilesize or fs_avgfpdir
is given (with newfs or tunefs) and dirsize overflows.

In case dirsize is <= 0 because of an overflow set maxcontigdirs
to 0 so it will be 1 later. This is what would happen for large
fs_avgfilesize. [1]

Identified with help from:	roberto, pjd
Submitted by:			pjd [1]
Approved by:			re (rwatson)
MFC after:			8 days
2007-09-10 14:12:29 +00:00
Tai-hwa Liang
73474451b9 Fixing invalid channel display in ifconfig(8) by implementing required
ioctl().

Note that other information provided by ifconfig(8) such like "list chan"
or "list ap" are still not available at this moment.

Before an(4) is connected to wlan(4), users are encouraged to use
ancontrol(8) to retrieve aforementioned information.

Reported by:	dhw (http://lists.freebsd.org/pipermail/freebsd-current/2007-July/074848.html)
Reviewed by:	ambrisko
Tested by:	dhw
Approved by:	re (bmah)
2007-09-10 12:53:34 +00:00
Kip Macy
2de1fa86d7 pull in changes made to RELENG_6 version in the process of doing the MFC
Supported by: Chelsio
Approved by: re (blanket)
2007-09-10 00:59:51 +00:00
Andrew Thompson
cb44b6dfe8 Check for multicast destination on bpf injected packets and update the M_*CAST
flags, the absense of these flags causes problems in other areas such as
bridging which expect them to be correct.

At the moment only Ethernet DLTs are checked.

Reviewed by:	bms, csjp, sam
Approved by:	re (bmah)
2007-09-10 00:03:06 +00:00
Robert Watson
45e0f3d63d Rename mac_check_vnode_delete() MAC Framework and MAC Policy entry
point to mac_check_vnode_unlink(), reflecting UNIX naming conventions.

This is the first of several commits to synchronize the MAC Framework
in FreeBSD 7.0 with the MAC Framework as it will appear in Mac OS X
Leopard.

Reveiwed by:    csjp, Samy Bahra <sbahra at gwu dot edu>
Submitted by:   Jacques Vidrine <nectar at apple dot com>
Obtained from:  Apple Computer, Inc.
Sponsored by:   SPARTA, SPAWAR
Approved by:    re (bmah)
2007-09-10 00:00:18 +00:00
Kip Macy
f4a2d780df - Remove filter support
Supported by: Chelsio
Approved by: re(blanket)
2007-09-09 20:26:02 +00:00
Olivier Houchard
4168e66b1f In __bswap16_var(), make sure the 16 upper bits are cleared; while
optimizing, gcc4 doesn't always do so.

Reported by:	Nathan Whitehorn
Approved by:	re (blanket)
2007-09-09 11:58:38 +00:00
Kip Macy
8adc65adda Add back in support for normal mbuf chaining on RX under DISABLE_MBUF_IOVEC
Approved by: re(blanket)
Supported by: Chelsio
2007-09-09 04:34:03 +00:00
Kip Macy
a8d57f7f24 Fix last-minute typo in last commit caused by pre-commit scripts
Approved by: re(blanket)
2007-09-09 03:51:25 +00:00
Kip Macy
5c5df3da16 - fix qset to port binding as a proper fix for the problems encountered on the 4-port
- fix the use after free seen when sending packets small enough to fit as an immediate
   and bpf peers are present
 - update to firmware rev 4.7 along with various small vendor fixes

Supported by: Chelsio
Approved by: re (blanket)
MFC after: 3 days
2007-09-09 01:28:03 +00:00
Olivier Houchard
18b6e4c8d2 Do not set the RTF_GATEWAY flag if RTF_LLINFO is set, it doesn't make much
sense in that context, and leads to unusable routes.
This should unbreak bootpd.

Discussed with: glebius
Submitted by:   bms
Approved by:    re (bmah)
2007-09-08 19:28:45 +00:00
Randall Stewart
851b7298b3 - send call has a reference to uio->uio_resid in
the recent send code, but uio may be NULL on sendfile
  calls. Change to use sndlen variable.
- EMSGSIZE is not being returned in non-blocking mode
  and needs a small tweak to look if the msg would
  ever fit when returning EWOULDBLOCK.
- FWD-TSN has a bug in stream processing which could
  cause a panic. This is a follow on to the codenomicon
  fix.
- PDAPI level 1 and 2 do not work unless the reader
  gets his returned buffer full. Fix so we can break
  out when at level 1 or 2.
- Fix fast-handoff features to copy across properly on
  accepted sockets
- Fix sctp_peeloff() system call when no true system call
  exists to screen arguments for errors. In cases where a
  real system call exists the system call itself does this.
- Fix raddr leak in recent add-ip code change for bundled
  asconfs (even when non-bundled asconfs are received)
- Make sure ipi_addr lock is held when walking global addr
  list. Need to change this lock type to a rwlock().
- Add don't wake flag on both input and output when the
  socket is closing.
- When deleting an address verify the interface is correct
  before allowing the delete to process. This protects panda
  and unnumbered.
- Clean up old sysctl stuff and get rid of the old Open/Net
  BSD structures.
- Add a function to watch the ranges in the sysctl sets.
- When appending in the reassembly queue, validate that
  the assoc has not gone to about to be freed. If so
  (in the middle) abort out. Note this especially effects
  MAC I think due to the lock/unlock they do (or with
  LOCK testing in place).
- Netstat patch to get rid of warnings.
- Make sure that no data gets queued to inactive/unconfirmed
  destinations. This especially effect CMT but also makes a
  impact on regular SCTP as well.
- During init collision when we detect seq number out
  of sync we need to treat it like Case C and discard
  the cookie (no invarient needed here).
- Atomic access to the random store.
- When we declare a vtag good, we need to shove it
  into the time wait hash to prevent further use. When
  the tag is put into the assoc hash, we need to remove it
  from the twait hash (where it will surely be). This prevents
  duplicate tag assignments.
- Move decr-ref count to better protect sysctl out of
  data.
- ltrace error corrections in sctp6_usrreq.c
- Add hook for interface up/down to be sent to us.
- Make sysctl() exported structures independent of processor
  architecture.
- Fix route and src addr cache clearing for delete address case.
- Make sure address marked SCTP_DEL_IP_ADDRESS is never selected
  as src addr.
- in icmp handling fixed so we actually look at the icmp codes
  to figure out what to do.
- Modified mobility code.
  Reception of DELETE IP ADDRESS for a primary destination and
  SET PRIMARY for a new primary destination is used for
  retransmission trigger to the new primary destination.
  Also, in this case, destination of chunks in send_queue are
  changed to the new primary destination.
- Fix so that we disallow sending by mbuf to ever have EEOR
  mode set upon it.

Approved by:	re@freebsd.org (B Mah)
2007-09-08 17:48:46 +00:00
Randall Stewart
ceaad40ae7 - Locking compatiability changes. This involves adding
additional flags to many function calls. The flags only
  get used in BSD when we compile with lock testing. These
  flags allow apple to escape the "giant" lock it holds on
  the socket and have more fine-grained locking in the NKE.
  It also allows us to test (with witness) the locking used
  by apple via a compile switch (manually applied).

Approved by:	re@freebsd.org(B Mah)
2007-09-08 11:35:11 +00:00
Robert Watson
ce4d8529e3 Continue UDP/UDPv6 synchronization project:
- Fix copyrights, comments in UDPv6.
- Remove macro defines for in6pcb and udp6stat.
- Consistently refer to inpcbs as 'inp' and not also 'in6p'.

Reviewed by:	gnn, jinmei, bz
Approved by:	re (bmah)
2007-09-08 08:18:24 +00:00
Robert Watson
85d9437250 Back out tcp_timer.c:1.93 and associated changes that reimplemented the many
TCP timers as a single timer, but retain the API changes necessary to
reintroduce this change.  This will back out the source of at least two
reported problems: lock leaks in certain timer edge cases, and TCP timers
continuing to fire after a connection has closed (a bug previously fixed and
then reintroduced with the timer rewrite).

In a follow-up commit, some minor restylings and comment changes performed
after the TCP timer rewrite will be reapplied, and a further change to allow
the TCP timer rewrite to be added back without disturbing the ABI.  The new
design is believed to be a good thing, but the outstanding issues are
leading to significant stability/correctness problems that are holding
up 7.0.

This patch was generated by silby, but is being committed by proxy due to
poor network connectivity for silby this week.

Approved by:	re (kensmith)
Submitted by:	silby
Tested by:	rwatson, kris
Problems reported by:	peter, kris, others
2007-09-07 09:19:22 +00:00
Sam Leffler
2a2391c23c - fix a bug that zyd_attach() returns 0 even if it encountered errors
that can lead to a panic when the stick is yanked.
- make sure that zyd_attach() returns 0 or errno.

Submitted by:	Weongyo Jeong <weongyo.jeong@gmail.com>
Reported by:	Ted Lindgreen <ted@tednet.nl>
Reviewed by:	sam
Approved by:	re (blanket wireless)
2007-09-07 03:54:54 +00:00
Marius Strobl
7439368f60 o Revamp the sparc64 interrupt code in order to be able to interface
with the INTR_FILTER-enabled MI code. Basically this consists of
  registering an interrupt controller (of which there can be multiple
  and optionally different ones either per host-to-foo bridge or shared
  amongst host-to-foo bridges in any one machine) along with an interrupt
  vector as specific argument for all the interrupt vectors used by a
  given host-to-foo bridge (roughly similar to registering interrupt
  sources on amd64 and i386), providing functions to enable, clear and
  disable the interrupts of the children beneath the bridge.
  This also includes:
  - No longer entering a critical section in tl0_intr() and tl1_intr()
    for executing interrupt handlers but rather let the handlers enter
    it themselves so in the case of intr_event_handle() we don't enter
    a nested critical section.
  - Adding infrastructure for binding delivery of interrupt vectors to
    specific CPUs which later on can be interfaced with the code from
    amd64/i386 for binding interrupts to specific CPUs.
  - Getting rid of the wrapper hack introduced along the lines of the
    API changes for INTR_FILTER which as a side-effect caused interrupts
    associated with ithread handlers only to get the elevated priority
    of those associated with filters ("fast handlers") (this removes the
    hack also in the non-INTR_FILTER case).
  - Disabling (by not clearing) an interrupt in the interrupt controller
    until all associated handlers have been executed, which is crucial
    for the typical locking strategy of NIC drivers in order to work
    correctly in case of shared interrupts. This was a more or less
    theoretical problem on sparc64 though, as shared interrupts are
    rather uncommon there except for the on-board SCCs and UARTs.
  Note that due to the behavior of at least of some of the interrupt
  controllers used on sparc64 an enable+EOI instead of a disable+EOI
  approach (as implied by the INTR_FILTER MI code and implemented on
  other architectures) is used as the latter can cause lost interrupts
  or in the worst case interrupt starvation.
o Correct a typo in sbus_alloc_resource() which caused (pass-through)
  allocations to only work down to the grandchildren of the bus, which
  wasn't a real problem so far as we don't support any devices which are
  great-grandchildren or greater of a U2S bridge, yet.
o In fhc(4) use bus_{read,write}_4() instead of bus_space_{read,write}_4()
  in order to get rid of sc_bh and sc_bt in the fhc_softc. Also get rid
  of some other unneeded members in fhc_softc.

Reviewed by:	marcel (earlier version)
Approved by:	re (kensmith)
2007-09-06 19:16:30 +00:00
Marius Strobl
5435966282 Style(9) fix - use #define<tab> consistently.
Approved by:	re (kensmith)
2007-09-06 14:56:09 +00:00
Sam Leffler
7595008bb1 oops, add missing bit from last change
Approved by:	re (blanket wireless)
2007-09-06 00:08:02 +00:00
Sam Leffler
c066143c08 Fixup sta inactivity handling:
o reset ni_inact when ni_inact_reload is changed so we're
  assured a valid setting
o never let ni_inact go negative
o add a knob to disable hostap sta idle handling (e.g. so it can be done
  by a user application)
o remove bogus reload on associate

Reviewed by:	avatar
Approved by:	re (blanket wireless)
2007-09-06 00:04:36 +00:00
Sam Leffler
5c096cfbe5 Add missing bg scanning bits; update ic_lastdata and cancel any
bg scan when there's outbound traffic.

Approved by:	re (blanket wireless)
2007-09-05 23:40:59 +00:00
Sam Leffler
2b9411e29f Add missing bits that made bg scanning lame:
o update ic_lastdata to reflect time of last outbound frame
o outbound traffic must preempt/cancel bg scanning to avoid delays

This stuff was somehow missed in the initial import.

Reviewed by:	thompsa, avatar, sephe (earlier version)
Approved by:	re (blanket wireless)
2007-09-05 23:00:27 +00:00
Sam Leffler
14fb6b8fe2 o add 802.11 state machine states for DFS and client-side power save
o fixup drivers to ignore new states

Reviewed by:	avatar (?)
Approved by:	re (blanket wireless)
2007-09-05 21:31:32 +00:00
Sam Leffler
dc60433061 add defs just removed from ieee80211.h
Approved by:	re (blanket wireless)
2007-09-05 21:25:58 +00:00
Sam Leffler
3f87f68e74 Update channel definition:
o add ic_extieee to hold the HT40 extension channel number
o add ic_state to track dynamic channel state for DFS
o add flags to mark regulatory channel requirements
o add state defs for DFS/radar support

Reviewed by:	avatar
Approved by:	re (blanket wireless)
2007-09-05 20:37:39 +00:00
Sam Leffler
eddedabe31 Miscellaneous fixups to 802.11 defs:
o update 11n definitions to D2.0 spec
o add IEEE80211_CAPINFO_SPECTRUM_MGMT for DFS support
o add CSA ie definition for DFS support
o purge some unused definitions
o correct 802.11 reason and status codes
o correct reason code returned when a sta tries to associate to an
  ap operating with WPA/RSN but without a WPA/RSN ie

Reviewed by:	thompsa, avatar
Approved by:	re (blanket wireless)
2007-09-05 20:29:51 +00:00
Sam Leffler
b1acbdbbbb o add M_WEP mbuf flag so drivers can mark frames that are decrypted by the
device and have had the crypto bits stripped from the 802.11 header
o strip mbuf flags in the rx path before passing up the stack

Reviewed by:	thompsa, sephe, avatar
Approved by:	re (blanket wireless)
2007-09-05 20:22:59 +00:00
Olivier Houchard
33321c8166 There's no need to re-read PCIR_COMMAND once we set it.
Approved by:	re (blanket)
2007-09-04 18:45:27 +00:00
Jack F Vogel
3ec35e52b8 This is an update to the new Intel 10G 82598 driver.
The first drop was Beta, this code is expected to be the release version.
Note that this driver code will build in either 6.2 or 7. If you
use the code in 6.2 you will not get TSO or MSI/X support but it will
function in a legacy mode.

Approved by: re
2007-09-04 02:31:35 +00:00
Robert Watson
70ffc2fb53 In userland_sysctl(), call useracc() with the actual newlen value to be
used, rather than the one passed via 'req', which may not reflect a
rewrite.  This call to useracc() is redundant to validation performed by
later copyin()/copyout() calls, so there isn't a security issue here,
but this could technically lead to excessive validation of addresses if
the length in newlen is shorter than req.newlen.

Approved by:	re (kensmith)
Reviewed by:	jhb
Submitted by:	Constantine A. Murenin <cnst+freebsd@bugmail.mojo.ru>
Sponsored by:	Google Summer of Code 2007
2007-09-02 09:59:33 +00:00
Yoshihiro Takahashi
7b226dfaa8 Fix a kernel panic due to a NULL pointer access on pc98.
When any PnP device exists, isa_release_resource() is called with no
activated resource.  So a bushandle is not allocated yet.

Approved by:	re (kensmith)
2007-09-01 12:18:28 +00:00
Pawel Jakub Dawidek
864cba9669 Add support for Camellia encryption algorithm.
PR:		kern/113790
Submitted by:	Yoshisato YANAGISAWA <yanagisawa@csg.is.titech.ac.jp>
Approved by:	re (bmah)
2007-09-01 06:33:02 +00:00
Pawel Jakub Dawidek
6bc581fcf0 Use CTLFLAG_RDTUN for tunable sysctls.
Approved by:	re (bmah)
2007-09-01 06:23:42 +00:00
Bruce Evans
c2819440b3 Fix races in msdosfs_lookup() and msdosfs_readdir(). These functions
can easily block in bread(), and then there was nothing to prevent the
static buffer (nambuf_{ptr,len,last_id}) being clobbered by another
thread.

The effects of the bug seem to have been limited to failed lookups and
mangled names in readdir(), since Giant locking provides enough
serialization to prevent concurrent calls to the functions that access
the buffer.  They were very obvious for multiple concurrent tree walks,
especially with a small cluster size.

The bug was introduced in msdosfs_conv.c 1.34 and associated changes,
and is in all releases starting with 5.2.

The fix is to allocate the buffer as a local variable and pass around
pointers to it like "_r" functions in libc do.  Stack use from this
is large but not too large.  This also fixes a memory leak on module
unload.

Reviewed by:	kib
Approved by:	re (kensmith)
2007-08-31 22:29:55 +00:00
John Baldwin
67b158d888 Close a race that snuck in with the recent changes to fix a LOR between
the callout_lock spin lock and the sleepqueue spin locks.  In the fix,
callout_drain() has to drop the callout_lock so it can acquire the
sleepqueue lock.  The state of the callout can change while the
callout_lock is held however (for example, it can be rescheduled via
callout_reset()).  The previous code assumed that the only state change
that could happen is that the callout could finish executing.  This change
alters callout_drain() to effectively restart and recheck everything
after it acquires the sleepqueue lock thus handling all the possible
states that the callout could be in after any changes while callout_lock
was dropped.

Approved by:	re (kensmith)
Tested by:	kris
2007-08-31 19:01:30 +00:00
Diomidis Spinellis
d5b6981e69 Add missing newline in the log message of the previous commit.
Approved by:	re (kensmith) - implied
2007-08-31 13:56:26 +00:00
Diomidis Spinellis
72de1b3709 Don't panic. When encountering a negative value call log(LOG_NOTICE, ...)
and record LONG_MAX, instead of calling KASSERT(...).

Reported by:	rwatson
Approved by:	re (kensmith)
2007-08-31 13:36:58 +00:00
Nate Lawson
c961faca8c Evaluate _OSC on boot to indicate our OS capabilities to ACPI. This is
needed at least to convince the BIOS to give us access to CPU freq
control on MacBooks.

Submitted by:	Rui Paulo <rpaulo / fnop.net>
Approved by:	re
MFC after:	5 days
2007-08-30 21:18:42 +00:00
Andrew Thompson
207455510b Show the ACTIVE flag in ifconfig for the single interface that is actaully
active in failover mode rather than all interfaces with a link. This makes it
clear if the master interface is in use or one of the backup links.

Found by:	Writing the Handbook section
Approved by:	re (kensmith)
2007-08-30 19:12:10 +00:00
Andrew Thompson
06035e8252 Remove the lock assert from iwi_newstate, this function does not need the lock
to be held and this will falsely trigger if called from net80211.

Reported by:	Munehiro (haro) Matsuda
Reviewed by:	sam
Approved by:	re (kensmith)
2007-08-29 21:52:03 +00:00
John Baldwin
57b7fe337e Partially revert the previous change. I failed to notice that where
ktruserret() is invoked, an unlocked check of  the per-process queue
is performed inline, thus, we don't lock the ktrace_sx on every userret().

Pointy hat to:	jhb
Approved by:	re (kensmith)
Pointy hat recovered from:	rwatson
2007-08-29 21:17:11 +00:00
Warner Losh
eb0fa74e92 A port of the zyd driver from NetBSD by . This supports the ZyDAS
ZD1211/ZD1211B USB IEEE 802.11b/g wireless network devices.  Not (yet)
connected to the build process (next batch of commits once I've looped
the current back back).

Submitted by: Weongyo Jeong
Reviewed by: sam@
Approved by: re@
2007-08-29 21:16:50 +00:00
Warner Losh
44298c2b79 Makefile for building zyd kernel module.
Submitted by: Weongyo Jeong
Approved by: re@ (kensmith)
2007-08-29 21:04:26 +00:00
Warner Losh
4c2b0b2a5e Add devices for the forthcoming zyd driver, ported from NetBSD, by
Weongyo Jeong.

Submitted by: Weongyo Jeong
Approved by: re@
2007-08-29 21:00:57 +00:00
Brian Feldman
598fa04675 Repair ALTQ-tagging rules in IPFW which got broken in the last PF
import.  The PF mbuf-tagging support routines changed to link the
allocated tags into the provided mbuf themselves, so the left-over
m_tag_prepend() was trying to add a bogus (usually NULL) tag.

Reviewed by: mlaier
Approved by: re
2007-08-29 19:34:28 +00:00
John Baldwin
cc479dda4a Rework the routines to convert a 5.x+ statfs structure (with fixed-size
64-bit counters) to a 4.x statfs structure (with long-sized counters).
- For block counters, we scale up the block size sufficiently large so
  that the resulting block counts fit into a the long-sized (long for the
  ABI, so 32-bit in freebsd32) counters.  In 4.x the NFS client's statfs
  VOP did this already.  This can lie about the block size to 4.x binaries,
  but it presents a more accurate picture of the ratios of free and
  available space.
- For non-block counters, fix the freebsd32 stats converter to cap the
  values at INT32_MAX rather than losing the upper 32-bits to match the
  behavior of the 4.x statfs conversion routine in vfs_syscalls.c

Approved by:	re (kensmith)
2007-08-28 20:28:12 +00:00
Konstantin Belousov
0e6ed4feab Regenerate.
Approved by:	re (kensmith)
2007-08-28 12:36:23 +00:00
Konstantin Belousov
b6e645c90f Implement fake linux sched_getaffinity() syscall to enable java to work
with Linux 2.6 emulation. This shall be reimplemented once FreeBSD gets
native scheduler affinity syscalls.

Submitted by:	rdivacky
Reviewed by:	jkim
Sponsored by:	Google Summer of Code 2007
Approved by:	re (kensmith)
2007-08-28 12:26:35 +00:00
Jung-uk Kim
8553cd622c Fix off-by-two errors.
Both WWNN and WWPN are 64-bit unsigned integers and they are prefixed
with "0x", which requires two more bytes each.

Submitted by:	Danny Braniss (danny at cs dot huji dot ac dot il)
		via Matthew Jacob (lydianconcepts at gmail dot com)
Approved by:	re (bmah)
MFC after:	3 days
2007-08-28 00:09:12 +00:00
Randall Stewart
2afb3e849f - During shutdown pending, when the last sack came in and
the last message on the send stream was "null" but still
  there, a state we allow, we could get hung and not clean
  it up and wait for the shutdown guard timer to clear the
  association without a graceful close. Fix this so that
  that we properly clean up.
- Added support for Multiple ASCONF per new RFC. We only
  (so far) accept input of these and cannot yet generate
  a multi-asconf.
- Sysctl'd support for experimental Fast Handover feature. Always
  disabled unless sysctl or socket option changes to enable.
- Error case in add-ip where the peer supports AUTH and ADD-IP
  but does NOT require AUTH of ASCONF/ASCONF-ACK. We need to
  ABORT in this case.
- According to the Kyoto summit of socket api developers
  (Solaris, Linux, BSD). We need to have:
   o non-eeor mode messages be atomic - Fixed
   o Allow implicit setup of an assoc in 1-2-1 model if
     using the sctp_**() send calls - Fixed
   o Get rid of HAVE_XXX declarations - Done
   o add a sctp_pr_policy in hole in sndrcvinfo structure - Done
   o add a PR_SCTP_POLICY_VALID type flag - yet to-do in a future patch!
- Optimize sctp6 calls to reuse code in sctp_usrreq. Also optimize
  when we close sending out the data and disabling Nagle.
- Change key concatenation order to match the auth RFC
- When sending OOTB shutdown_complete always do csum.
- Don't send PKT-DROP to a PKT-DROP
- For abort chunks just always checksums same for
  shutdown-complete.
- inpcb_free front state had a bug where in queue
  data could wedge an assoc. We need to just abandon
  ones in front states (free_assoc).
- If a peer sends us a 64k abort, we would try to
  assemble a response packet which may be larger than
  64k. This then would be dropped by IP. Instead make
  a "minimum" size for us 64k-2k (we want at least
  2k for our initack). If we receive such an init
  discard it early without all the processing.
- When we peel off we must increment the tcb ref count
  to keep it from being freed from underneath us.
- handling fwd-tsn had bugs that caused memory overwrites
  when given faulty data, fixed so can't happen and we
  also stop at the first bad stream no.
- Fixed so comm-up generates the adaption indication.
- peeloff did not get the hmac params copied.
- fix it so we lock the addr list when doing src-addr selection
  (in future we need to use a multi-reader/one writer lock here)
- During lowlevel output, we could end up with a _l_addr set
  to null if the iterator is calling the output routine. This
  means we would possibly crash when we gather the MTU info.
  Fix so we only do the gather where we have a src address
  cached.
- we need to be sure to set abort flag on conn state when
  we receive an abort.
- peeloff could leak a socket. Moved code so the close will
  find the socket if the peeloff fails (uipc_syscalls.c)

Approved by:	re@freebsd.org(Ken Smith)
2007-08-27 05:19:48 +00:00
Maxim Konovalov
4a296ec798 o Fix bug I introduced in the previous commit (ipfw set extention):
pack a set number correctly.

Submitted by:	oleg

o Plug a memory leak.

Submitted by:	oleg and Andrey V. Elsukov
Approved by:	re (kensmith)
MFC after:	1 week
2007-08-26 18:38:31 +00:00
Sepherosa Ziehau
f05ba5eeed Off-by-one bug in country ie construction, which will make HOSTAP send out
malformatted beacons.

Reviewed by: sam
Approved by: re (bmah), sam (mentor)
2007-08-26 11:34:51 +00:00
Sepherosa Ziehau
98b335504d Fix following nits:
- Per ieee80211com sysctl ctx leakage on detach
- getmgtframe incorrectly adjusts mbuf.m_data

Reviewed by: sam
Approved by: re (bmah), sam (mentor)
2007-08-26 11:32:56 +00:00
Scott Long
610f2ef365 Update the MFI driver to support new "1078" series of hardware. This
includes the upcoming Dell PERC6 series.  Many thanks to LSI for
contributing this code.

Submitted by: LSI
Approved by: re
2007-08-25 23:58:45 +00:00
Kip Macy
7ac2e6c362 Fixes for 4 port and small packet optimization
- remove cpl->iff panic - we can't know the port number from the rspq on the 4-port
- pick the ifnet based on the interface in the CPL header
- switch to using qset 0 for egress on the 4-port for now - may change
  when we start using RSS
- move ether_ifdetach to before the port lock gets deinitialized to avoid
  hang in the case where there are BPF peers (cxgb_ioctl is called indirectly
  when BPF peers are present)
- don't call t3_mac_reset if multiport is set, this was causing tx errors
  by misconfiguring the MAC on the 4-port
- change V_TXPKT_INTF to use txpkt_intf as the interfaces are not contiguous
- free the mbuf immediately in the case where the payload is small enough to be copied
  into the rspq
- only update the coalesce timer if for a queue if packets were taken off of it
- add in missed 20ms DELAY in initializaton vsc8211

- prompt MFC as this only applies to the 4-port which is currently completely
  broken - OK'd by kensmith

Supported by: Chelsio
Approved by: re (blanket)
MFC after: 0 days
2007-08-25 21:07:37 +00:00
Sam Leffler
d72c72537e drop frames marked for encryption when no key is available
Reviewed by:	avatar
Approved by:	re (kensmith)
Obtained from:	madwifi
2007-08-24 15:44:27 +00:00
Randall Stewart
c4739e2f47 - Fix address add handling to clear cached routes and source addresses
when peer acks the add in case the routing table changes.
- Fix sctp_lower_sosend to send shutdown chunk for mbuf send
  case when sndlen = 0 and sinfoflag = SCTP_EOF
- Fix sctp_lower_sosend for SCTP_ABORT mbuf send case with null data,
  So that it does not send the "null" data mbuf out and cause
  it to get freed twice.
- Fix so auto-asconf sysctl actually effect the socket's asconf state.
- Do not allow SCTP_AUTO_ASCONF option to be used on subset bound sockets.
- Memset bug in sctp_output.c (arguments were reversed) submitted
  found and reported by Dave Jones (davej@codemonkey.org.uk).
- PD-API point needs to be invoked >= not just > to conform to socket api
  draft this fixes sctp_indata.c in the two places need to be >=.
- move M_NOTIFICATION to use M_PROTO5.
- PEER_ADDR_PARAMS did not fail properly if you specify an address
  that is not in the association with a valid assoc_id. This meant
  you got or set the stcb level values instead of the destination
  you thought you were going to get/set. Now validate if the
  stcb is non-null and the net is NULL that the sa_family is
  set and the address is unspecified otherwise return an error.
- The thread based iterator could crash if associations were freed
  at the exact time it was running. rework the worker thread to
  use the increment/decrement to prevent this and no longer use
  the markers that the timer based iterator uses.
- Fix the memleak in sctp_add_addr_to_vrf() for the case when it is
  detected that ifa is already pointing to a ifn.
- Fix it so that if someone is so insane that they drop the
  send window below the minimal add mark, they still can send.
- Changed all state for associations to use mask safe macro.
- During front states in association freeing in sctp_inpcbfree, we
  had a locking problem where locks were not in place where they
  should have been.
- Free association calls were not testing the return value in
  sctp_inpcb_free() properly... others should be cast  void returns
  where we don't care about the return value.
- If a reference count is held on an assoc, even from the "force free"
  we should not do the actual free.. but instead let the timer
  free it.
- When we enter sctp_input(), if the SCTP_ASOC_ABOUT_TO_BE_FREED
  flag is set, we must NOT process the packet but handle it like
  ootb. This is because while freeing an assoc we release the
  locks to get all the higher order locks so we can purge all
  the hash tables. This leaves a hole if a packet comes in
  just at that point. Now sctp_common_input_processing() will
  call the ootb code in such a case.
- Change MBUF M_NOTIFICATION to use M_PROTO5 (per Sam L). This makes
  it so we don't have a conflict (I think this is a covertity change).
  We made this change AFTER some conversation and looking to make sure
  that M_PROTO5 does not have a problem between SCTP and the 802.11
  stuff (which is the only other place its used).
- Fixed lock order reversal and missing atomic protection around
  locked_tcb during association lookup and the 1-2-1 model.
- Added debug to source address selection.
- V6 output must always do checksum even for loopback.
- Remove more locks around inp that are not needed for an atomically
  added/subtracted ref count.
- slight optimization in the way we zero the array in sctp_sack_check()
- It was possible to respond to a ABORT() with bad checksum with
  a PKT-DROP. This lead to a PKT-DROP/ABORT war. Add code to NOT
  send a PKT-DROP to any ABORT().
- Add an option for local logging (useful for macintosh or when
  you need better performing during debugging). Note no commands
  are here to get the log info, you must just use kgdb.
- The timer code needs to be aware of if it needs to call
  sctp_sack_check() to slide the maps and adjust the cum-ack.
  This is because it may be out of sync cum-ack wise.
- Added threshold managment logging.
- If the user picked just the right size, that just filled the send
  window minus one mtu, we would enter a forever loop not copying and
  at the same time not blocking. Change from < to <= solves this.
- Sysctl added to control the fragment interleave level which defaults
  to 1.
- My rwnd control was not being used to control the rwnd properly (we
  did not add and subtract to it :-() this is now fixed so we handle
  small messages (1 byte etc) better to bring our rwnd down more
  slowly.

Approved by:	re@freebsd.org (Bruce Mah)
2007-08-24 00:53:53 +00:00
Ed Maste
afa3f6df27 Add PCI IDs for two cards:
- Adaptec RAID 3405
- Adaptec RAID 3805

Approved by:	re (bmah)
Submitted by:	John Marra  jmarra at nmu dot edu
MFC After:	1 week
2007-08-23 20:12:40 +00:00
Maksim Yevmenkin
d46210e60d Return EADDRNOTAVAIL instead of EDESTADDRREQ error when
listen(2) is called on improperly bound socket.

Suggested by:	Iain Hibbert
Approved by:	re (kensmith)
MFC after:	3 days
2007-08-23 16:55:22 +00:00
Jung-uk Kim
fada2376b8 Export 4Gbps Fibre Channel link speed correctly with inquiry commands.
Approved by:	re (kensmith)
MFC after:	3 days
2007-08-23 15:57:13 +00:00
Dag-Erling Smørgrav
5afb221c66 Style nits + more reliable Tj(max) detection + improved reporting of
critical temperature + sched_unbind() after rdmsr + initialize sc_dev.

Submitted by:	Rui Paulo <rpaulo@fnop.net>, cnst
Approved by:	re (kensmith)
2007-08-23 10:53:03 +00:00
Daniel Hartmeier
7f368082ad When checking the sequence number of a TCP header embedded in an
ICMP error message, do not access th_flags. The field is beyond
the first eight bytes of the header that are required to be present
and were pulled up in the mbuf.

A random value of th_flags can have TH_SYN set, which made the
sequence number comparison not apply the window scaling factor,
which led to legitimate ICMP(v6) packets getting blocked with
"BAD ICMP" debug log messages (if enabled with pfctl -xm), thus
breaking PMTU discovery.

Triggering the bug requires TCP window scaling to be enabled
(sysctl net.inet.tcp.rfc1323, enabled by default) on both end-
points of the TCP connection. Large scaling factors increase
the probability of triggering the bug.

PR:		kern/115413: [ipv6] ipv6 pmtu not working
Tested by:	Jacek Zapala
Reviewed by:	mlaier
Approved by:	re (kensmith)
2007-08-23 09:30:58 +00:00
Andrew Gallatin
c587e59f20 - Fix a bug which could cause a panic when enabling LRO
on an down mxge interface
- Fix a bug where mxge reported the link state as
   active when it wasn't (after ifconfig down).
- Prevent spurious watchdog resets when link partner is not consuming
- Add support for CX4 and popular XFP media detection
- Update the firmware and associated header files to 1.4.25

Approved by: re (kensmith)
2007-08-22 13:22:12 +00:00
Joseph Koshy
ea49750231 Assign sizes to assembly language support functions.
Approved by:	re (kensmith)
2007-08-22 05:06:14 +00:00
Joseph Koshy
298889efcb Define an END() macro for use in i386 and amd64 assembly code, akin
to the one available on the ia64, sparc64, and sun4v architectures.

Approved by:	re (kensmith)
2007-08-22 04:26:07 +00:00
Konstantin Belousov
046ea980e1 Properly initialize the dev_priv before calling the i915_dma_cleanup().
This fixes my rev. 1.5.

Reviewed by:	anholt
Approved by:	re (kensmith)
MFC after:	2 weeks
2007-08-21 12:52:57 +00:00
Alan Cox
8beae25391 In general, when we map a page into the kernel's address space, we no
longer create a pv entry for that mapping.  (The two exceptions are
mappings into the kernel's exec and pipe submaps.)  Consequently, there is
no reason for get_pv_entry() to dig deep into the free page queues, i.e.,
use VM_ALLOC_SYSTEM, by default.  This revision changes get_pv_entry() to
use VM_ALLOC_NORMAL by default, i.e., before calling pmap_collect() to
reclaim pv entries.

Approved by:	re (kensmith)
2007-08-21 04:59:34 +00:00
Olivier Houchard
7dd9c45f26 Some times ago, vfs_getopts() was changed, so that it would set error to
ENOENT if the option wasn't provided, instead of setting it to 0.
xfs however didn't catch up on this, so it assumed something went bad if
vfs_getopts() sets the error to non-zero, and just returns the error.
Unbreak xfs mount by just ignoring the error if vfs_getopts() sets the
error to ENOENT, as we should have sane defaults.

Reviewed by:    kan
Approved by:    re (rwatson)
Tested by:      rpaulo
2007-08-20 15:33:22 +00:00
Konstantin Belousov
d239bd3ccc Do not drop vm_map lock between doing vm_map_remove() and vm_map_insert().
For this, introduce vm_map_fixed() that does that for MAP_FIXED case.

Dropping the lock allowed for parallel thread to occupy the freed space.

Reported by:	Tijl Coosemans <tijl ulyssis org>
Reviewed by:	alc
Approved by:	re (kensmith)
MFC after:	2 weeks
2007-08-20 12:05:45 +00:00
Konstantin Belousov
5114048b63 Destroy the kaio_mtx on the freeing the struct kaioinfo in the
aio_proc_rundown.

Do not allow for zero-length read to be passed to the fo_read file method
by aio.

Reported and tested by:	Peter Holm
Approved by:	re (kensmith)
2007-08-20 11:53:26 +00:00
Jeff Roberson
67e20930bd - Improve runq_findbit_from() which is used by ULE's circular queue. Mask
of the bits we want to ignore on the first pass rather than doing a
   linear scan.  This puts us within a few instructions of the cost of
   runq_findbit() and removes this function from the top of profiling output
   for context switch heavy workloads.

Approved by:	re
2007-08-20 06:36:12 +00:00
Jeff Roberson
9862717afe - Set steal_thresh to log2(ncpus). This improves idle-time load balancing
on 2cpu machines by reducing it to 1 by default.  This improves loaded
   operation on 8cpu machines by increasing it to 3 where the extra idle
   time is not as critical.

Approved by:	re
2007-08-20 06:34:20 +00:00
Nate Lawson
62db376af3 Always call sched_bind(), even if on the CPU in question. It is wrong to
check if we're already on that cpu and skip the bind since the thread could
be migrated off in the meantime.

Suggested by:	jeff
Approved by:	re
2007-08-20 06:28:26 +00:00
Nate Lawson
2145b9d207 Use a different loop variable for the inner loop. This previous reuse could
have caused a hang, but we got lucky with the available multi-CPU states
on actual hardware.

Submitted by:	Bjorn Koenig <bkoenig / alpha-tierchen.de>
Approved by:	re
MFC after:	3 days
2007-08-19 20:34:13 +00:00
Olivier Houchard
d3973c98d5 Just wbinv if both PREREAD and PREWRITE are set.
In PREREAD, just invalidate the cache lines, and do not write back them, if
the buffer is properly aligned.

Approved by:	re (blanket)
2007-08-18 16:47:28 +00:00
Konstantin Belousov
daab56673e Remove comment that is no longer quite true.
Noted by:	alc
Approved by:	re (kensmith)
2007-08-18 16:41:31 +00:00
Konstantin Belousov
efe7553ed7 Fix the phys_pager in the way similar to the rev. 1.83 of the
sys/vm/device_pager.c:

Protect the creation of the phys pager with non-NULL handle with the
phys_pager_mtx. Lookup of phys pager in the pagers list by handle is now
synchronized with its removal from the list, and phys_pager_mtx is put
before vm object lock in lock order. Dispose the phys_pager_alloc_lock
and tsleep calls, together with acquiring Giant, since phys_pager_mtx
now covers the same block.

Reviewed by:	alc
Approved by:	re (kensmith)
2007-08-18 16:40:33 +00:00
Andrew Thompson
11eeea5e85 If the STP state machine is stopped then clear the bridge-id and root-id.
Approved by:	re (kensmith)
2007-08-18 12:06:13 +00:00
Alexander Motin
3fb87c2411 Add ng_send_fn() error handeling inside ng_con_nodes().
Without it some errors may left unnoticed and unhandeled
that will lead to hooks left in half-connected state.

Reviewed by:	julian@
Approved by:	re (kensmith), glebius (mentor)
2007-08-18 11:59:17 +00:00
Warner Losh
eb2e7f82ff Don't pass RB_BOOTINFO to the kernel. There's no bootinfo actually
passed into the kernel, and the kernel will soon grow that ability on
arm.

Approved by: re@ (blanket)
2007-08-17 18:22:31 +00:00
Kip Macy
7aff6d8ed3 forward port signedness fixes from RELENG_6
fix compile error for case where MSI_SUPPORTED not defined

Approved by: re (blanket)
2007-08-17 05:57:04 +00:00
Hidetoshi Shimokawa
ff038e3a82 We don't need to call dcons_poll event handlers if KDB is not active.
Approved by: re (kensmith)
2007-08-17 05:32:39 +00:00
Pawel Jakub Dawidek
70eaa4219c Some ZFS threads needs stack larger than the default 8kB, so use 16kB of
alternate stack if the default is smaller than 16kB.

Approved by:	re (rwatson)
2007-08-16 20:33:20 +00:00
Xin LI
1f32d0127b MFp4: rework tmpfs_readdir() logic in terms of correctness.
Approved by:	re (tmpfs blanket)
Tested with:	fstest, fsx
2007-08-16 11:00:07 +00:00
David Xu
6ec46f7aa8 Regenerate.
Approved by: re(kensmith)
2007-08-16 05:32:26 +00:00
David Xu
81ca5b4257 Add thr_kill2 compat32 syscall.
Submitted by: Tijl Coosemans tijl at ulyssis dot org
Approved by: re (kensmith)
2007-08-16 05:30:04 +00:00
David Xu
0b1f0611b4 Add thr_kill2 syscall which sends a signal to a thread in another process.
Submitted by: Tijl Coosemans tijl at ulyssis dot org
Approved by: re (kensmith)
2007-08-16 05:26:42 +00:00
Randall Stewart
2dad8a55be - Remove extra comment for 7.0 (no GIANT here).
- Remove unneeded WLOCK/UNLOCK of inp for getting TCB lock.
- Fix panic that may occur when freeing an assoc that has partial
  delivery in progress (may dereference null socket pointer when
  queuing partial delivery aborted notification)
- Some spacing and comment fixes.
- Fix address add handling to clear cached routes and source addresses
  when peer acks the add in case the routing table changes.
Approved by:	re@freebsd.org (Bruce Mah)
2007-08-16 01:51:22 +00:00
Qing Li
8cb5ba02d8 Use the sequence number comparison macro to compare
projected_offset against isn_offset to account for
wrap around.

Reviewed by:	gnn, kmacy, silby
Submitted by:	yusheng.huang@bluecoat.com
Approved by:	re
MFC:		3 days
2007-08-16 01:35:55 +00:00
Dag-Erling Smørgrav
83d18f2283 Add a driver for the on-die digital thermal sensor found on Intel Core
and newer CPUs (including Core 2 and Core / Core 2 based Xeons).  The
driver attaches to each cpu device and creates a sysctl node in that
device's sysctl context (dev.cpu.N.temperature).  When invoked, the
handler binds to the appropriate CPU to ensure a correct reading.

Submitted by:	Rui Paulo <rpaulo@fnop.net>
Sponsored by:	Google Summer of Code 2007
Tested by:	des, marcus, Constantine A. Murenin, Ian FREISLICH
Approved by:	re (kensmith)
MFC after:	3 weeks
2007-08-15 19:26:03 +00:00
John Baldwin
1dc5b1cc56 On 6.x this works:
% mount | grep home
/dev/ad4s1e on /home (ufs, local, noatime, soft-updates)
% mount -u -o atime /home
% mount | grep home
/dev/ad4s1e on /home (ufs, local, soft-updates)

Restore this behavior for on 7.x for the following mount options:
noatime, noclusterr, noclusterw, noexec, nosuid, nosymfollow

In addition, on 7.x, the following are equivalent:
mount -u -o atime /home
mount -u -o nonoatime /home

Ideally, when we introduce new mount options, we should avoid
options starting with "no". :)

Requested by:	jhb
Reported by:	Karol Kwiat <karol.kwiat gmail com>, Scott Hetzel <swhetzel gmail com>
Approved by:	re (bmah)
Proxy commit for:	rodrigc
2007-08-15 17:40:09 +00:00
Scott Long
9adc3a2dfb Move callout initialization to the proper spot. This prevents panics during
error recovery.

Approved by: re
Found by: kan
2007-08-14 19:17:35 +00:00
Pyun YongHyeon
c4aca09a2a Make sure to take PHY out of power down mode in device attach.
Without this the PHY wouldn't work as expected. This should fix
dual-boot Windows XP machine where RealTek Windows drivers put the
PHY in power down mode during shutdown. The magic PHY register
accesses come from RealTek driver. No datasheets mention the magic
PHY registers.
In general, the PHY wakeup code should go into PHY driver. However it
seems that it only apply to RTL8169S single chip and it would be
another hack if we have rgephy(4) check what parent driver/chip model
is attached.

Reported by:	lofi, Laurens Timmermans ( laurens AT timkapel DOT nl )
Tested by:	lofi
Obtained from:	RealTek FreeBSD driver
Approved by:	re (Ken Smith)
2007-08-14 02:00:04 +00:00
Pawel Jakub Dawidek
354eb80141 Improve vn_printf() by:
- adding missing vnode flags,
- printing unknown flags as numbers,
- using strlcat() instead of strcat().

Approved by:	re (bmah)
2007-08-13 21:23:30 +00:00
John Baldwin
cde586a75c Fix a few nits relative to the previous changes:
- Don't leak the config lock if detach() fails due to the controller char
  dev being open.
- Close a race between detach() and a process opening the controller char
  dev.

MFC after:	1 week
Approved by:	re (bmah)
2007-08-13 21:14:16 +00:00
John Baldwin
8ec5c98ba4 Teach the mfi(4) driver to handle requests from userland management
applications to add and remove volumes.

MFC after:	1 week
Approved by:	re (bmah)
Reviewed by:	ambrisko, scottl
2007-08-13 19:29:17 +00:00
Dag-Erling Smørgrav
438dafbbcf Update to support ICH[678] chipsets (based on a patch by Takeharu KATO)
Fix a resource allocation bug (explained by jhb on -acpi)
Thanks for Mike Tancsa for testing and helping track down the bug.

Approved by:	re (kensmith)
MFC after:	3 weeks
2007-08-13 18:52:37 +00:00
John Baldwin
14657ee81f Expand the data structure returned by the ATA RAID status ioctl to include
detailed status on each of the backing subdisks.  This allows userland
to see which subdisks are online, failed, missing, or a hot spare.

MFC after:	1 week
Approved by:	re (bmah)
Reviewed by:	sos
2007-08-13 18:46:31 +00:00
Maksim Yevmenkin
51713b2a7b Make ng_h4(4) MPSAFE. Use similar to ng_tty(4) locking strategy.
Reconnect ng_h(4) back to the build.

Reviewed by:	kensmith
Approved by:	re (kensmith)
MFC after:	1 month
2007-08-13 17:19:28 +00:00
Don Lewis
4d54b88811 Replace three copies of the host controller reset sequence that
differ in their details with calls to a new function, ehci_hcreset(),
that performs the reset.

The original sequences either had no delay or a 1ms delay between
telling the controller to stop and asserting the controller reset
bit.  One instance of the original reset sequence waited for the
controller to indicate that its reset was complete before continuing,
but the other two immediately let the subsequent code execute.  The
latter is a problem on some hardware, because a read of the HCCPARAMS
register returns an incorrect value while the reset is in progress,
which triggers an infinite loop in ehci_pci_givecontroller(), which
hangs the system on shutdown.

The reset sequence in ehci_hcreset() starts with the most complete
instance from the original code, which contains a loop to wait for
the controller to indicate that its reset is complete.   This appears
to be the correct thing to do according to "Enhanced Host Controller
Interface Specification for Universal Serial Bus" revision 1.0,
section 2.3.1.  Add another loop to wait for the controller to
indicate that it has stopped before setting the HCRESET bit.  This
is required by the section 2.3.1 in the specification, which says
that setting HCRESET before the controller has halted "will result
in undefined behaviour".

Reviewed by:	imp (previous patch version without the extra wait loop)
Tested by:	se  (previous patch version without the extra wait loop)
Approved by:	re (bmah)
MFC after:	1 week
2007-08-12 18:45:24 +00:00
Marcel Moolenaar
77d40ffd98 Revamp the interrupt handling in support of INTR_FILTER. This includes:
o  Revamp the PIC I/F to only abstract the PIC hardware. The
   resource handling has been moved to nexus, where it belongs.
o  Include EOI and MASK+EOI methods to the PIC I/F in support of
   INTR_FILTER.
o  With the allocation of interrupt resources and setup of
   interrupt handlers in the common platform code we can delay
   talking to the PIC hardware after enumeration of all devices.
   Introduce a call to powerpc_intr_enable() in configure_final()
   to achieve that and have powerpc_setup_intr() only program the
   PIC when !cold.
o  As a consequence of the above, remove all early_attach() glue
   from the OpenPIC and Heathrow PIC drivers and have them
   register themselves when they're found during enumeration.
o  Decouple the interrupt vector from the interrupt request line.
   Allocate vectors increasingly so that they can be used for
   the intrcnt index as well. Extend the Heathrow PIC driver to
   translate between IRQ and vector. The OpenPIC driver already
   has the support for vectors in hardware.

Approved by: re (blanket)
2007-08-11 19:25:32 +00:00
Kip Macy
93cccbf874 White space cleanups
Approved by: re (blanket)
2007-08-10 23:47:39 +00:00
Kip Macy
6b68e276ce - In all structures other than port info port is a pointer to a port info,
make the code less confusing by renaming the port number to port_id

Approved by: re (blanket)
2007-08-10 23:33:34 +00:00
Xin LI
ad3638ee08 MFp4:
- LK_RETRY prohibits vget() and vn_lock() to return error.
   Remove associated code. [1]
 - Properly use vhold() and vdrop() instead of their unlocked
   versions, we are guaranteed to have the vnode's interlock
   unheld. [1]
 - Fix a pseudo-infinite loop caused by 64/32-bit arithmetic
   with the same way used in modern NetBSD versions. [2]
 - Reorganize tmpfs_readdir to reduce duplicated code.

Submitted by:	kib [1]
Obtained from:	NetBSD [2]
Approved by:	re (tmpfs blanket)
2007-08-10 11:00:30 +00:00
Xin LI
0ae6383d39 MFp4:
- Respect cnflag and don't lock vnode always as LK_EXCLUSIVE [1]
 - Properly lock around tn_vnode to avoid NULL deference
 - Be more careful handling vnodes (*)

(*) This is a WIP
[1] by pjd via howardsu

Thanks kib@ for his valuable VFS related comments.

Tested with:	fsx, fstest, tmpfs regression test set
Found by:	pho's stress2 suite
Approved by:	re (tmpfs blanket)
2007-08-10 05:24:49 +00:00
Nate Lawson
3b3f28135f Add "show sysregs" command to ddb. On i386, this gives gdt, idt, ldt,
cr0-4, etc.  Support should be added for other platforms that have a
different set of registers for system use.

Loosely based on: OpenBSD
Approved by:	re
2007-08-09 20:14:35 +00:00
Tai-hwa Liang
c7f6197937 MFP4(123963): Fixing a possible NULL pointer dereference by making
the actual assignment after the NULL check.

Found by:	Coverity Prevent(tm)
CID:		2303 (run 4156)
Reviewed by:	sam
Approved by:	re (bmah)
2007-08-09 13:29:26 +00:00
Warner Losh
4ced8fb56a Use the .S version for now. I have a version optimized for size p4,
but I'm unsure of its provenance, so rather than add it here, revert
the migration to it.

Approved by: re@ (blanket)
2007-08-09 05:16:55 +00:00
Warner Losh
d8e3f30539 Merge in the AX88178 and AX88772 register definions (along with
rename) from OpenBSD.  This also dribbles in a few fields from OpenBSD
as well.

Approved by: re@ (blanket)
Obtained from: OpenBSD
2007-08-09 04:40:07 +00:00
Marcel Moolenaar
69fc43c03b Compile ipfilter:ip_lookup.c without -Werror. The file contains
a test that assumes that char is signed by default and causes a
warning with GCC 4.2 on PowerPC.
A patch has been sent to the maintainer that addresses this.

Approved by: re (blanket)
2007-08-09 01:11:21 +00:00
Marcel Moolenaar
b66623109d Re-enable -Werror for PowerPC. This should really be unconditional again.
Approved by: re (blanket)
2007-08-08 19:12:06 +00:00
Olivier Houchard
4739da977b Ooops, we need to define TD_LOCK here.
Approved by:	re (blanket)
Pointy hat to:	cognet
2007-08-08 09:27:52 +00:00
Marcel Moolenaar
fc37ccb390 Re-enable external interrupts for faults, traps and syscalls.
Approved by: re (blanket)
2007-08-08 01:19:12 +00:00
Marcel Moolenaar
4f5d8660e5 Eliminate <machine/interruptvar.h> as it has only a single
prototype. In the future that prototype will not be needed
at all anyway, but for now it's moved to intr_machdep.h.

Approved by: re (blanket)
2007-08-07 23:33:35 +00:00
Marcel Moolenaar
0201e3e97b Remove redundant prototype.
Approved by: re (blanket)
2007-08-07 18:40:02 +00:00
Marcel Moolenaar
ad9503cd37 Add prototype for trap().
Approved by: re (blanket)
2007-08-07 18:39:28 +00:00
Olivier Houchard
f7b55b6053 Add cast to silent gcc warnings.
Approved by:	re (blanket)
2007-08-07 18:37:21 +00:00
Olivier Houchard
362a46e4f6 Use the third argument of cpu_switch(), as done for i386/amd63, as it is
required for ULE.

Approved by:	re (blanket)
2007-08-07 18:20:55 +00:00
Konstantin Belousov
deea654ebf Protect the creation of the device pager with the dev_pager_mtx. Lookup
of device pager in the pagers list by handle is now synchronized with
its removal from the list, and dev_pager_mtx is put before vm object
lock in lock order. Dispose the dev_pager_sx lock, since dev_pager_mtx
now covers the same block.

Noted by:	kensmith
Reviewed by:	alc
Approved by:	re (kensmith)
2007-08-07 15:36:25 +00:00
Tai-hwa Liang
07b6a9bed8 MFP4(123687): Closing another LOR by dropping the driver lock around calls
to if_input().

Reviewed by:	ambrisko
Tested by:	dhw
Approved by:	re (kensmith)
2007-08-07 12:26:19 +00:00
Bruce Evans
a4e6807c49 In msdosfs_read() and msdosfs_write(), don't check explicitly for
(uio_offset < 0) since this can't happen.  If this happens, then the
general code handles the problem safely (better than before for reading,
returning 0 (EOF) instead of the bogus errno EINVAL, and the same as
before for writing, returning EFBIG).

In msdosfs_read(), don't check for (uio_resid < 0).  msdosfs_write()
already didn't check.

In msdosfs_read(), document in a comment our assumptions that the caller
passed a valid uio_offset and uio_resid.  ffs checks using KASSERT(),
and that is enough sanity checking.  In the same comment, partly document
there is no need to check for the EOVERFLOW case, unlike in ffs where this
case can happen at least in theory.

In msdosfs_write(), add a comment about why the checking of
(uio_resid == 0) is explicit, unlike in ffs.

In msdosfs_write(), check for impossibly large final offsets before
checking if the file size rlimit would be exceeded, so that we don't
have an overflow bug in the rlimit check and are consistent with ffs.
We now return EFBIG instead of EFBIG plus a SIGXFSZ signal if the final
offset would be impossibly large but not so large as to cause overflow.
Overflow normally gave the benign behaviour of no signal.

Approved by:	re (kensmith) (blanket)
2007-08-07 10:35:27 +00:00
Konstantin Belousov
004e08be60 Do not call free() while holding vnode interlock.
Reported and tested by:	Peter Holm
Reviewed by:	jeff
Approved by:	re (kensmith)
2007-08-07 09:04:50 +00:00
Bruce Evans
b7837a91c9 Fix and update the comments about the effect of the read-only flag on writing.
They are still too verbose.

Remove nearby unreachable code for handling symlinks.

Approved by:	re (kensmith) (blanket)
2007-08-07 05:42:10 +00:00
Bruce Evans
e3117f852e Fix some style bugs (don't assume that off_t == int64_t; fix some comments;
remove some parentheses; fix some whitespace errors; fix only one case of
a boolean comparison of a non-boolean).

Improve an error message by quoting ".", and by not printing large positive
values as negative ones.

Approved by:	re (kensmith) (blanket)
2007-08-07 03:59:49 +00:00
Bruce Evans
c0f5121cac Fix some style bugs (don't assume that off_t == int64_t; fix some comments;
remove some parentheses; fix only a couple of whtespace errors).

Approved by:	re (kensmith) (blanket)
2007-08-07 03:43:28 +00:00
Bruce Evans
2d7c6b2724 Fix some style bugs (mainly some whitespace errors).
Approved by:	re (kensmith) (blanket)
2007-08-07 03:38:36 +00:00
Bruce Evans
b6d0381e7e Fix some style bugs (some whitespace errors only).
Approved by:	re (kensmith) (blanket)
2007-08-07 03:22:10 +00:00
Bruce Evans
d2bb66bacd Sort includes.
Remove rotted banal comment attached to includes.

Approved by:	re (kensmith) (blanket)
2007-08-07 02:28:33 +00:00
Bruce Evans
6becd1c855 Sort includes.
Remove banal comments attached to includes.

Approved by:	re (kensmith) (blanket)
2007-08-07 02:27:35 +00:00
Bruce Evans
5696c6e0b2 Sort includes.
Remove banal comments before includes.  Remove rotted banal comments attached
to includes.

Approved by:	re (kensmith) (blanket)
2007-08-07 02:20:37 +00:00
Bruce Evans
9b0802c90b Remove unused include(s).
Remove banal comments before includes.

Approved by:	re (kensmith) (blanket)
2007-08-07 02:11:16 +00:00
Bruce Evans
a878a31c13 Remove unused include(s).
Approved by:	re (kensmith) (blanket)
2007-08-07 02:08:06 +00:00
Bruce Evans
eba34270fa Include <sys/mutex.h> and its prerequisite <sys/lock.h> instead of
depending on namespace pollution in <sys/buf.h> and/or <sys/vnode.h>

Approved by:	re (kensmith) (blanket)
2007-08-07 01:40:27 +00:00
Bruce Evans
1103771d95 Include <sys/mutex.h>'s prerequisite <sys/lock.h> instead of depending on
namespace pollution in <sys/vnode.h>.

Sort the include of <sys/mutex.h> instead of unsorting it after
<sys/vnode.h> and depending on the pollution there.

Approved by:	re (kensmith) (blanket)
2007-08-07 01:37:59 +00:00
Bruce Evans
6fd81fc7a6 Remove unused include(s).
Approved by:	re (kensmith) (blanket)
2007-08-07 01:07:16 +00:00
Christian S.J. Peron
b244c8ad14 Over the past couple of years, there have been a number of reports relating
the use of divert sockets to dead locks.  A number of LORs have been reported
between divert and a number of other network subsystems including: IPSEC, Pfil,
multicast, ipfw and others.  Other dead locks could occur because of recursive
entry into the IP stack.  This change should take care of most if not all of
these issues.

A summary of the changes follow:

- We disallow multicast operations on divert sockets.  It really doesn't make
  semantic sense to allow this, since typically you would set multicast
  parameters on multicast end points.

  NOTE: As a part of this change, we actually dis-allow multicast options on
  any socket that IS a divert socket OR IS NOT a SOCK_RAW or SOCK_DGRAM family

- We check to see if there are any socket options that have been specified on
  the socket, and if there was (which is very un-common and also probably
  doesnt make sense to support) we duplicate the mbuf carrying the options.

- We then drop the INP/INFO locks over the call to ip_output().  It should be
  noted that since we no longer support multicast operations on divert sockets
  and we have duplicated any socket options, we no longer need the reference
  to the pcb to be coherent.

- Finally, we replaced the call to ip_input() to use netisr queuing.  This
  should remove the recursive entry into the IP stack from divert.

By dropping the locks over the call to ip_output() we eliminate all the lock
ordering issues above.  By switching over to netisr on the inbound path,
we can no longer recursively enter the ip_input() code via divert.

I have tested this change by using the following command:

ipfwpcap -r 8000 - | tcpdump -r - -nn -v

This should exercise the input and re-injection (outbound) path, which is
very similar to the work load performed by natd(8).  Additionally, I have
run some ospf daemons which have a heavy reliance on raw sockets and
multicast.

Approved by:	re@ (kensmith)
MFC after:	1 month
LOR:		163
LOR:		181
LOR:		202
LOR:		203
Discussed with:	julian, andre et al (on freebsd-net)
In collaboration with:	bms [1], rwatson [2]

[1] bms helped out with the multicast decisions
[2] rwatson submitted the original netisr patches and came up with some
    of the original ideas on how to combat this issue.
2007-08-06 22:06:36 +00:00
Randall Stewart
63981c2b40 - change number assignments for SHA225-512 (match artisync
for bakeoff.. using the next sequential ones)
- In cookie processing 1-2-1, we did not increment the stcb
  refcnt before releasing the tcb lock. We need to do this
  to keep the tcb from being freed by a abort or ?? unlikely
  but worth doing. Also get rid of unneed INP_WLOCK.
- extra receive info included the rcvinfo which killed the
  padding/alignment. We now redefine all the fields properly
  so they both align properly both to 128 bytes.
- A peeled off socket would not close without an error due to
  its misguided idea that sctp_disconnect() was not supported
  on it. This fixes it so it goes through the proper path.
- When an assoc was being deleted after abort (via a timer) a
  small race condition exists where we might take a packet for
  the old assoc (since we are waiting for a cleanup timer). This
  state especially happens in mac. We now add a state in the asoc
  so these can properly handle the packet as OOTB.
Approved by:	re@freebsd.org(Ken Smith)
2007-08-06 15:46:46 +00:00
Robert Watson
0bf686c125 Remove the now-unused NET_{LOCK,UNLOCK,ASSERT}_GIANT() macros, which
previously conditionally acquired Giant based on debug.mpsafenet.  As that
has now been removed, they are no longer required.  Removing them
significantly simplifies error-handling in the socket layer, eliminated
quite a bit of unwinding of locking in error cases.

While here clean up the now unneeded opt_net.h, which previously was used
for the NET_WITH_GIANT kernel option.  Clean up some related gotos for
consistency.

Reviewed by:	bz, csjp
Tested by:	kris
Approved by:	re (kensmith)
2007-08-06 14:26:03 +00:00
Marcel Moolenaar
ec2af96ad1 Clear pending interrupts before we enable external interrupts.
Recently the AP in my Merced box seems to have grown a habit
of getting unexpected interrupts, such as redundant wake-ups
and legacy interrupts that require an INTA cycle.

While here, replace DELAY(0) with cpu_spinwait() so that it's
clear what we're doing as well as enable the code to take
advantage of cpu_spinwait() when it gets implemented.

Approved by: re (blanket)
2007-08-06 05:15:57 +00:00
Marcel Moolenaar
78afae27e5 Keep interrupts disabled while handling external interrupts.
There's no advantage in allowing nested external interrupts.
In fact, it leads to a potential stack overrun.

While here, put the interrupt vector in the trapframe, so as
to compensate for the 36 cycle latency of reading cr.ivr.

Further simplify assembly code by dealing with ASTs from C.

Approved by: re (blanket)
2007-08-06 05:11:01 +00:00
Alan Cox
b5e8f167b9 Consider a scenario in which one processor, call it Pt, is performing
vm_object_terminate() on a device-backed object at the same time that
another processor, call it Pa, is performing dev_pager_alloc() on the
same device.  The problem is that vm_pager_object_lookup() should not be
allowed to return a doomed object, i.e., an object with OBJ_DEAD set,
but it does.  In detail, the unfortunate sequence of events is: Pt in
vm_object_terminate() holds the doomed object's lock and sets OBJ_DEAD
on the object.  Pa in dev_pager_alloc() holds dev_pager_sx and calls
vm_pager_object_lookup(), which returns the doomed object.  Next, Pa
calls vm_object_reference(), which requires the doomed object's lock, so
Pa waits for Pt to release the doomed object's lock.  Pt proceeds to the
point in vm_object_terminate() where it releases the doomed object's
lock.  Pa is now able to complete vm_object_reference() because it can
now complete the acquisition of the doomed object's lock.  So, now the
doomed object has a reference count of one!  Pa releases dev_pager_sx
and returns the doomed object from dev_pager_alloc().  Pt now acquires
dev_pager_mtx, removes the doomed object from dev_pager_object_list,
releases dev_pager_mtx, and finally calls uma_zfree with the doomed
object.  However, the doomed object is still in use by Pa.

Repeating my key point, vm_pager_object_lookup() must not return a
doomed object.  Moreover, the test for the object's state, i.e.,
doomed or not, and the increment of the object's reference count
should be carried out atomically.

Reviewed by:	kib
Approved by:	re (kensmith)
MFC after:	3 weeks
2007-08-05 21:04:32 +00:00
Marcel Moolenaar
e54994f990 In ia64_set_rr(), don't perform data serialization. This allows
us to do the data serializations once after writing multiple
region registers, as is done in pmap_switch(). All existing
calls to ia64_set_rr() are followed with calls to ia64_srlz_d().

Approved by: re (blanket)
2007-08-05 18:19:38 +00:00
Bjoern A. Zeeb
cc977adc71 Rename option IPSEC_FILTERGIF to IPSEC_FILTERTUNNEL.
Also rename the related functions in a similar way.
There are no functional changes.

For a packet coming in with IPsec tunnel mode, the default is
to only call into the firewall with the "outer" IP header and
payload.

With this option turned on, in addition to the "outer" parts,
the "inner" IP header and payload are passed to the
firewall too when going through ip_input() the second time.

The option was never only related to a gif(4) tunnel within
an IPsec tunnel and thus the name was very misleading.

Discussed at:			BSDCan 2007
Best new name suggested by:	rwatson
Reviewed by:			rwatson
Approved by:			re (bmah)
2007-08-05 16:16:15 +00:00
Bruce Evans
8d61a735c6 Silently fix up the estimated next free cluster number from the fsinfo
sector, instead of failing the whole mount if it is garbage.  Fields
in the fsinfo sector are only advisory, so there are better sanity
checks than this, and we already silently fix up the only other advisory
field in the fsinfo (the free cluster count).

This wasn't handled quite right in rev.1.92, 1.117, or in NetBSD.  1.92
also failed the whole mount for the non-garbage magic value 0xffffffff
1.117 fixed this well enough in practice since garbage values shouldn't
occur in practice, but left the error handling larger and more convoluted
than necessary.  Now we handle the magic value as a special case of
fixing up all out of bounds values.

Also fix up the estimated next free cluster number when there is no
fsinfo sector.  We were using 0, but CLUST_FIRST is safer.

Approved by:	re (kensmith)
2007-08-05 12:58:34 +00:00
Marius Strobl
6bbb5a106c - Divorce the IOTSBs, which so far where handled via a global list
instead of per IOMMU, so we no longer need to program all of them
  identically in systems having multiple IOMMUs. This continues the
  rototilling of the nexus(4) done about 5 months ago, which amongst
  others changed nexus(4) and the drivers for host-to-foo bridges
  to provide bus_get_dma_tag methods, allowing to handle DMA tags in
  a hierarchical way and to link them with devices.
  This still doesn't move the silicon bug workarounds for Sabre (and
  in the uncommitted schizo(4) for Tomatillo) bridges into special
  bus_dma_tag_create() and bus_dmamap_sync() methods though, as w/o
  fully newbus'ified bus_dma_tag_create() and bus_dma_tag_destroy()
  this still requires too much hackery, i.e. per-child parent DMA
  tags in the parent driver.
- Let the host-to-foo drivers supply the maximum physical address
  of the IOMMU accompanying the bridges. Previously iommu(4) hard-
  coded an upper limit of 16GB, which actually only applies to the
  IOMMUs of the Hummingbird and Sabre bridges. The Psycho variants
  as well as the U2S in fact can can translate to up to 2TB, i.e.
  translate to 41-bit physical addresses. According to the recently
  available Tomatillo documentation these bridges even translate to
  43-bit physical addresses and hints at the Schizo bridges doing
  43 bits as well.
  This fixes the issue the FreeBSD 6.0 todo list item "Max RAM on
  sparc64" was refering to and pretty much obsoletes the lack of
  support for bounce buffers on sparc64.

Thanks to Nathan Whitehorn for pointing me at the Tomatillo manual.

Approved by:	re (kensmith)
2007-08-05 11:56:44 +00:00
Marius Strobl
82a67a70a2 o In order to reduce bug and code duplication fold handling of NICs
requiring DC_TX_ALIGN or DC_TX_COALESCE, which was previously done
  in dc_start_locked(), into dc_encap().
o In dc_encap():
  - If m_defrag() fails just drop the packet like other NIC drivers
    do. This should only happen when there's a mbuf shortage, in which
    case it was possible to end up with an IFQ full of packets which
    couldn't be processed as they couldn't be defragmented as they
    were taking up all the mbufs themselves. This includes adjusting
    dc_start_locked() to not trying to prepend the mbuf (chain) if
    dc_encap() has freed it.
  - Likewise, if bus_dmamap_load_mbuf() fails as dc_dma_map_txbuf()
    failed, free the mbuf possibly allocated by the above call to
    m_defrag() and drop the packet.
o In dc_txeof():
  - Don't clear IFF_DRV_OACTIVE unless there are at least 6 free TX
    descriptors. Further down the road dc_encap() will bail if there
    are only 5 or fewer free TX descriptors, causing dc_start_locked()
    to abort and prepend the dequeued mbuf again so it makes no sense
    to pretend we could process mbufs again when in fact we won't.
    While at it replace this magic 5 with a macro DC_TX_LIST_RSVD.
  - Just always assign idx to sc->dc_cdata.dc_tx_cons; it doesn't
    make much sense to exclude the idx == sc->dc_cdata.dc_tx_cons
    case.
o In dc_dma_map_txbuf() there's no need to set sc->dc_cdata.dc_tx_err
  to error if the latter is != 0, bus_dmamap_load_mbuf() already
  returns the same error value in that case anyway.
o For less overhead, convert to use bus_dmamap_load_mbuf_sg() for
  loading RX buffers.
o Remove some banal and/or outdated comments.

Approved by:	re (kensmith)
MFC after:	1 week
2007-08-05 11:28:19 +00:00
Marius Strobl
9282563532 Initialize the rl_vlanctl field of the descriptors to zero (in order
to clear RL_TDESC_VLANCTL_TAG). This fixes sending packets in the
native VLAN when running both tagged and an untagged VLAN over the
same trunk and descriptors are recycled.

Approved by:	re (kensmith)
MFC after:	1 week
2007-08-05 11:20:33 +00:00
Konstantin Belousov
c6199d59e3 Do not acquire Giant unconditionally around the calls to the cdevsw
d_mmap methods. prep_cdevsw() already installs the shims that
acquire/drop Giant for the methods of a driver that specified the
D_NEEDGIANT flag.

Reviewed by:	alc
Approved by:	re (kensmith)
2007-08-05 05:40:52 +00:00
Andrew Thompson
dd04013007 - Ensure the path cost does not exceed 65535 in legacy STP mode.
- If the path cost is calculated when the link is down, set a pending flag so
  it is calculated again when it comes back up.
- To not use 00:00:00:00:00:00 as the bridge id, all interfaces are scanned and
  the lowest number wins. All zeros is too low.

Approved by:	re (rwatson)
2007-08-04 21:09:04 +00:00
Marcel Moolenaar
f5a9fc710a Replace "__asm __volatile()" by equivalent support functions from
ia64_cpu.h. This improves readability and consistency and aids in
auditing the code.
Add instruction-serialization after writing to cr.pta.

Delay enabling interrupts until after we setup the clocks and after
we program the task priority register.

Approved by: re (blanket)
2007-08-04 19:52:10 +00:00
Marcel Moolenaar
7c31469f67 Replace "__asm __volatile()" by equivalent support functions from
ia64_cpu.h. This improves readability and consistency and aids in
auditing the code.
Add data-serialization after writing to the region registers and
add instruction-serialization after writing to cr.pta.

Approved by: re (blanket)
2007-08-04 19:36:14 +00:00
Marcel Moolenaar
09363c3636 Replace "__asm __volatile()" by equivalent support functions from
ia64_cpu.h. This improves readability and consistency and aids in
auditing the code.
Add data-serialization after writing to cr.tpr.

Approved by: re (blanket)
2007-08-04 19:33:27 +00:00
Marcel Moolenaar
9d662e5c9d Add required data-serialization after writing to cr.itm and cr.itv.
Approved by: re (blanket)
2007-08-04 19:28:19 +00:00
Marcel Moolenaar
855218fbd1 Add ia64_srlz_d() and ia64_srlz_i() functions to aid in serialization.
Approved by: re (blanket)
2007-08-04 19:26:42 +00:00
Konstantin Belousov
a045dbb8ae Set D_NEEDGIANT.
Approved by:	phk
Approved by:	re (kensmith)
2007-08-04 17:43:11 +00:00
Jeff Roberson
3a78f9658b - Fix one line that erroneously crept in my last commit.
Approved by:	re
2007-08-04 01:21:28 +00:00
Jeff Roberson
c47f202b45 - Share scheduler locks between hyper-threaded cores to protect the
tdq_group structure.  Hyper-threaded cores won't really benefit from
   seperate locks anyway.
 - Seperate out the migration case from sched_switch to simplify the main
   switch code.  We only migrate here if called via sched_bind().
 - When preempted place the preempted thread back in the same queue at
   the head.
 - Improve the cpu group and topology infrastructure.

Tested by:	many on current@
Approved by:	re
2007-08-03 23:38:46 +00:00
Jeff Roberson
413ea6f543 - Set SW_PREEMPT when we preempt in critical_exit().
Approved by:	re
2007-08-03 23:35:35 +00:00
Bruce Evans
3726942956 Oops, fix the fix for the i/o size of the fsinfo block. Its log
message explained why the size is 1 sector, but the code used a
size of 1 cluster.

I/o sizes larger than necessary may cause serious coherency problems
in the buffer cache.  Here I think there were only minor efficiency
problems, since a too-large fsinfo buffer could only get far enough
to overlap buffers for the same vnode (the device vnode), so mappings
are coherent at the page level although not at the buffer level, and
the former is probably enough due to our limited use of the fsinfo
buffer.

Approved by:	re (kensmith)
2007-08-03 23:13:50 +00:00
Xin LI
fb7557140e MFp4 - Refine locking to eliminate some potential race/panics:
- Copy before testing a pointer.  This closes a race window.
 - Use msleep with the node interlock instead of tsleep.
 - Do proper locking around access to tn_vpstate.
 - Assert vnode VOP lock for dir_{atta,de}tach to capture
   inconsistent locking.

Suggested by:	kib
Submitted by:	delphij
Reviewed by:	Howard Su
Approved by:	re (tmpfs blanket)
2007-08-03 06:24:31 +00:00
Peter Wemm
b7778ae08f Move mp_topology() from apic_init(i386) and apic_setup_local(amd64) to
cpu_start_mp().  This is after we have read the cpuid registers to
calculate the hyperthreading_cpus value for the sysctl that enables or
disables hyperthread cores.  Change mp_topology() to use that information
rather than trying to do it itself.

This solves the problem of ULE being incorrectly told that dual core
Athlon64 X2 or Operton cpus are hyperthreading cores.  At the very least,
we now have a single piece of code to identify hyperthreading.

Obtained from:  jhb
Approved by:  re (kensmith)
2007-08-02 21:17:58 +00:00
Kevin Lo
0d45c918d2 Add the device ID for the VIA CX700 chipset.
Approved by: re (hrs)
2007-08-02 04:29:19 +00:00
Tai-hwa Liang
d28ab8736f MFP4(123686): Fixing various ancontrol(8) related panics by dropping locks
around copyin()/copyout().

Reviewed by:	sam, thompsa
Tested by:	dhw
Approved by:	re (kensmith)
2007-08-02 02:20:19 +00:00
Maksim Yevmenkin
acbfc85b17 Call ttyld_close() in nmdmclose() to ensure that nmdm(4)
closes line discipline installed onto /dev/nmdmX device.

Reviewed by:	julian
Approved by:	re (hrs)
MFC after:	3 days
2007-08-01 21:38:11 +00:00
Alexander Motin
d6fe462ac1 Add 64bit statistic counters to the ng_ppp node.
64bit counters are needed to simplify traffic accounting and
reduce system load at the big PPP concentrators.

Approved by:	re (rwatson), glebius (mentor)
2007-08-01 20:49:35 +00:00
Alexander Motin
e89c150775 This patch improves fine-grained locking for the ng_ppp node.
Till now node's transmit path was completely unprotected
and so wasn't thread safe in multilink mode. It's receive path was
declared as WRITER as the simpliest protection method but it
reduces performance when compression or encryption enabled.

Approved by:	re (rwatson), glebius (mentor)
2007-08-01 20:38:37 +00:00
Andrew Thompson
85ce729794 Add a bridge interface flag called PRIVATE where any private port can not
communicate with another private port.

All unicast/broadcast/multicast layer2 traffic is blocked so it works much the
same way as using firewall rules but scales better and is generally easier as
firewall packages usually do not allow ARP blocking.

An example usage would be having a number of customers on separate vlans
bridged with a server network. All the vlans are marked private, they can all
communicate with the server network unhindered, but can not exchange any
traffic whatsoever with each other.

Approved by:	re (rwatson)
2007-08-01 00:33:52 +00:00
Peter Wemm
c4a184bdc4 Change TCPTV_MIN to be independent of HZ. While it was documented to
be in ticks "for algorithm stability" when originally committed, it turns
out that it has a significant impact in timing out connections.  When we
changed HZ from 100 to 1000, this had a big effect on reducing the time
before dropping connections.

To demonstrate, boot with kern.hz=100.  ssh to a box on local ethernet
and establish a reliable round-trip-time (ie: type a few commands).
Then unplug the ethernet and press a key.  Time how long it takes to
drop the connection.

The old behavior (with hz=100) caused the connection to typically drop
between 90 and 110 seconds of getting no response.

Now boot with kern.hz=1000 (default).  The same test causes the ssh session
to drop after just 9-10 seconds.  This is a big deal on a wifi connection.

With kern.hz=1000, change sysctl net.inet.tcp.rexmit_min from 3 to 30.
Note how it behaves the same as when HZ was 100.  Also, note that when
booting with hz=100, net.inet.tcp.rexmit_min *used* to be 30.

This commit changes TCPTV_MIN to be scaled with hz.  rexmit_min should
always be about 30.  If you set hz to Really Slow(TM), there is a safety
feature to prevent a value of 0 being used.

This may be revised in the future, but for the time being, it restores the
old, pre-hz=1000 behavior, which is significantly less annoying.

As a workaround, to avoid rebooting or rebuilding a kernel, you can run
"sysctl net.inet.tcp.rexmit_min=30" and add "net.inet.tcp.rexmit_min=30"
to /etc/sysctl.conf.  This is safe to run from 6.0 onwards.

Approved by:  re (rwatson)
Reviewed by:  andre, silby
2007-07-31 22:11:55 +00:00
Scott Long
5878cbeccf Make the driver fully MPSAFE. This fixes some serious locking problems
that could cause panics and corruption under moderate load.  Many thanks
to Matt Reimer, Tom McDonald, and the rest of the guys at VPOP.net for
their help in identifying and testing this.

Approved by: re
2007-07-31 20:16:50 +00:00
Scott Long
9ab0fe8075 Fix locking mistakes in the error recovery paths of the AHC and AHD drivers.
Approved by: re
2007-07-31 20:11:03 +00:00
Warner Losh
e8b7ad8c05 Add in all the USB devices and all the wireless goo. The KB9202 has
only USB 1.1 speeds available, but this shouldn't hurt.  Now that we have
working usb support for this board, this is a natural followup.

Approved by: re (kensmith)
2007-07-31 17:45:54 +00:00
Warner Losh
3f0fd37320 Make USB work on the KB9202{,A,B} boards. This has been in p4 for about
7 months.  You must have JP6 in the 1-2 position to supply power to the
USB devices, but I've used uftdi, uplcom and umass successfully.  If you
have it in 2-3, then nothing will show up.  Also, if you have the FQPA
packaging for the AT91RM9200 (like the KN9202 boards have), you will get
the following message

uhub0: device problem (IOERROR), disabling port 2

due to a hardware erratum.  It is safe to ignore as it is about pins that
aren't brought out on the FQPA package and aren't proeprly terminated either.
Alas, there's no register to read to tell the FQPA from the BGA versions.

Submitted by: Daan Vreeken
Approved by: re (kensmith)
2007-07-31 17:43:18 +00:00
Olivier Houchard
6308183c5d MFppc:
revision 1.66
date: 2007/07/31 06:23:26;  author: marcel;  state: Exp;  lines: +2 -2
Fix backward compatibility of the "old" (i.e. FreeBSD6) lseek
syscall. It was broken when a new lseek syscall was introduced.
The problem is that we need to swap the 32-bit td_retval values
for the __syscall indirect syscall when the actual syscall has
a 32-bit return value. Hence, we need to exclude lseek(2). And
this means the "old" lseek(2) as well -- which we didn't.

Based on a patch from: grehan@

Approved by:	re (blanket)
2007-07-31 17:09:05 +00:00
Marcel Moolenaar
8875aa6621 Fix backward compatibility of the "old" (i.e. FreeBSD6) lseek
syscall. It was broken when a new lseek syscall was introduced.
The problem is that we need to swap the 32-bit td_retval values
for the __syscall indirect syscall when the actual syscall has
a 32-bit return value. Hence, we need to exclude lseek(2). And
this means the "old" lseek(2) as well -- which we didn't.

Based on a patch from: grehan@
Approved by: re (rwatson)
2007-07-31 06:23:26 +00:00
Marcel Moolenaar
789943cc81 Enable -Werror for ia64.
Approved by: re (blanket)
2007-07-31 03:15:32 +00:00
David Christensen
990a2aa530 - Fixed a problem that would cause kernel panics and "bce0: discard frame .."
errors (especially when jumbo frames are enabled or in low memory systems)
  because the RX chain was corrupted when an mbuf was mapped to an unexpected
  number of buffers.
- Fixed a problem that would cause kernel panics when an excessively
  fragmented TX mbuf couldn't be defragmented and was released by
  bce_tx_encap().

Approved by:	re(hrs)
MFC after:	7 days
2007-07-31 00:06:04 +00:00
Marcel Moolenaar
cf681ceef5 o Switch to physical addressing before dereferencing the VHPT
bucket pointer. The virtual mapping may not be present in the
  translation cache. This will result in a nested TLB fault at
  a place we don't handle (and don't want to handle).
o Make sure there's a stop after the rfi instruction, otherwise
  its behaviour is undefined.
o Make sure we switch back to virtual addressing before doing
  a rfi. Behaviour is undefined otherwise.

Approved by: re (blanket)
2007-07-30 22:52:52 +00:00
Marcel Moolenaar
ea5e2a02af Add option EXCEPTION_TRACING, which enables KTR-like functionality
for processor interruptions. This is especially useful to track
unexpected nested TLB faults.

Approved by: re (blanket)
2007-07-30 22:42:33 +00:00
Marcel Moolenaar
fe1c66b9d7 Rework the interrupt code and add support for interrupt filtering
(INTR_FILTER). This includes:
o  Save a pointer to the sapic structure and IRQ for every vector,
   so that we can quickly EOI, mask and unmask the interrupt.
o  Add locking to the sapic code now that we can reprogram a
   sapic on multiple CPUs at the same time.
o  Use u_int for the vector and IRQ. We only have 256 vectors, so
   using a 64-bit type for it is rather excessive.
o  Properly handle concurrent registration of a handler for the
   same vector.

Since vectors have a corresponding priority, we should not map
IRQs to vectors in a linear fashion, but rather pick a vector
that has a priority in line with the interrupt type. This is left
for later. The vector/IRQ interchange has been untangled as much
as possible to make this easier.

Approved by: re (blacket)
2007-07-30 22:29:33 +00:00
Marcel Moolenaar
8a2a70cb02 Explicitly map the VHPT on all processors. Previously we were
merely lucky that the VHPT was mapped as a side-effect of
mapping the kernel, but when there's enough physical memory,
this may not at all be the case.

Approved by: re (blanket)
2007-07-30 22:12:53 +00:00
Marcel Moolenaar
c183b0f2c1 Add casts to some of the more commonly used pointer-type atomic
operations. We really should be able to make those inline functions,
but this would break its use for sx_locks.

Approved by: re (blanket)
2007-07-30 22:07:01 +00:00
Andrew Thompson
de75afe64f - Propagate the largest set of interface capabilities supported by all lagg
ports to the lagg interface.
- Use the MTU from the first interface as the lagg MTU, all extra interfaces
  must be the same.

This fixes using a lagg interface for a vlan or enabling jumbo frames, etc.

Approved by:	re (kensmith)
MFC After:	3 days
2007-07-30 20:17:22 +00:00
Nate Lawson
430eaa744e Dynamically choose the quality of the ACPI timer depending on whether
the fast or safe/slow method is in use.  Fast remains at 1000, slow is
now at 850 (always preferred to TSC).  Since the HPET has proven slower
than ACPI-fast on some systems, drop its quality to 900.  In the future,
it is hoped that HPET performance will improve as it is the main
timer Intel supports.  HPET may move back to 2000 in -current once RELENG_7
is branched to ensure that it gets tested.

Approved by:	re
2007-07-30 15:21:26 +00:00
Dag-Erling Smørgrav
218cbbea9a Make tcpstates[] static, and make sure TCPSTATES is defined before
<netinet/tcp_fsm.h> is included into any compilation unit that needs
tcpstates[].  Also remove incorrect extern declarations and TCPDEBUG
conditionals.  This allows kernels both with and without TCPDEBUG to
build, and unbreaks the tinderbox.

Approved by:	re (rwatson)
2007-07-30 11:06:42 +00:00
David Malone
c848e0de55 Mfi386 revision 1.239 of src/sys/i386/isa/clock.c. Seemingly some
pc98 motherboards do not provide us with the correct day of week
either. Ignore the day of week when setting the clock here too.

Approved by:	re (bmah)
Requested from:	nyan
MFC after:	3 weeks
2007-07-29 20:16:48 +00:00
Bruce A. Mah
e251d2f4f6 Fix a typo in a log message: s/Reveived/Received/.
Approved by:	re (rwatson)
2007-07-29 20:13:22 +00:00
Warner Losh
1dfb823e11 Add missing newline in printf.
Submitted by:  "R.Mahmatkhanov" cvs-src at yandex ru
Approved by: re (blanket)
2007-07-29 18:16:43 +00:00
Marcel Moolenaar
7f67bed625 In pci_alloc_map(), restore the original value of the BAR for
the duration of the function.  The device we would otherwise
have left in an useless state may just as well be the low-level
console. When booting verbose, we do need it addressable if we
want to avoid a MCA.

Approved by: re (kensmith)
2007-07-29 02:44:41 +00:00
Matt Jacob
24face5416 Fix compilation problems- tcpstates is only available if TCPDEBUG
is set.

Approved by:	re (in spirit)
2007-07-29 01:31:33 +00:00
Mike Silbersack
e3020cfd3c Fix a panic introduced in rev 1.126.
Approved by: re (rwatson)
2007-07-28 20:13:40 +00:00
Andre Oppermann
773673c133 Provide a sysctl to toggle reporting of TCP debug logging:
sys.net.inet.tcp.log_debug = 1

It defaults to enabled for the moment and is to be turned off for
the next release like other diagnostics from development branches.

It is important to note that sysctl sys.net.inet.tcp.log_in_vain
uses the same logging function as log_debug.  Enabling of the former
also causes the latter to engage, but not vice versa.

Use consistent terminology in tcp log messages:

 "ignored" means a segment contains invalid flags/information and
   is dropped without changing state or issuing a reply.

 "rejected" means a segments contains invalid flags/information but
   is causing a reply (usually RST) and may cause a state change.

Approved by:	re (rwatson)
2007-07-28 12:20:39 +00:00
Andre Oppermann
cdaf208d09 o Move setting/resetting logic of syncache timer from macro
SYNCACHE_TIMEOUT to new function syncache_timeout().
o Fix inverted timeout callout engagement logic to actually
  enable the timer for the bucket row.  Before SYN|ACK was
  not retransmitted.
o Simplify SYN|ACK retransmit timeout backoff calculation.
o Improve logging of retransmit and timeout events.
o Reset timeout when duplicate SYN arrives.
o Add comments.
o Rearrange SYN cookie statistics counting.

Bug found by:	silby
Submitted by:	silby (different version)
Approved by:	re (rwatson)
2007-07-28 12:02:05 +00:00
Andre Oppermann
19bc77c549 o Move all detailed checks for RST in LISTEN state from tcp_input() to
syncache_rst().
o Fix tests for flag combinations of RST and SYN, ACK, FIN.  Before
  a RST for a connection in syncache did not properly free the entry.
o Add more detailed logging.

Approved by:	re (rwatson)
2007-07-28 11:51:44 +00:00
Robert Watson
c6b2899785 Replace references to NET_CALLOUT_MPSAFE with CALLOUT_MPSAFE, and remove
definition of NET_CALLOUT_MPSAFE, which is no longer required now that
debug.mpsafenet has been removed.

The once over:	bz
Approved by:	re (kensmith)
2007-07-28 07:31:30 +00:00
Alan Cox
eaa29f1ce4 Add a counter for the total number of pages cached and support for
reporting the value of this counter in the program "vmstat".

Approved by:	re (rwatson)
2007-07-27 20:01:22 +00:00
Olivier Houchard
122e1e5e24 CRB config file.
Approved by:	re (blanket)
2007-07-27 14:57:03 +00:00
Olivier Houchard
5f78cb4a35 XScale core 3 definitions.
Approved by:	re (blanket)
2007-07-27 14:54:27 +00:00
Olivier Houchard
0566a63ff3 Cleanup
Approved by:	re (blanket)
2007-07-27 14:53:42 +00:00