Commit Graph

74127 Commits

Author SHA1 Message Date
Julian Elischer
8f26c03fe6 Fix typo in comment that has been bugging me for days.
MFC after:	1 week
2009-08-23 07:59:28 +00:00
Doug Barton
d05203a19f The svnversion string is only relevant when newvers.sh is called
during the kernel build process, the other places that call the script
do not make use of that information. So restrict execution of the
svnversion-related code to the kernel build context.
2009-08-23 05:45:38 +00:00
Ken Smith
cf48cc9f69 Make head 9.0-CURRENT in preparation for lifting code freeze.
Approved by:	re (implicit)
2009-08-22 23:44:37 +00:00
Julian Elischer
c4b21cbe4a Fix ipfw's initialization functions to get the correct order of evaluation
to allow vnet and non vnet operation. Move some functions from ip_fw_pfil.c
to ip_fw2.c and mode to mostly using the SYSINIT and VNET_SYSINIT handlers
instead of the modevent handler. Correct some spelling errors in comments
in the affected code. Note this bug fixes a crash in NON VIMAGE kernels when
ipfw is unloaded.

This patch is a minimal patch for 8.0
I have a much larger patch that actually fixes the underlying problems
that will be applied after 8.0

Reviewed by:	zec@, rwatson@, bz@(earlier version)
Approved by:	re (rwatson)
MFC after:	Immediatly
2009-08-21 11:20:10 +00:00
Julian Elischer
cd81cd3fd1 Don't allow access to the internals until it has all been set up.
Specifically, not until the per-vnet parts have been set up.

Submitted by:	kmacy@
Reviewed by:	julian@, zec@
Approved by:	re(rwatson)
MFC after:	immediately
2009-08-21 09:22:32 +00:00
John Baldwin
eb5a1e8f38 This patch fixes two bugs in sglist(9) and improves robustness of the API via
better semantics if a request to append an address range to an existing list
fails.
- When cloning an sglist, properly set the length in the new sglist instead of
  leaving the new list empty.
- Properly compute the amount of data added to an sglist via
  _sglist_append_buf().  This allows sglist_consume_uio() to properly update
  uio_resid.
- When a request to append an address range to a scatter/gather list fails,
  restore the sglist to the state it had at the start of the function call
  instead of resetting it to an empty list.

Requested by:	np (3)
Approved by:	re (kib)
2009-08-21 02:59:07 +00:00
Ken Smith
eb4d96a268 Fix a boot hang for hptrr(4) caused by changes introduced in r195534.
It is necessary to make sure cpi->transport is set for xpt_scan_bus() to
work properly.

Submitted by: Bernhard Schmidt (scb+freebsd-current <at> techwires
              <dot> net)
Reviewed by:  scottl
Approved by:  re (kib)
2009-08-21 01:00:15 +00:00
Jung-uk Kim
66406b8f25 Check whether the SMBIOS reports reasonable amount of memory. If it is
less than "avail memory", fall back to Maxmem to avoid user confusion.
We use SMBIOS information to display "real memory" since r190599 but
some broken SMBIOS implementation reported only half of actual memory.

Tested by:	bz
Approved by:	re (kib)
2009-08-20 22:58:05 +00:00
Peter Wemm
b4e7e7a065 Fix signed comparison bug when ticks goes negative after 24 days of
uptime.  This causes the tcp time_wait state code to fail to expire
sockets in timewait state.

Approved by:	re (kensmith)
2009-08-20 22:53:28 +00:00
John Baldwin
0cef25aeb2 Change the 'resid' parameter to sglist_consume_uio() from an int to a
size_t to match the recent type change of the uio_resid member of struct
uio.

Approved by:	re (kib)
2009-08-20 19:23:58 +00:00
John Baldwin
a56fe095f0 Temporarily revert the new-bus locking for 8.0 release. It will be
reintroduced after HEAD is reopened for commits by re@.

Approved by:	re (kib), attilio
2009-08-20 19:17:53 +00:00
Will Andrews
52e12426d1 Fix CARP memory leaks on carp_if's malloc'd using M_CARP. This occurs when
CARP tries to free them using M_IFADDR after the last address for a virtual
host is removed and when detaching from the parent interface.

Reviewed by:	mlaier
Approved by:	re (kib), ken (mentor)
2009-08-20 02:33:12 +00:00
Pawel Jakub Dawidek
35ae9291c2 Our libc doesn't implement control method for XDR (only kernel does) and it
will always return failure. Fix this by bringing userland implementation of
xdrmem_control() back. This allow 'zpool import' to work again.

Reported by:	Thomas Backman <serenity@exscape.org>
Reviewed by:	kmacy
Approved by:	re (kib)
2009-08-20 00:05:29 +00:00
Ed Schouten
12f27c4e64 Make the MacBookPro3,1 hardware boot again.
Tested by:	Patrick Lamaiziere <patfbsd davenulle org>
Approved by:	re (kib)
2009-08-19 20:39:33 +00:00
Kip Macy
6d37c3ecd9 This change fixes a comment and addresses a complaint by kib@ by
moving a frequently executed flowtable syslog statement from being
conditional on bootverbose to conditional on a per-vnet flowtable
sysctl.

Approved by:	re@
2009-08-19 20:13:09 +00:00
Xin LI
1886a6912d Temporarily enhance em(4) and igb(4) hack to take account for IFF_NOARP.
Without this changeset there will be no way to prevent these NICs from
sending ARP, which is harmful in server farms that is configured as
"Direct Server Return" behind a load balancer.

A better fix would remove the whole hack completely but it would be
later than 8.0-RELEASE.

Reviewed by:	jfv, yongari
Approved by:	re (kib)
2009-08-19 17:59:41 +00:00
Rafal Jaworowski
35d01728c6 Fix USB cache sync operations for platforms with non-coherent DMA.
- usb_pc_cpu_invalidate() is called between [consecutive] reads from a device,
  so a sequence of BUS_DMASYNC_POSTREAD and _PREREAD should be used. Note we
  cannot use or'ed shorthand ( _POSTREAD | _PREREAD) for BUS_DMASYNC flags, as
  the low level bus dma sync operation is implementation dependent and we
  cannot assume the required order of operations to be guaranteed.

- usb_pc_cpu_flush() is called before writing to a device, so
  BUS_DMASYNC_PREWRITE should be used.

Submitted by:	Grzegorz Bernacki
Reviewed by:	HPS, arm@, usb@ ML
Tested by:	HPS, Mike Tancsa
Approved by:	re (kib)
Obtained from:	Semihalf
2009-08-19 14:39:08 +00:00
Ed Schouten
6c813c1ccc Small changes to the warning message generated by pty(4):
- Only print the warning once, instead of filling up the screen.
- Use the word "legacy" for the pty_warningcnt description, to prevent
  confusion.
- Use log() instead of printf().

Discussed with:	rwatson, jhb
Approved by:	re (kib)
2009-08-19 14:30:46 +00:00
Michael Tuexen
2f99457b0c Fix a bug in the handling of unreliable messages
which results in stalled associations.

Approved by: re, rrs (mentor)
MFC after: immediately
2009-08-19 12:02:28 +00:00
Max Laier
b5a6cecbcd If we cannot immediately get the pf_consistency_lock in the purge thread,
restart the scan after acquiring the lock the hard way.  Otherwise we might
end up with a dead reference.

Reported by:		pfsense
Reviewed by:		eri
Initial patch by:	eri
Tested by:		pfsense
Approved by:		re (kib)
2009-08-19 00:10:10 +00:00
Stanislav Sedov
7f21e273a8 - Do not try to reevaluate current RX production index on each
loop iteration as it can be updated by the card while we
  process the RX ring forcing us to process RX descriptors
  for which DMA synchronisation operation has not been
  performed.  This fixes the bug when bge(4) drops packets
  under high load.

Discussed with:	yongari, marius
Approved by:	re (kib)
MFC after:	1 week
2009-08-18 21:07:39 +00:00
Kip Macy
3ee42584f9 - change the interface to flowtable_lookup so that we don't rely on
the mbuf for obtaining the fib index
 - check that a cached flow corresponds to the same fib index as the
   packet for which we are doing the lookup
 - at interface detach time flush any flows referencing stale rtentrys
   associated with the interface that is going away (fixes reported
   panics)
 - reduce the time between cleans in case the cleaner is running at
   the time the eventhandler is called and the wakeup is missed less
   time will elapse before the eventhandler returns
 - separate per-vnet initialization from global initialization
   (pointed out by jeli@)

Reviewed by:	sam@
Approved by:	re@
2009-08-18 20:28:58 +00:00
Pyun YongHyeon
44d9075392 Backout r193289. r193289 restored page select bits to previous
value instead of blindly resetting it to 0. However, it seems page
select bits of some 88E1116 PHY is initialized to invalid one such
that restoring page select bits after programming broke MII
register access. The correct solution would be reset page select
bits to 0 in PHY attach stage but it would require more testing.
Since we're in BETA stage such a change would be dangerous so just
back it out.
This change should fix nfe(4) breakage on NVIDIA MCP55.

Reported by:	Ryan Rogers < webmaster <> doghouserepair dot com >
		Sam Fourman Jr. < sfourman <> gmail dot com >
Tested by:	Ryan Rogers < webmaster <> doghouserepair dot com >
		Sam Fourman Jr. < sfourman <> gmail dot com >
Approved by:	re (kib)
2009-08-18 20:20:15 +00:00
Michael Tuexen
627dfd6df9 Fix a crash when using one-to-one stlye socket in non-blocking
mode and there is no listening server.
PR: 137795
Approved by: re, rrs (mentor)
MFC after:immediately.
2009-08-18 19:58:49 +00:00
Pawel Jakub Dawidek
e477e4fe8e Remove unused taskqueue_find() function.
Reviewed by:	dfr
Approved by:	re (kib)
2009-08-18 13:55:48 +00:00
Alexander Motin
5daa555d8d Fix copy/paste bug, that requests data read during ATA device probe sequence
for ATA_SETFEATURES/ATA_SF_SETXFER command which by definition transfers no
data. Most of controllers are irrelevant to this bug, but some nVidia's
doesn't.

Tested on:	current@
Approved by:	re (kib)
2009-08-18 09:27:17 +00:00
Alexander Motin
33ea30fed7 Fix iSCSI initiator and vpo driver operation, broken by CAM changes.
Reviewed by:	scottl, Danny Braniss
Approved by:	re (rwatson)
2009-08-18 08:46:54 +00:00
Kip Macy
d53e359b9a fix netboot issue by disabling flowtable lookups until initialization has been run
Reviewed by:	rwatson@
Approved by:	re@
2009-08-17 19:09:28 +00:00
Attilio Rao
353998acc3 * Change the scope of the ASSERT_ATOMIC_LOAD() from a generic check to
a pointer-fetching specific operation check. Consequently, rename the
  operation ASSERT_ATOMIC_LOAD_PTR().
* Fix the implementation of ASSERT_ATOMIC_LOAD_PTR() by checking
  directly alignment on the word boundry, for all the given specific
  architectures. That's a bit too strict for some common case, but it
  assures safety.
* Add a comment explaining the scope of the macro
* Add a new stub in the lockmgr specific implementation

Tested by: marcel (initial version), marius
Reviewed by: rwatson, jhb (comment specific review)
Approved by: re (kib)
2009-08-17 16:17:21 +00:00
Marcel Moolenaar
8530137252 The start of the EFI GPT partition in the PMBR can always be represented
by CHS addressing. Don't define these fields as 0xff, but rather define
them correctly. This prevents boot problems on PCs where GPT is being
used.

PR:		115406
Submitted by:	Kent Hauser <kent@khauser.net>
Approved by:	re (kib)
2009-08-17 16:16:46 +00:00
Rick Macklem
60f04e38ca Apply the same patch as r196205 for nfs_upgrade_lock() and
nfs_downgrade_lock() to the experimental nfs client.

Approved by:	re (kensmith), kib (mentor)
2009-08-17 16:12:28 +00:00
John Hay
e8c91ce393 Fix parse() so that the partition to boot (load /boot/loader) from can
be set. The syntax as printed in main() is used: 0:ad(0p3)/boot/loader

Reviewed by:	jhb
Approved by:	re (kib)
2009-08-17 15:19:03 +00:00
Konstantin Belousov
faccac2d45 Correct a critical accounting error in pmap_demote_pde(). Specifically,
when pmap_demote_pde() allocates a page table page to implement a
user-space demotion, it must increment the pmap's resident page count.
Not doing so, can lead to an underflow during address space termination
that causes pmap_remove() to exit prematurely, before it has destroyed
all of the mappings within the specified range.  The ultimate effect or
symptom of this error is an assertion failure in vm_page_free_toq()
because the page being freed is still mapped.

This error is only possible when superpage promotion is enabled.  Thus,
it only affects FreeBSD  versions greater than 7.2.

Tested by:	pho, alc
Reviewed by:	alc
Approved by:	re (rwatson)
MFC after:	1 week
2009-08-17 13:27:55 +00:00
Rui Paulo
fbe007da74 Fix a typo in ifdef mesh support. This would make mesh unworkable if
TDMA support was compiled out.

Approved by:	re (kib)
2009-08-17 12:57:57 +00:00
Pawel Jakub Dawidek
8e9fd65fbf getcwd() (when __getcwd() fails) works by stating current directory, going up
(..), calling readdir and looking for previous directory inode.  In case of
.zfs/ directory this doesn't work, because .zfs/ is hidden by default, so it
won't be visible in readdir output.

Fix this by implementing VPTOCNP for snapshot directories, so __getcwd()
doesn't fail and getcwd() doesn't have to use readdir method.

This fixes /bin/pwd from within .zfs/snapshot/<name>/.

Suggested by:	kib
Approved by:	re (rwatson)
2009-08-17 10:00:18 +00:00
Pawel Jakub Dawidek
8461b0f043 Manage asynchronous vnode release just like Solaris.
Discussed with:	kmacy
Approved by:	re (kib)
2009-08-17 09:48:34 +00:00
Pawel Jakub Dawidek
e35eb914f4 - Reduce z_teardown_lock lock scope a bit.
- The error variable is int, not bool.
- Convert spaces to tabs where needed.

Approved by:	re (kib)
2009-08-17 09:28:15 +00:00
Pawel Jakub Dawidek
0330a5dc10 If z_buf is NULL, we should free znode immediately.
Noticed by:	avg
Approved by:	re (kib)
2009-08-17 09:25:37 +00:00
Pawel Jakub Dawidek
d83cfc37a4 - We need to recycle vnode instead of freeing znode.
Submitted by:	avg

- Add missing vnode interlock unlock.
- Remove redundant znode locking.

Approved by:	re (kib)
2009-08-17 09:21:39 +00:00
Pawel Jakub Dawidek
f820bc079f Fix panic in zfs recv code. The last vnode (mountpoint's vnode) can have
0 usecount.

Reported by:	Thomas Backman <serenity@exscape.org>
Approved by:	re (kib)
2009-08-17 09:13:22 +00:00
Pawel Jakub Dawidek
159ef108e1 Remove OpenSolaris taskq port (it performs very poorly in our kernel) and
replace it with wrappers around our taskqueue(9).
To make it possible implement taskqueue_member() function which returns 1
if the given thread was created by the given taskqueue.

Approved by:	re (kib)
2009-08-17 09:01:20 +00:00
Pawel Jakub Dawidek
6a3b289388 Because taskqueue_run() can drop tq_mutex, we need to check if the
TQ_FLAGS_ACTIVE flag wasn't removed in the meantime, which means we missed a
wakeup.

Approved by:	re (kib)
2009-08-17 08:42:34 +00:00
Pawel Jakub Dawidek
fddc954016 - Fix a race where /dev/zfs control device is created before ZFS is fully
initialized. Also destroy /dev/zfs before doing other deinitializations.
- Initialization through taskq is no longer needed and there is a race
  where one of the zpool/zfs command loads zfs.ko and tries to do some work
  immediately, but /dev/zfs is not there yet.

Reported by:	pav
Approved by:	re (kib)
2009-08-17 08:36:41 +00:00
Pawel Jakub Dawidek
830940567b Remove files that are no longer used.
Discussed with:	kmacy
Approved by:	re (kib)
2009-08-17 08:03:02 +00:00
Ed Schouten
f6b3749db3 Fix small style regression introduced by the MPSAFE newbus code.
Approved by:	re (rwatson)
2009-08-16 19:55:53 +00:00
Andrew Thompson
532b195250 Change the usb workers from kernel processes to threads, this is mostly a
cosmetic change to reduce cruft in the proc table.

Also change the idle wait message to `-` like how taskqueues are.

Reviewed by:	julian
Approved by:	re (kib)
2009-08-16 14:13:55 +00:00
Marcel Moolenaar
538e86713d Fix misalignment in nvpair_native_embedded() caused by the compiler
replacing the bzero(). See also revision 195627, which fixed the
misalignment in nvpair_native_embedded_array().

Approved by:	re (kensmith)
2009-08-16 01:48:46 +00:00
Marcel Moolenaar
97e84697ae Decouple ACPI CPU Ids from FreeBSD's cpuid. The ACPI Ids can be
sparse, which causes a kernel assert.

Approved by:	re (kensmith)
2009-08-16 01:43:08 +00:00
Robert Watson
a38fdf2946 Rather than fix questionable ifnet list locking in the implementation of
the kern.polling.enable sysctl, remove the sysctl.  It has been deprecated
since FreeBSD 6 in favour of per-ifnet polling flags.

Reviewed by:	luigi
Approved by:	re (kib)
2009-08-15 23:07:43 +00:00
Robert Watson
d931ea0961 Remove unused if_rawoutput() macro; it has been unused since at least
FreeBSD 2.

Approved by:	re (kib)
2009-08-15 22:26:26 +00:00
Michael Tuexen
810ec53688 * Fix a bug where PR-SCTP settings are ignore when using implicit
association setup.
* Fix a bug where message with illegal stream ids are not deleted.
* Fix a crash when reporting back unsent messages from the send_queue.
* Fix a bug related to INIT retransmission when the socket is already
  closed.
* Fix a bug where associations were stalled when partial delivery API
  was enabled.
* Fix a bug where the receive buffer size was smaller than the
  partial_delivery_point.

Approved by: re, rrs (mentor)
MFC after: One day.
2009-08-15 21:10:52 +00:00
Attilio Rao
68a5590836 Port recent IPI enhachements to en:
* Introduce the ipi_nmi_handler() function for the Xen infrastructure
* Fixup adeguately the ipi sender functions

Approved by:	re (kib)
2009-08-15 18:37:06 +00:00
Stanislav Sedov
2dc8b759bd - Proprely intialize UART parameters at probe stage, so uart(4)
will initialize the FIFO memory correctly on attach.  Before
  that this values was intialized in only in at91_usart_bus_attach
  which is called after the uart(4) memory allocation happens.

Approved by:	re (kib)
MFC after:	1 week
2009-08-15 15:15:20 +00:00
Qing Li
3ef5e21d01 In function ip_output(), the cached route is flushed when there is a
mismatch between the cached entry and the intended destination. The
cached rtentry{} is flushed but the associated llentry{} is not. This
causes the wrong destination MAC address being used in the output
packets. The fix is to flush the llentry{} when rtentry{} is cleared.

Reviewed by:	kmacy, rwatson
Approved by:	re
2009-08-14 23:44:59 +00:00
Marko Zec
9abb486279 Appease VNET_DEBUG - in if_vmove we temporarily switch i.e.
recurse from one vnet to another which is OK, so no need
to flood the console with warnings here.

Approved by:	re (rwatson), julian (mentor)
2009-08-14 22:46:45 +00:00
Marko Zec
f92ae4d706 SCTP is not yet compatible with options VIMAGE kernels although it compiles
with VIMAGE defined, so explicitly disallow building such kernels.

Reviewed by:	rrs
Approved by:	re (rwatson), julian (mentor)
2009-08-14 22:43:25 +00:00
Marko Zec
67addcde86 Make VNET_DEBUG a standalone compile-time option, i.e. decouple it from
INVARIANTS.

Reviewed by:	bz
Approved by:	re (rwatson), julian (mentor)
2009-08-14 22:41:39 +00:00
Bjoern A. Zeeb
8d518523cc Add a new macro to test that a variable could be loaded atomically.
Check that the given variable is at most uintptr_t in size and that
it is aligned.

Note: ASSERT_ATOMIC_LOAD() uses ALIGN() to check for adequate
      alignment -- however, the function of ALIGN() is to guarantee
      alignment, and therefore may lead to stronger alignment
      enforcement than necessary for types that are smaller than
      sizeof(uintptr_t).

Add checks to mtx, rw and sx locks init functions to detect possible
breakage. This was used during debugging of the problem fixed with
r196118 where a pointer was on an un-aligned address in the dpcpu area.

In collaboration with:	rwatson
Reviewed by:		rwatson
Approved by:		re (kib)
2009-08-14 21:46:54 +00:00
John Baldwin
21157ad3b1 Adjust the handling of the local APIC PMC interrupt vector:
- Provide lapic_disable_pmc(), lapic_enable_pmc(), and lapic_reenable_pmc()
  routines in the local APIC code that the hwpmc(4) driver can use to
  manage the local APIC PMC interrupt vector.
- Do not enable the local APIC PMC interrupt vector by default when
  HWPMC_HOOKS is enabled.  Instead, the hwpmc(4) driver explicitly
  enables the interrupt when it is succesfully initialized and disables
  the interrupt when it is unloaded.  This avoids enabling the interrupt
  on unsupported CPUs which may result in spurious NMIs.

Reported by:	rnoland
Reviewed by:	jkoshy
Approved by:	re (kib)
MFC after:	2 weeks
2009-08-14 21:05:08 +00:00
Konstantin Belousov
165a3b418f When a UFS node is truncated to the zero length, e.g. by explicit
truncate(2) call, or by being removed or truncated on open, either
new softupdate freeblks structure is allocated to track the freed
blocks of the node, or truncation is done syncronously when too many SU
dependencies are accumulated. The decision does not take into account
the allocated freeblks dependencies, allowing workloads that do huge
amount of truncations to exhaust the kernel memory.

Take the number of allocated freeblks into consideration for
softdep_slowdown().

Reported by:	pluknet gmail com
Diagnosed and tested by:	pho
Approved by:	re (rwatson)
MFC after:	1 month
2009-08-14 11:00:38 +00:00
Konstantin Belousov
48bd6d4a49 In nfs_upgrade_vnlock(), assert that the vnode is locked. It is for all
pathes, as far as I see and testing seems to confirm it. Comparision of
old_lock with LK_SHARED make sense only if vnode is locked by current
thread.

When downgrading, pass LK_RETRY to the vn_lock(), since otherwise
vn_lock() unlocks the doomed vnode, causing extra unlock.

Reported and tested by:	pho
Approved by:	re (rwatson)
MFC after:	3 weeks
2009-08-14 10:59:17 +00:00
Konstantin Belousov
aabd624cf4 Add the address of the lock to the KTR_LOCK trace.
Tested by:	pho
Approved by:	re (rwatson)
2009-08-14 10:57:57 +00:00
Konstantin Belousov
8f40845151 Correctly handle unlock for !MAKEENTRY case, after successfull attempt of
lock upgrade cache shall be unlocked from write.

Reported by:	Lucius Windschuh <lwindschuh googlemail com>
Reviewed by:	kan
Approved by:	re (rwatson)
2009-08-14 10:57:28 +00:00
Julian Elischer
72034f5548 Fix ipfw crash on uid or gid check.
Receiving any ip packet for which there is no existing socket will
crash if ipfw has a uid or gid test rule, as the uid/gid
of the non existent owner of said non existent socket is tested.
Brooks introduced this error as part of his >16 gids patch.
It appears to be a cut-n-paste error from similar code a few lines
before. The old code used the 'pcb' variable here, but in the
new code that switched the 'inp' variable, which is often NULL
and what is tested in the code further up. The rest of the multi-gid
patch for ipfw seems solid (and cleaner than previous code).

Reviewed by:	brooks
Approved by:	re (rwatson)
2009-08-14 10:09:45 +00:00
Scott Long
763fae7918 ntroduce mfiutil, a basic utility for managing LSI SAS-RAID & Dell PERC5/6
controllers.  Controller, array, and drive status can be checked, basic
attributes can be changed, and arrays and spares can be created and deleted.
Controller firmware can also be flashed.

This does not replace MegaCLI, found in ports, as that is officially sanctioned
and supported by LSI and includes vastly more functionality.  However, mfiutil
is open source and guaranteed to provide basic functionality, which can be
especially useful if you have a problem and can't get MegaCLI to work.

Approved by:    re
Obtained from:  Yahoo! Inc.
2009-08-13 23:18:45 +00:00
Attilio Rao
dc6fbf6545 * Completely Remove the option STOP_NMI from the kernel. This option
has proven to have a good effect when entering KDB by using a NMI,
but it completely violates all the good rules about interrupts
disabled while holding a spinlock in other occasions.  This can be the
cause of deadlocks on events where a normal IPI_STOP is expected.
* Adds an new IPI called IPI_STOP_HARD on all the supported architectures.
This IPI is responsible for sending a stop message among CPUs using a
privileged channel when disponible. In other cases it just does match a
normal IPI_STOP.
Right now the IPI_STOP_HARD functionality uses a NMI on ia32 and amd64
architectures, while on the other has a normal IPI_STOP effect. It is
responsibility of maintainers to eventually implement an hard stop
when necessary and possible.
* Use the new IPI facility in order to implement a new userend SMP kernel
function called stop_cpus_hard(). That is specular to stop_cpu() but
it does use the privileged channel for the stopping facility.
* Let KDB use the newly introduced function stop_cpus_hard() and leave
stop_cpus() for all the other cases
* Disable interrupts on CPU0 when starting the process of APs suspension.
* Style cleanup and comments adding

This patch should fix the reboot/shutdown deadlocks many users are
constantly reporting on mailing lists.

Please don't forget to update your config file with the STOP_NMI
option removal

Reviewed by:	jhb
Tested by:	pho, bz, rink
Approved by:	re (kib)
2009-08-13 17:09:45 +00:00
Rafal Jaworowski
cbb3cb151c Use correct wbinv operation in pmap_l2cache_wbinv_range().
Submitted by:	Michal Hajduk
Reviewed by:	stas
Approved by:	re (kib)
Obtained from:	Semihalf
2009-08-13 15:56:09 +00:00
Edward Tomasz Napierala
abd370a36b Remove CDDL warning.
Approved by:	re (kib), core
2009-08-13 12:28:30 +00:00
Bjoern A. Zeeb
eb79e1c76e Make it possible to change the vnet sysctl variables on jails
with their own virtual network stack. Jails only inheriting a
network stack cannot change anything that cannot be changed from
within a prison.

Reviewed by:	rwatson, zec
Approved by:	re (kib)
2009-08-13 10:26:34 +00:00
Bjoern A. Zeeb
20b0cdb749 Put multiple instructions into a block when iterating; unbreaks
NET_RT_DUMP, which otherwise only returned information of AF_MAX.
This was broken in r193232 (save your time - my bug, my fix).

PR:		kern/137700
Reported by:	Larry Baird (lab gta.com)
Tested by:	Larry Baird (lab gta.com)
Reviewed by:	zec, lstewart, qing
Approved by:	re (kib)
2009-08-13 09:29:52 +00:00
Matt Jacob
5cc3786c64 Have at least *some* default WWN to fall back on,
otherwise Sun branded FC cards won't configure.

Reviewed by:	Ken, Scott
Approved by:	re
2009-08-13 01:17:26 +00:00
Sam Leffler
ab501dd65d Drain link state event changes posted during vap destroy. This is a
band-aid for the general problem that if_link_state_change can be
called between if_detach and if_free leaving a task queued that has
been free'd.

Spotted by:	thompsa
Reviewed by:	rwatson
Approved by:	re (rwatson)
2009-08-12 21:19:19 +00:00
Qing Li
09b0354839 A piece of code was added to install a host route when an IPv6 interface
address is configured with a /128 prefix. This is no longer necessary due
to r192011. In fact that code conflicts with r192011. This patch removes
the host route installation when detecting the /128 prefix, and instead
let the code added by r192011 to install the loopback route for that IPv6
interface address.

Reviewed by:	bz
Approved by:	re
2009-08-12 19:15:26 +00:00
Jung-uk Kim
a36599cce7 Always embed pointer to BPF JIT function in BPF descriptor
to avoid inconsistency when opt_bpf.h is not included.

Reviewed by:	rwatson
Approved by:	re (rwatson)
2009-08-12 17:28:53 +00:00
Rick Macklem
0990c0a12c Add a check for a NULL mbuf ptr at the beginning of xdrmbuf_inline()
so that it returns failure instead of crashing when "m->m_len" is
executed and m == NULL. The mbuf ptr can be NULL when a call to
xdrmbuf_getbytes() gets the bytes it needs, but they are at the end
of a short RPC reply. When this happens, xdrmbuf_getbytes() returns
success, but advances the mbuf ptr (xdrs->x_private) to m_next, which
is NULL. If this is followed by a call to xdrmbuf_getlong(), it calls
xdrmbuf_inline(), which would cause a crash by accessing "m->m_len".

Tested by:	pho, serenity at exscape dot org
Approved by:	re (rwatson), kib (mentor)
2009-08-12 16:27:51 +00:00
Robert Noland
2aadd82afe Add support for radeon RS880 IGP chips to drm.
Approved by:	re (kib)
MFC after:	0 days
2009-08-12 12:57:02 +00:00
Robert Noland
fb6891522e Add some additional radeon pci ids to drm.
Approved by:	re (kib)
MFC after:	0 days
2009-08-12 12:50:15 +00:00
Bjoern A. Zeeb
57aea6dfb8 Make the kernel compile without IP networking by moving
a variable under a proper #ifdef.

Approved by:	re (rwatson)
2009-08-12 12:12:23 +00:00
Bjoern A. Zeeb
f2140faed1 Add ddb show dpcpu_off command to ease dpcpu memory debugging.
While show pcpu prints pc_dynamic this also prints the original
memory address as well as the maths.

Once dpcpu goes NUMA this is considered to help debugging as well.

Reviewed by:	rwatson
Approved by:	re
2009-08-12 12:06:16 +00:00
Bjoern A. Zeeb
281c86a4ef Update DDB show vnet command to print all used and available information.
Reviewed by:	rwatson, zec
Approved by:	re
2009-08-12 12:00:21 +00:00
Robert Watson
9eb3e4639a Correctly audit real gids following changes to the audit record argument
interface.

Approved by:	re (kib)
2009-08-12 10:45:45 +00:00
Robert Watson
c1aee61218 Reverse misordered unlock and lock in at_control for netatalk phase I
addresses.

Submitted by:	Russell Cattelan <cattelan at thebarn.com>
Approved by:	re (kib)
2009-08-12 10:44:13 +00:00
Bjoern A. Zeeb
1b501e53f3 Put minimum alignment on the dpcpu and vnet section so that ld
when adding the __start_ symbol knows the expected section alignment
and can place the __start_ symbol correctly.

These sections will not support symbols with super-cache line alignment
requirements.

For full details, see posting to freebsd-current, 2009-08-10,
Message-ID: <20090810133111.C93661@maildrop.int.zabbadoz.net>.

Debugging and testing patches by:
		Kamigishi Rei (spambox haruhiism.net),
		np, lstewart, jhb, kib, rwatson
Tested by:	Kamigishi Rei, lstewart
Reviewed by:	kib
Approved by:	re
2009-08-12 10:26:03 +00:00
Robert Watson
9d2eb78bcb Add padding to struct inpcb, missed during our padding sweep earlier in
the release cycle.

Approved by:	re (kensmith)
2009-08-02 22:47:08 +00:00
Robert Watson
315e3e38fa Many network stack subsystems use a single global data structure to hold
all pertinent statatistics for the subsystem.  These structures are
sometimes "borrowed" by kernel modules that require a place to store
statistics for similar events.

Add KPI accessor functions for statistics structures referenced by kernel
modules so that they no longer encode certain specifics of how the data
structures are named and stored.  This change is intended to make it
easier to move to per-CPU network stats following 8.0-RELEASE.

The following modules are affected by this change:

      if_bridge
      if_cxgb
      if_gif
      ip_mroute
      ipdivert
      pf

In practice, most of these statistics consumers should, in fact, maintain
their own statistics data structures rather than borrowing structures
from the base network stack.  However, that change is too agressive for
this point in the release cycle.

Reviewed by:	bz
Approved by:	re (kib)
2009-08-02 19:43:32 +00:00
Julian Elischer
9734411552 Stop uuidgen(2) from crashing in vimage kerenels.
make curvnet valid when needed.

Reviewed by:	bz@
Approved by:	re (kib)
2009-08-02 16:59:02 +00:00
Attilio Rao
444b91868b Make the newbus subsystem Giant free by adding the new newbus sxlock.
The newbus lock is responsible for protecting newbus internIal structures,
device states and devclass flags. It is necessary to hold it when all
such datas are accessed. For the other operations, softc locking should
ensure enough protection to avoid races.

Newbus lock is automatically held when virtual operations on the device
and bus are invoked when loading the driver or when the suspend/resume
take place. For other 'spourious' operations trying to access/modify
the newbus topology, newbus lock needs to be automatically acquired and
dropped.

For the moment Giant is also acquired in some key point (modules subsystem)
in order to avoid problems before the 8.0 release as module handlers could
make assumptions about it. This Giant locking should go just after
the release happens.

Please keep in mind that the public interface can be expanded in order
to provide more support, if there are really necessities at some point
and also some bugs could arise as long as the patch needs a bit of
further testing.

Bump __FreeBSD_version in order to reflect the newbus lock introduction.

Reviewed by:    ed, hps, jhb, imp, mav, scottl
No answer by:   ariff, thompsa, yongari
Tested by:      pho,
                G. Trematerra <giovanni dot trematerra at gmail dot com>,
                Brandon Gooch <jamesbrandongooch at gmail dot com>
Sponsored by:   Yahoo! Incorporated
Approved by:	re (ksmith)
2009-08-02 14:28:40 +00:00
Ed Schouten
d40b91cb13 Fix two bugs related to TTY input:
- fix write() on pseudo-terminal masters to return the amount of bytes
  passed to the TTY, not the amount of bytes read from user.

- fix ttydisc_rint_bypass() to set the high watermark when it cannot
  write all input, just like ttydisc_rint() itself.

Approved by:	re (kib)
2009-08-02 14:25:26 +00:00
Ed Schouten
61fb73de41 Make the MacBook3,1 boot again.
Approved by:	re (kib)
2009-08-02 11:26:23 +00:00
Robert Watson
6aad5c1c93 The colour was red as shall be the letters of this warning to people upon
boot if the experimental VIMAGE feature was compiled into the kernel.

Submitted by:	bz
Reviewed by:	zec
Approved by:	re (vimage blanket)
2009-08-01 22:22:45 +00:00
Robert Watson
c8f6a13820 Minor style tweaks.
Approved by:	re (vimage blanket)
2009-08-01 21:58:32 +00:00
Robert Watson
6bc2c7b70c Make the vnet alloc/destroy paths a bit easier to followg by merging
vnet_data_init/vnet_data_destroy into vnet_alloc/vnet_destroy.

Reviewed by:	bz, zec
Approved by:	re (vimage blanket)
2009-08-01 21:54:15 +00:00
Robert Watson
7429a3f3d8 Remove vnet_foreach() utility function, which previously allowed
vnet.c to iterate virtual network stacks without being aware of
the implementation details previously hidden in kern_vimage.c.
Now they are in the same file, so remove this added complexity.

Reviewed by:	bz
Approved by:	re (vimage blanket)
2009-08-01 20:24:45 +00:00
Robert Watson
530c006014 Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks.  Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by:	bz
Approved by:	re (vimage blanket)
2009-08-01 19:26:27 +00:00
Matt Jacob
2df76c160b Add 8Gb support (isp_2500). Fix a fair number of configuration and
firmware loading bugs.

Target mode support has received some serious attention to make it
more usable and stable.

Some backward compatible additions to CAM have been made that make
target mode async events easier to deal with have also been put
into place.

Further refinement and better support for NP-IV (N-port Virtualization)
is now in place.

Code for release prior to RELENG_7 has been stripped away for code clarity.

Sponsored by: Copan Systems

Reviewed by:    scottl, ken, jung-uk kim
Approved by:    re
2009-08-01 01:04:26 +00:00
Matt Jacob
b965588786 Add 8Gb card firmware. Update some 2Gb and 4Gb f/w sets.
Split 4Gb and 8Gb into pieces that can be either multi_id
capable or not.

Reviewed by:	scottl, ken
Approved by:	re
2009-08-01 00:57:34 +00:00
Sam Leffler
faf666aa77 fix misplaced #endif that caused tdma handling to be merged with ESS handling
(causing tdma scanning to break)

Approved by:	re (kib)
2009-07-31 19:13:16 +00:00
Sam Leffler
2dfcbb0e2e Filter setting IFF_PROMISC on tdma vaps; we don't want the underyling device
to be in promiscuous mode as we have a h/w bssid.

Approved by:	re (kib)
2009-07-31 19:12:19 +00:00
Weongyo Jeong
6418e06a48 add upgt
Approved by:	re (kib)
2009-07-31 17:57:16 +00:00
Jamie Gritton
42bebd8205 Make the "enforce_statfs" default 2 (most restrictive) in jail_set(2),
instead of whatever the parent/system has (which is generally 0).  This
mirrors the old-style default used for jail(2) in conjunction with the
security.jail.enforce_statfs sysctl.

Approved by:	re (kib), bz (mentor)
2009-07-31 16:00:41 +00:00
John Baldwin
87eca70e0c Fix some LORs between vnode locks and filedescriptor table locks.
- Don't grab the filedesc lock just to read fd_cmask.
- Drop vnode locks earlier when mounting the root filesystem and before
  sanitizing stdin/out/err file descriptors during execve().

Submitted by:	kib
Approved by:	re (rwatson)
MFC after:	1 week
2009-07-31 13:40:06 +00:00
Kevin Lo
6ece67d83f Free allocated Rx ring dma memory/tags.
Reviewed by: yongari@
Approved by: re (kib)
2009-07-31 09:57:42 +00:00
Weongyo Jeong
aa7eaa2fb6 fixes a typo for DWA120 device ID.
Reported by:	Alexander Kuznetsov <skritku at gmail.com>
Approved by:	re (kib)
2009-07-30 18:53:06 +00:00
Xin LI
16e324fc56 Show interface name which received short CARP packet (e.g. a VRRP packet),
in order to match other codepaths nearby.  This makes troubleshooting
easier.

Approved by:	re (kib)
MFC after:	1 month
2009-07-30 17:40:47 +00:00
Jamie Gritton
2b0d6f815e Remove a LOR, where the the sleepable allprison_lock was being obtained
in prison_equal_ip4/6 while an inp mutex was held.  Locking allprison_lock
can be avoided by making a restriction on the IP addresses associated with
jails:

Don't allow the "ip4" and "ip6" parameters to be changed after a jail is
created.  Setting the "ip4.addr" and "ip6.addr" parameters is allowed,
but only if the jail was already created with either ip4/6=new or
ip4/6=disable.  With this restriction, the prison flags in question
(PR_IP4_USER and PR_IP6_USER) become read-only and can be checked
without locking.

This also allows the simplification of a messy code path that was needed
to handle an existing prison gaining an IP address list.

PR:		kern/136899
Reported by:	Dirk Meyer
Approved by:	re (kib), bz (mentor)
2009-07-30 14:28:56 +00:00
Robert Watson
ed3db012fc Reorder and recomment vnet.c and vnet.h on the basis that they are no longer
solely about the virtual network stack memory allocator.

Approved by:	re (vimage blanket)
2009-07-30 12:41:19 +00:00
Robert Watson
4b0026b9fb Add two new privileges for use by OpenAFS, which will be supported for
FreeBSD 8.x.

MFC after:	3 days
Submitted by:	Benjamin Kaduk <kaduk at MIT.EDU>
Approved by:	re (kib)
2009-07-30 08:41:06 +00:00
Alfred Perlstein
9e90dc16ba Missed this file for r195963:
USB core:
  - add support for defragging of written device data.
  - improve handling of alternate settings in device side mode.
  - correct return value from usbd_get_no_alts() function.
  - reported by: HPS
  - P4 ID: 166156, 166168

  - report USB device release information to devd and pnpinfo.
  - reported by: MIHIRA Sanpei Yoshiro
  - P4 ID: 166221

Submitted by:	hps
Approved by:	re
2009-07-30 00:57:54 +00:00
Alfred Perlstein
65c7bc9cbf USB CORE - Improve HID parsing
See PR description for more info. Patch is
implemented differently than suggested, but
having the same result.

PR:     usb/137188

Submitted by:	hps
Approved by:	re
2009-07-30 00:17:08 +00:00
Alfred Perlstein
bf3254f22e USB CORE - compat Linux:
- Patch request from Tim Borgeaud:
- add automatic locking
- add refcount for killing URB's

Submitted by:	hps
Approved by:	re
2009-07-30 00:16:50 +00:00
Alfred Perlstein
a0c61406ad USB controller:
- allow disabling "root_mount_hold()" by setting "hw.usb.no_boot_wait" sysctl

Submitted by:	hps
Approved by:	re
2009-07-30 00:16:32 +00:00
Alfred Perlstein
4743e6da52 ULPT:
- add conditional printer status checking
- P4 ID: 166176

Submitted by:	hps
Approved by:	re
2009-07-30 00:16:06 +00:00
Alfred Perlstein
bd73b18714 USB core:
- add support for defragging of written device data.
- improve handling of alternate settings in device side mode.
- correct return value from usbd_get_no_alts() function.
- reported by: HPS
- P4 ID: 166156, 166168

- report USB device release information to devd and pnpinfo.
- reported by: MIHIRA Sanpei Yoshiro
- P4 ID: 166221

Submitted by:	hps
Approved by:	re
2009-07-30 00:15:50 +00:00
Alfred Perlstein
b27b901cb4 USB serial:
- add new ID for Huawei
- P4 ID: 166150

PR:             usb/136761

Submitted by:	hps
Approved by:	re
2009-07-30 00:15:17 +00:00
Alfred Perlstein
3d11bc19a0 USB audio:
- code factoring patch from "Eygene Ryabinkin"
- P4 ID: 166149

Submitted by:	hps
Approved by:	re
2009-07-30 00:14:56 +00:00
Alfred Perlstein
dddb25f98a USB CORE:
- Add minimum polling support to drive UMASS
  and UKBD in case of panic.
- Add extra check to ukbd probe to fix problem about
  mouse devices attaching like keyboards.
- P4 ID: 166148

Submitted by:	hps
Approved by:	re
2009-07-30 00:14:34 +00:00
Alfred Perlstein
81f8288460 USB input
- add support for setting the UMS polling rate through -F option
           passed to moused.
         - requested by Alexander Best
         - P4 ID: 166075

PR:             usb/125264

Submitted by:	hps
Approved by:	re
2009-07-30 00:13:09 +00:00
Alfred Perlstein
f724bcec31 USB controller:
- patch from Alexander Motin <mav@freebsd.org>
          - add more ID's
          - P4 ID: 165805

Submitted by:	hps
Approved by:	re
2009-07-30 00:12:47 +00:00
Konstantin Belousov
ed190ad06a Fix XEN build breakage, by implementing pmap_invalidate_cache_range()
and using it when appropriate. Merge analogue of the r195836
optimization to XEN.

Approved by:	re (kensmith)
2009-07-29 19:38:33 +00:00
Jamie Gritton
bdfc8cc4cd Don't allow mixing the "vnet" and "ip4/6" jail parameters, since vnet
jails have their own IP stack and don't have access to the parent IP
addresses anyway.  Note that a virtual network stack forms a break
between prisons with regard to the list of allowed IP addresses.

Approved by:	re (kib), bz (mentor)
2009-07-29 16:46:59 +00:00
Jamie Gritton
8986e3a023 Change the default value of the "ip4" and "ip6" jail parameters to
"disable", which only allows access to the parent/physical system's
IP addresses when specifically directed.  Change the default value of
"host" to "new", and don't copy the parent host values, to insulate
jails from the parent hostname et al.

Approved by:	re (kib), bz (mentor)
2009-07-29 16:41:02 +00:00
Rick Macklem
6a795918c1 Fix the experimental nfs client so that it only calls ncl_vinvalbuf()
for NFSv2 and not NFSv4 when nfscl_mustflush() returns 0. Since
nfscl_mustflush() only returns 0 when there is a valid write delegation
issued to the client, it only affects the case of an NFSv4 mount with
callbacks/delegations enabled.

Approved by:	 re (kensmith), kib (mentor)
2009-07-29 14:50:31 +00:00
Konstantin Belousov
8a5ac5d56f As was done in r195820 for amd64, use clflush for flushing cache lines
when memory page caching attributes changed, and CPU does not support
self-snoop, but implemented clflush, for i386.

Take care of possible mappings of the page by sf buffer by utilizing
the mapping for clflush, otherwise map the page transiently. Amd64
used direct map.

Proposed and reviewed by:  alc
Approved by:   re (kensmith)
2009-07-29 08:49:58 +00:00
Robert Watson
791b0ad2bf Eliminate ARG_UPATH[12] arguments to AUDIT_ARG_UPATH() and instead
provide specific macros, AUDIT_ARG_UPATH1() and AUDIT_ARG_UPATH2()
to capture path information for audit records.  This allows us to
move the definitions of ARG_* out of the public audit header file,
as they are an implementation detail of our current kernel-internal
audit record, which may change.

Approved by:	re (kensmith)
Obtained from:	TrustedBSD Project
MFC after:	1 month
2009-07-29 07:44:43 +00:00
Robert Watson
a9bcca799e Revise header comments for vnet.h as we now implement VNET_SYSINIT, not
just VNET_DEFINE in vnet.h.

Approved by:	re (vimage blanket)
2009-07-28 22:17:34 +00:00
Robert Watson
b146fc1bf0 Rework vnode argument auditing to follow the same structure, in order
to avoid exposing ARG_ macros/flag values outside of the audit code in
order to name which one of two possible vnodes will be audited for a
system call.

Approved by:	re (kib)
Obtained from:	TrustedBSD Project
MFC after:	1 month
2009-07-28 21:52:24 +00:00
Robert Watson
e4b4bbb665 Audit file descriptors passed to fooat(2) system calls, which are used
instead of the root/current working directory as the starting point for
lookups.  Up to two such descriptors can be audited.  Add audit record
BSM encoding for fooat(2).

Note: due to an error in the OpenBSM 1.1p1 configuration file, a
further change is required to that file in order to fix openat(2)
auditing.

Approved by:	re (kib)
Reviewed by:	rdivacky (fooat(2) portions)
Obtained from:	TrustedBSD Project
MFC after:	1 month
2009-07-28 21:39:58 +00:00
Julian Elischer
3d1001cb11 Startup the vnet part of initialization a bit after the global part.
Fixes crash on boot if ipfw compiled in.

Submitted by:	tegge@
Reviewed by:	tegge@
Approved by:	re (kib)
2009-07-28 19:58:07 +00:00
Julian Elischer
7973fba3a4 Somewhere along the line accept sockets stopped honoring the
FIB selected for them. Fix this.

Reviewed by:	ambrisko
Approved by:	re (kib)
MFC after:	3 days
2009-07-28 19:43:27 +00:00
Qing Li
9fca4f79c7 The new flow table caches both the routing table entry as well as the
L2 information. For an indirect route the cached L2 entry contains the
MAC address of the gateway. Typically the default route is used to
transmit multicast packets when explicit multicast routes are not
available. The ether_output() function bypasses L2 resolution function
if it verifies the L2 cache is valid, because the cached L2 address
(a unicast MAC address) is copied into the packets as the destination
MAC address. This validation, however, does not apply to broadcast and
multicast packets because the destination MAC address is mapped
according to a standard method instead.

Submitted by:	Xin Li
Reviewed by:	bz
Approved by:	re
2009-07-28 17:16:54 +00:00
Michael Tuexen
bf3d517756 Fix a bug where wrong initialization value
in used for an SCTP specific sysctl variable.

Approved by: re, rrs(mentor).
MFC after: 2 weeks.
2009-07-28 15:07:41 +00:00
Randall Stewart
cfde3ff70b Turns out that when a receiver forwards through its TNS's the
processing code holds the read lock (when processing a
FWD-TSN for pr-sctp). If it finds stranded data that
can be given to the application, it calls sctp_add_to_readq().
The readq function also grabs this lock. So if INVAR is on
we get a double recurse on a non-recursive lock and panic.

This fix will change it so that readq() function gets a
flag to tell if the lock is held, if so then it does not
get the lock.

Approved by:	re@freebsd.org (Kostik Belousov)
MFC after:	1 week
2009-07-28 14:09:06 +00:00
Weongyo Jeong
62dcd8657b adds DLINK2 DWA120 device.
PR:		usb/136950
Reported by:	Alexander Kuznetsov <skritku at gmail.com>
Approved by:	re (kib)
2009-07-27 20:17:20 +00:00
Qing Li
df813b7ea2 This patch does the following:
- Allow loopback route to be installed for address assigned to
      interface of IFF_POINTOPOINT type.
    - Install loopback route for an IPv4 interface addreess when the
      "useloopback" sysctl variable is enabled. Similarly, install
      loopback route for an IPv6 interface address when the sysctl variable
      "nd6_useloopback" is enabled. Deleting loopback routes for interface
      addresses is unconditional in case these sysctl variables were
      disabled after an interface address has been assigned.

Reviewed by:	bz
Approved by:	re
2009-07-27 17:08:06 +00:00
John Baldwin
8e3764c0a6 Fix the freebsd32 versions of semsys(), shmsys(), and msgsys() to use the
old ABI versions of the relevant control system call (e.g.
freebsd7_freebsd32_msgctl() instead of freebsd32_msgctl() for msgsys()).

Approved by:	re (kib)
2009-07-27 16:03:04 +00:00
Pawel Jakub Dawidek
abd8353f5d We don't support ephemeral IDs in FreeBSD and without this fix ZFS can
panic when in zfs_fuid_create_cred() when userid is negative. It is
converted to unsigned value which makes IS_EPHEMERAL() macro to
incorrectly report that this is ephemeral ID. The most reasonable
solution for now is to always report that the given ID is not ephemeral.

PR:		kern/132337
Submitted by:	Matthew West <freebsd@r.zeeb.org>
Tested by:	Thomas Backman <serenity@exscape.org>, Michael Reifenberger <mike@reifenberger.com>
Approved by:	re (kib)
MFC after:	2 weeks
2009-07-27 14:52:34 +00:00
Rui Paulo
3ca80f0dbc Mesh fixes, namely:
* don't clobber proxy entries
* HWMP seq number processing, including discard of old frames
* flush routing table entries based on nexthop
* print route flags in ifconfig
* more debugging messages and comments

Proxy changes submitted by sam.

Approved by:	re (kib)
2009-07-27 14:22:09 +00:00
Rui Paulo
1f93ae9453 Refine the MacBook hack to only match early models that have Intel ICH.
Discussed with:	kjim
Approved by:	re (kib)
2009-07-27 13:51:55 +00:00
Michael Tuexen
8e71b6947a Fix the handling of unordered messages when using
PR-SCTP.

Approved by: re, rrs (mentor)
MFC after: 3 weeks.
2009-07-27 13:41:45 +00:00
Michael Tuexen
4420d9a062 Get rid of unused field. This will also be deleted
in the official speciication of the SCTP socket API.

Approved by:re, rrs (mentor)
2009-07-27 12:09:32 +00:00
Michael Tuexen
47a490cbbc Add a missing unlock for the inp lock when
returning early from sctp_add_to_readq().

Approved by: re, rrs (mentor)
MFC after: 2 weeks.
2009-07-26 15:06:59 +00:00
Alexander Motin
b06555e4fc Restore PATA device probe order, broken by PMP support implementation,
requesting IDENTIFY from slave device first. This order is important
for proper cable type detection by master device.

PR:		kern/136438
Approved by:	re (kib)
2009-07-26 14:04:48 +00:00
Bjoern A. Zeeb
d0ea47437a Update epair(4) to the new netisr implementation and polish
things a bit:
- use dpcpu data to track the ifps with packets queued up,
- per-cpu locking and driver flags
- along with .nh_drainedcpu and NETISR_POLICY_CPU.
- Put the mbufs in flight reference count, preventing interfaces
  from going away, under INVARIANTS as this is a general problem
  of the stack and should be solved in if.c/netisr but still good
  to verify the internal queuing logic.
- Permit changing the MTU to virtually everythinkg like we do for loopback.

Hook epair(4) up to the build.

Approved by:	re (kib)
2009-07-26 12:20:07 +00:00
Bjoern A. Zeeb
be31e5e7b5 Make the in-kernel logic for the SIOCSIFVNET, SIOCSIFRVNET ioctls
(ifconfig ifN (-)vnet <jname|jid>) work correctly.

Move vi_if_move to if.c and split it up into two functions(*),
one for each ioctl.

In the reclaim case, correctly set the vnet before calling if_vmove.

Instead of silently allowing a move of an interface from the current
vnet to the current vnet, return an error. (*)

There is some duplicate interface name checking before actually moving
the interface between network stacks without locking and thus race
prone. Ideally if_vmove will correctly and automagically handle these
in the future.

Suggested by:	rwatson (*)
Approved by:	re (kib)
2009-07-26 11:29:26 +00:00
Alexander Motin
1a00526bec Add note, that ahci(4) and siis(4) supersede ata(4) drivers.
Approved by:	re (implicitly)
2009-07-25 18:45:09 +00:00
Alexander Motin
e19ef875b1 Add ahci and siis drivers to NOTES.
Approved by:	re (implicitly)
2009-07-25 17:40:49 +00:00
Jamie Gritton
7cbf72137f Some jail parameters (in particular, "ip4" and "ip6" for IP address
restrictions) were found to be inadequately described by a boolean.
Define a new parameter type with three values (disable, new, inherit)
to handle these and future cases.

Approved by:	re (kib), bz (mentor)
Discussed with:	rwatson
2009-07-25 14:48:57 +00:00
Julian Elischer
9d85f50ad5 Catch ipfw up to the rest of the vimage code.
It got left behind when it moved to its new location.

Approved by:	re (kensmith)
2009-07-25 06:42:42 +00:00
Jack F Vogel
45289e2ded Improvement on the last change, this gives a precise
way to tell the one and only interface that a vlan
event is for. Thanks to John Baldwin for the patch.

Approved by: re
2009-07-24 21:35:52 +00:00
Brooks Davis
254d03c508 Introduce a new sysctl process mib, kern.proc.groups which adds the
ability to retrieve the group list of each process.

Modify procstat's -s option to query this mib when the kinfo_proc
reports that the field has been truncated.  If the mib does not exist,
fall back to the truncated list.

Reviewed by:	rwatson
Approved by:	re (kib)
MFC after:	2 weeks
2009-07-24 19:12:19 +00:00
John Baldwin
5f60e483c8 Bump __FreeBSD_version for the introduction of OBJT_SG.
Approved by:	re (kensmith)
2009-07-24 18:31:04 +00:00
Jack F Vogel
387424df40 This delta fixes two bugs:
- When a vlan event occurs a check was not made that
    the event was actually for the interface, thus resulting
    in a panic. All three drivers have this vulnerability. Add
    a check for this condition.
  - Secondly, there was a duplicate buf_ring free in the em
    driver resulting in a panic on unload. Remove.

Approved by:  re
2009-07-24 16:57:49 +00:00
Jack F Vogel
8bd0025fdd A small number of systems in the ICH9/10 family have a flash
part that is made up of 8K banks rather than 4K, if these
systems are using bank 1 then the last change in this code
breaks the bank read, resulting in an invalid checksum of
the eeprom during driver load. This change fixes this.

Approved by:  re
2009-07-24 16:54:22 +00:00
Sam Leffler
8ed1835dea revert OACTIVE part of r195845; instead fix the comment so it does not refer
to the old hack removed in r193312

Approved by:	re (implicit)
2009-07-24 15:37:02 +00:00
Sam Leffler
ff5aac8eb7 correct handling of IFF_PROMISC; this should not be pushed to the parent
device except for monitor and ahdemo mode vaps

Reviewed by:	rpaulo
Approved by:	re (kensmith)
2009-07-24 15:28:29 +00:00
Sam Leffler
983a2c892b monitor mode vaps are meant to be read-only so they can operate on any
frequency w/o regulatory issues, do this by hooking if_transmit and
if_output with routines that discard all transmits

Reviewed by:	thompsa, cbzimmer (intent)
Approved by:	re (kensmith)
2009-07-24 15:27:02 +00:00
Sam Leffler
224881b466 o kill old code no longer needed after r193312
o count output packets+errors for frames sent through ieee80211_output

Approved by:	re (kensmith)
2009-07-24 15:22:12 +00:00
John Baldwin
63cbfee7b0 Remove debugging that crept in with previous commit.
Reported by:	nwhitehorn
Approved by:	re (kib)
2009-07-24 15:06:49 +00:00
Brooks Davis
1b5768be71 Revert the changes to struct kinfo_proc in r194498. Instead, fill
in up to 16 (KI_NGROUPS) values and steal a bit from ki_cr_flags
(all bits currently unused) to indicate overflow with the new flag
KI_CRF_GRP_OVERFLOW.

This fixes procstat -s.

Approved by: re (kib)
2009-07-24 15:03:10 +00:00
John Baldwin
013818111a Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar to
a device pager (OBJT_DEVICE) object in that it uses fictitious pages to
provide aliases to other memory addresses.  The primary difference is that
it uses an sglist(9) to determine the physical addresses for a given offset
into the object instead of invoking the d_mmap() method in a device driver.

Reviewed by:	alc
Approved by:	re (kensmith)
MFC after:	2 weeks
2009-07-24 13:50:29 +00:00
Robert Watson
d0728d7174 Introduce and use a sysinit-based initialization scheme for virtual
network stacks, VNET_SYSINIT:

- Add VNET_SYSINIT and VNET_SYSUNINIT macros to declare events that will
  occur each time a network stack is instantiated and destroyed.  In the
  !VIMAGE case, these are simply mapped into regular SYSINIT/SYSUNINIT.
  For the VIMAGE case, we instead use SYSINIT's to track their order and
  properties on registration, using them for each vnet when created/
  destroyed, or immediately on module load for already-started vnets.
- Remove vnet_modinfo mechanism that existed to serve this purpose
  previously, as well as its dependency scheme: we now just use the
  SYSINIT ordering scheme.
- Implement VNET_DOMAIN_SET() to allow protocol domains to declare that
  they want init functions to be called for each virtual network stack
  rather than just once at boot, compiling down to DOMAIN_SET() in the
  non-VIMAGE case.
- Walk all virtualized kernel subsystems and make use of these instead
  of modinfo or DOMAIN_SET() for init/uninit events.  In some cases,
  convert modular components from using modevent to using sysinit (where
  appropriate).  In some cases, do minor rejuggling of SYSINIT ordering
  to make room for or better manage events.

Portions submitted by:	jhb (VNET_SYSINIT), bz (cleanup)
Discussed with:		jhb, bz, julian, zec
Reviewed by:		bz
Approved by:		re (VIMAGE blanket)
2009-07-23 20:46:49 +00:00
Alan Cox
3313145243 Eliminate unnecessary cache and TLB flushes by pmap_change_attr(). (This
optimization was implemented in the amd64 version roughly 1 year ago.)

Approved by:	re (kensmith)
2009-07-23 19:43:23 +00:00
Nathan Whitehorn
ccf6415e82 Fix serial console on Apple Xserve G5 by falling back to input-device-1
if input-device is unavailable. The Xserve G5 defaults to using
screen/keyboard for output-device/input-device even if these are not
installed, and then falls back to serial ports at boot time.

Reviewed by:	marcel
Hardware from:	grehan
Approved by:	re (kib)
2009-07-23 12:51:27 +00:00
Rick Macklem
93a15e2054 When vfs.newnfs.callback_addr is set to an IPv4 address, the
experimental NFSv4 client might try and use it as an IPv6 address,
breaking callbacks. The fix simply initializes the isinet6 variable
for this case.

Approved by:	re (kensmith), kib (mentor)
2009-07-22 18:10:44 +00:00
Edward Tomasz Napierala
d2ceff236a Fix extattr_list_file(2) on ZFS in case the attribute directory
doesn't exist and user doesn't have write access to the file.
Without this fix, it returns bogus value instead of 0.  For some
reason this didn't manifest on my kernel compiled with -O0.

PR:		kern/136601
Submitted by:	Jaakko Heinonen <jh at saunalahti dot fi>
Approved by:	re (kib)
2009-07-22 15:15:58 +00:00
Rick Macklem
c79e697621 Add changes to the experimental nfs client to use the PBDRY flag for
msleep(9) when a vnode lock or similar may be held. The changes are
just a clone of the changes applied to the regular nfs client by
r195703.

Approved by:	re (kensmith), kib (mentor)
2009-07-22 14:37:53 +00:00
Konstantin Belousov
206a336872 When the page caching attributes are changed, after new mapping is
established, OS shall flush the caches on all processors that may have
used the mapping previously. This operation is not needed if processors
support self-snooping. If not, but clflush instruction is implemented
on the CPU, series of the clflush can be used on the mapping region.
Otherwise, we have to flush the whole cache. The later operation is very
expensive, and AMD-made CPUs do not have self-snooping.

Implement cache flush for remapped region by using clflush for amd64,
when supported by CPU.

Proposed and reviewed by:	alc
Approved by:	re (kensmith)
2009-07-22 14:32:38 +00:00
Rick Macklem
7f1968ba10 When using an NFSv4 mount in the experimental nfs client with delegations
being issued from the server, there was a case where an Open issued locally
based on the delegation would be released before the associated vnode
became inactive. If the delegation was recalled after the open was released,
an Open against the server would not have been acquired and subsequent I/O
operations would need to use the special stateid of all zeros. This patch
fixes that case.

Approved by:	re (kensmith), kib (mentor)
2009-07-22 14:32:28 +00:00
Andrew Gallatin
94c7d993a3 mxge's tunable hw.mxge.rss_hash_type cannot be set from the
loader, because it uses a reserved suffix (_type).  Fix
this by removing the "_" and renaming the tunable to
hw.mxge.rss_hashtype.  The old (rss_hash_type) tunable is
still fetched, in case people load the driver via scripts.
When both are present in the kernel environment,
the new value (hw.mxge.rss_hashtype) overrides the old
value.

Approved by:	re (kib)
2009-07-22 11:57:34 +00:00
Bjoern A. Zeeb
a08362ce46 sysctl_msec_to_ticks is used with both virtualized and
non-vrtiualized sysctls so we cannot used one common function.

Add a macro to convert the arg1 in the virtualized case to
vnet.h to not expose the maths to all over the code.

Add a wrapper for the single virtualized call, properly handling
arg1 and call the default implementation from there.

Convert the two over places to use the new macro.

Reviewed by:	rwatson
Approved by:	re (kib)
2009-07-21 21:58:55 +00:00
Sam Leffler
e50821abe7 store mesh timers as ticks and sysctls for changing the defaults
Reviewed by:	rpaulo
Approved by:	re (kib)
2009-07-21 19:38:22 +00:00
Sam Leffler
411ccf5f63 Correct handling of keys that already have a hardware/device key index:
this was broken in r183248 when the check of wk_keyix was replaced by
a check of IEEE80211_KEY_DEVKEY (because the flag was clobbered).  Define
IEEE80211_KEY_DEVICE to specify flags that are owned by net80211/driver
and use this to preserve IEEE80211_KEY_DEVKEY so we don't ask the driver
for another key index when we already have one.

Testing by:	Daniel Thiele, Wes Morgan
Reviewed by:	rpaulo
Approved by:	re (kib)
2009-07-21 19:36:32 +00:00
Sam Leffler
bd6c4dd05d correct setup of opt_ddb.h
Submitted by:	jkim
Approved by:	re (kib)
2009-07-21 19:24:53 +00:00
Sam Leffler
82cadd5aa0 Fix handling of AR_RX_FILTER_BSSID: write the shadow value for AR_MISC_MODE
so other register writes preserve the setting of AR_MISC_MODE_BSSID_MATCH_FORCE.

Reviewed by:	rpaulo
Approved by:	re (kib)
2009-07-21 19:23:34 +00:00
Marius Strobl
fada2a867d Add a MD __PCI_BAR_ZERO_VALID which denotes that BARs containing 0
actually specify valid bases that should be treated just as normal.
The PCI specifications have no indication that 0 would be a magic value
indicating a disabled BAR as commonly used on at least amd64 and i386
but not sparc64. It's unclear what to do in pci_delete_resource()
instead of writing 0 to a BAR though as there's no (other) way do
disable individual BARs so its decoding is left enabled in case of
__PCI_BAR_ZERO_VALID for now.

Approved by:	re (kib), jhb
MFC after:	1 week
2009-07-21 19:06:39 +00:00
Sam Leffler
fe0dd78965 track whether any mesh vaps are present to correctly setup the rx filter
when, for example, an ap vap is created first

Reviewed by:	rpaulo
Approved by:	re (kib)
2009-07-21 19:01:04 +00:00
Alan Cox
9e367360e3 Catch up with r195249, "Improve the handling of cpuset with interrupts."
Specifically, update the return type of xenpic_assign_cpu() so that this
file compiles again.

Approved by:	re (kib)
2009-07-21 16:54:11 +00:00
Rui Paulo
a6d54dae20 Enable mesh support.
Submitted by:	jkim
Approved by:	re (kib)
2009-07-21 14:23:05 +00:00
Rui Paulo
5b4d5f9ee2 Improve the printf message when a module failed to load. This gives the
user some clue about the possibility of a __FreeBSD_version mismatch.

Discussed with:	rwatson, jhb
Approved by:	re (kib)
2009-07-21 14:18:25 +00:00
Alexander Motin
67b87e4429 Add siis CAM driver for SiliconImage SiI3124/3132/3531 SATA2 controllers.
Driver supports Serial ATA and ATAPI devices, Port Multipliers
(including FIS-based switching), hardware command queues (31 command
per port) and Native Command Queuing. This is probably the second on
popularity, after AHCI, type of SATA2 controllers, that benefits from
using CAM, because of hardware command queuing support.

Approved by:    re (kib)
2009-07-21 12:32:46 +00:00
Rafal Jaworowski
570255d8a5 Do not use OCP85XX_LBC_OFF twice when accessing LBC registers on MPC85XX.
It turns LBC control registers were not programmed correctly on MPC85XX. We
were accessing bogus addresses as the base offset (OCP85XX_LBC_OFF) was
erroneously added during offset calculations.  Effectively the state of LBC
control registers was not altered by the kernel initialization code, but
everything worked as long as we coincided to use the same settings (LBC decode
windows) as firmware has initialized.

Submitted by:	Lukasz Wojcik
Reviewed by:	marcel
Approved by:	re (kensmith)
Obtained from:	Semihalf
2009-07-21 08:38:45 +00:00
Rafal Jaworowski
84a5818564 Make dcache_inv_range() point to the proper routines on ARM9 and ARM9E/ARM10.
On some ARM variations CPU func dispatcher has the D-cache invalidate method
point to write-back invalidate, which is wrong, and can lead to a crash/panic
on affected platforms.

Spotted by:	HPS
Reviewed by:	cognet
Approved by:	re (kib)
2009-07-21 08:29:19 +00:00
Coleman Kane
e0d12c4b94 Fix regression in last set of commits. Submitted via e-mail and then
nagged again via PR. Thank Paul for his persistence and contributions.

PR:		136895
Submitted by:	Paul B. Mahol <onemda@gmail.com>
Reviewed by:	sam (timeout, 10 days), weongyo (timeout, 10 days), me
Approved by:	re (Kostik Belousov <kostikbel@gmail.com>)
2009-07-20 23:21:19 +00:00
Robert Watson
a511354af4 Back out the moving in r195782 of V_ip_id's initialization from the top
back to the bottom of ip_init() as found in 7.x.  I missed the fact that
the bottom half of the init routine only runs in the !VNET case.

Submitted by:	zec
Approved by:	re (vimage blanket)
2009-07-20 19:40:09 +00:00
Edward Tomasz Napierala
65588fd503 Fix permission handling for extended attributes in ZFS. Without
this change, ZFS uses SunOS Alternate Data Streams semantics - each
EA has its own permissions, which are set at EA creation time
and - unlike SunOS - invisible to the user and impossible to change.
From the user point of view, it's just broken: sometimes access
is granted when it shouldn't be, sometimes it's denied when
it shouldn't be.

This patch makes it behave just like UFS, i.e. depend on current
file permissions.  Also, it fixes returned error codes (ENOATTR
instead of ENOENT) and makes listextattr(2) return 0 instead
of EPERM where there is no EA directory (i.e. the file never had
any EA).

Reviewed by:	pjd (idea, not actual code)
Approved by:	re (kib)
2009-07-20 19:16:42 +00:00
Rui Paulo
c104cff26e More mesh bits, namely:
* bridge support (sam)
* handling of errors (sam)
* deletion of inactive routing entries
* more debug msgs (sam)
* fixed some inconsistencies with the spec.
* decap is now specific to mesh (sam)
* print mesh seq. no. on ifconfig list mesh
* small perf. improvements

Reviewed by:	sam
Approved by:	re (kib)
2009-07-20 19:12:08 +00:00
Robert Watson
0a4747d4d0 Garbage collect vnet module registrations that have neither constructors
nor destructors, as there's no actual work to do.

In most cases, the constructors weren't needed because of the existing
protocol initialization functions run by net_init_domain() as part of
VNET_MOD_NET, or they were eliminated when support for static
initialization of virtualized globals was added.

Garbage collect dependency references to modules without constructors or
destructors, notably VNET_MOD_INET and VNET_MOD_INET6.

Reviewed by:	bz
Approved by:	re (vimage blanket)
2009-07-20 13:55:33 +00:00
Rafal Jaworowski
331f685743 ARM pmap fixes.
a)  nocache-remap problem

   When a page is remapped into a non-cacheable virtual memory region there
   was no associated write-back invalidate operation performed. We remove
   writeback of the original buffer size from bus_dmamem_alloc() and add
   appropriate L1/L2 flush operation.

b) missing write-back invalidate operation

   In pmap_kremove a page is removed so we must do a write-back
   invalidate operation aligned to the page virtual address.

Submitted by:	Michal Hajduk
Reviewed by:	Mark Tinguely, rpaulo, stas
Approved by:	re (kib)
Obtained from:	Semihalf
2009-07-20 07:53:07 +00:00
Robert Watson
17ef1feb8a Add macros VNET_SETNAME and VNET_SYMPREFIX, and expose to userspace if
_WANT_VNET is defined.  This way we don't need separate definitions in
libkvm.

Reviewed by:	bz
Approved by:	re (vimage blanket)
2009-07-20 07:50:50 +00:00
Scott Long
62fdee1dda Fix an apparently harmless typo.
Approved by:	re
2009-07-20 03:59:00 +00:00
Alan Cox
9861cbc6ca Change the handling of fictitious pages by pmap_page_set_memattr() on
amd64 and i386.  Essentially, fictitious pages provide a mechanism for
creating aliases for either normal or device-backed pages.  Therefore,
pmap_page_set_memattr() on a fictitious page needn't update the direct
map or flush the cache.  Such actions are the responsibility of the
"primary" instance of the page or the device driver that "owns" the
physical address.  For example, these actions are already performed by
pmap_mapdev().

The device pager needn't restore the memory attributes on a fictitious
page before releasing it.  It's now pointless.

Add pmap_page_set_memattr() to the Xen pmap.

Approved by:	re (kib)
2009-07-19 21:40:19 +00:00
Konstantin Belousov
7ac5806bfb When buffer write is failed, it is wrong for brelse() to invalidate
portion of the page that was written. Among other problems, this
page might be picked up by pagedaemon, with failed assertion in
vm_pageout_flush() about validity of the page.

Reported and tested by:	pho
Approved by:	re (kensmith)
MFC after:	3 weeks
2009-07-19 20:25:59 +00:00
Robert Watson
006e9db452 Normalize field naming for struct vnet, fix two debugging printfs that
print them.

Reviewed by:	bz
Approved by:	re (kensmith, kib)
2009-07-19 17:40:45 +00:00
Ken Smith
3ca3047aee Bump the version of all non-symbol-versioned shared libraries in
preparation for 8.0-RELEASE.  Add the previous version of those
libraries to ObsoleteFiles.inc and bump __FreeBSD_Version.

Reviewed by:    kib
Approved by:    re (rwatson)
2009-07-19 17:25:24 +00:00
Sam Leffler
030d10ee97 add urtw
Approved by:	re (kib)
2009-07-19 16:54:24 +00:00
Rick Macklem
80e556956e Fix two bugs in the experimental nfs client:
- When the root vnode was acquired during mounting, mnt_stat.f_iosize was
  still set to 0, so getnewvnode() would set bo_bsize == 0. This would
  confuse getblk(), so that it always returned the first block causing
  the problem when the root directory of the mount point was greater
  than one block in size. It was fixed by setting mnt_stat.f_iosize to
  NFS_DIRBLKSIZ before calling ncl_nget() to acquire the root vnode.
- NFSMNT_INT was being set temporarily while the initial connect to a
  server was being done. This erroneously configured the krpc for
  interruptible RPCs, which caused problems because signals weren't
  being masked off as they would have been for interruptible mounts.
  This code was deleted to fix the problem. Since mount_nfs does an
  NFS null RPC before the mount system call, connections to the server
  should work ok.

Tested by:	swell dot k at gmail dot com
Approved by:	re (kensmith), kib (mentor)
2009-07-19 16:44:26 +00:00
Robert Watson
94da184f68 Expose the definitions of 'struct vnet' and 'VNET_MAGIC_N' to userspace
if _WANT_VNET is defined.  This is required so that libkvm can locate
virtual network stack instances in order to reach their global variables
for monitoring and crashdump analysis.

Reviewed by:	bz
Approved by:	re (kib)
2009-07-19 15:21:42 +00:00
Robert Watson
5ee847d3ac Reimplement and/or implement vnet list locking by replacing a mostly
unused custom mutex/condvar-based sleep locks with two locks: an
rwlock (for non-sleeping use) and sxlock (for sleeping use).  Either
acquired for read is sufficient to stabilize the vnet list, but both
must be acquired for write to modify the list.

Replace previous no-op read locking macros, used in various places
in the stack, with actual locking to prevent race conditions.  Callers
must declare when they may perform unbounded sleeps or not when
selecting how to lock.

Refactor vnet sysinits so that the vnet list and locks are initialized
before kernel modules are linked, as the kernel linker will use them
for modules loaded by the boot loader.

Update various consumers of these KPIs based on whether they may sleep
or not.

Reviewed by:	bz
Approved by:	re (kib)
2009-07-19 14:20:53 +00:00
Sam Leffler
519f677aff Move code that does payload realigment to a new routine, ieee80211_realign,
so it can be reused.  While here rewrite the logic to always use a single mbuf.

Reviewed by:	rpaulo
Approved by:	re (kib)
2009-07-18 20:19:53 +00:00
Bruce M Simpson
b36c89e55f Fix a problem, whereby misbehaving IPv6 applications, which don't include
a valid zone ID or interface identifier in a v6 multicast leave, would
trigger a fairly paranoid KASSERT().

Observed with Boost++ regression tests on ref8.freebsd.org.

Approved by:	re (kib)
2009-07-18 17:38:18 +00:00
Ulf Lilleengen
b79cac0f92 - Fix the issue with read access count modification on RAID-5 plexes properly.
If the access counts were not increased and decreased in equal numbers by
  gvinum consumers, the read access count would be inconsistent with the write
  access count. Instead, modify the read access count with the write access
  count directly to prevent any inconsistencies.

Approved by:	re (kib)
2009-07-18 11:12:48 +00:00
Alan Cox
13de722155 An addendum to r195649, "Add support to the virtual memory system for
configuring machine-dependent memory attributes...":

Don't set the memory attribute for a "real" page that is allocated to
a device object in vm_page_alloc().  It is a pointless act, because
the device pager replaces this "real" page with a "fake" page and sets
the memory attribute on that "fake" page.

Eliminate pointless code from pmap_cache_bits() on amd64.

Employ the "Self Snoop" feature supported by some x86 processors to
avoid cache flushes in the pmap.

Approved by:	re (kib)
2009-07-18 01:50:05 +00:00
Alexander Motin
b2da774623 Fix copy-paste bug. Use regular non-polled mode for executing FLUSHCACHE
command on disk close.

Approved by:	re (implicitly)
2009-07-17 21:48:08 +00:00
Rick Macklem
874bb76647 Patch the regular nfs client in a manner analagous to
r195704 for the experimental client. The patch avoids calling vn_lock()
for the case where nfs_nget() has acquired the same vnode as dvp,
since nfs_nget() has already locked the vnode.

Reviewed by:	kib, jhb
Approved by:	re (kensmith), kib (mentor)
2009-07-17 19:38:07 +00:00
Rui Paulo
8b9fde4324 Add IEEE80211_SUPPORT_MESH, following similar change to nanobsd and
other GENERIC kernels.

Approved by:	re (kib)
2009-07-17 18:35:45 +00:00
Jamie Gritton
7afcbc18b3 Remove the interim vimage containers, struct vimage and struct procg,
and the ioctl-based interface that supported them.

Approved by:	re (kib), bz (mentor)
2009-07-17 14:48:21 +00:00
Robert Watson
597df30e62 Import OpenBSM 1.1p1 from vendor branch to 8-CURRENT, populating
contrib/openbsm and a subset also imported into sys/security/audit.
This patch release addresses several minor issues:

- Fixes to AUT_SOCKUNIX token parsing.
- IPv6 support for au_to_me(3).
- Improved robustness in the parsing of audit_control, especially long
  flags/naflags strings and whitespace in all fields.
- Add missing conversion of a number of FreeBSD/Mac OS X errnos to/from BSM
  error number space.

MFC after:	3 weeks
Obtained from:	TrustedBSD Project
Sponsored by:	Apple, Inc.
Approved by:	re (kib)
2009-07-17 14:02:20 +00:00
Robert Watson
5d171016e7 Vendor import of OpenBSM 1.1p1, which incorporates the following changes
since the last imported OpenBSM release:

OpenBSM 1.1p1

- Fixes to AUT_SOCKUNIX token parsing.
- IPv6 support for au_to_me(3).
- Improved robustness in the parsing of audit_control, especially long
  flags/naflags strings and whitespace in all fields.
- Add missing conversion of a number of FreeBSD/Mac OS X errnos to/from BSM
  error number space.

Obtained from:  TrustedBSD Project
Sponsored by:   Apple, Inc.
2009-07-17 12:18:39 +00:00
Robert Watson
1e77c1056a Remove unused VNET_SET() and related macros; only VNET_GET() is
ever actually used.  Rename VNET_GET() to VNET() to shorten
variable references.

Discussed with:	bz, julian
Reviewed by:	bz
Approved by:	re (kensmith, kib)
2009-07-16 21:13:04 +00:00
Alexander Motin
d96aeec8ff Limit IOCATAREQUEST ioctl data size to controller's maximum I/O size.
It fixes kernel panic when requested size is too large (0xffffffff),

PR:             kern/136726
Approved by:    re (kib)
MFC after:      2 weeks
2009-07-16 19:48:39 +00:00
Ken Smith
e57e15a182 Prepare for the 8.0-BETA2 builds.
Approved by:	re (implicit)
2009-07-15 17:29:05 +00:00
Andriy Gapon
b064b6d1cd dtrace_gethrtime: improve scaling of TSC ticks to nanoseconds
Currently dtrace_gethrtime uses formula similar to the following for
converting TSC ticks to nanoseconds:
rdtsc() * 10^9 / tsc_freq
The dividend overflows 64-bit type and wraps-around every 2^64/10^9 =
18446744073 ticks which is just a few seconds on modern machines.

Now we instead use precalculated scaling factor of
10^9*2^N/tsc_freq < 2^32 and perform TSC value multiplication separately
for each 32-bit half.  This allows to avoid overflow of the dividend
described above.
The idea is taken from OpenSolaris.
This has an added feature of always scaling TSC with invariant value
regardless of TSC frequency changes. Thus the timestamps will not be
accurate if TSC actually changes, but they are always proportional to
TSC ticks and thus monotonic. This should be much better than current
formula which produces wildly different non-monotonic results on when
tsc_freq changes.

Also drop write-only 'cp' variable from amd64 dtrace_gethrtime_init()
to make it identical to the i386 twin.

PR:		kern/127441
Tested by:	Thomas Backman <serenity@exscape.org>
Reviewed by:	jhb
Discussed with:	current@, bde, gnn
Silence from:	jb
Approved by:	re (gnn)
MFC after:	1 week
2009-07-15 17:07:39 +00:00
Robert Watson
cd2b95a237 r195699 introduced an assertion regarding when progbits data in kernel
modules was present, which turns out to be false in some situations.
Back out the assertion.

Reported by:	Luiz Otavio O Souza <lists.br at gmail.com>,
		Florian Smeets <flo at kasimir.com>
Approved by:	re (kensmith) (implicit)
2009-07-15 09:19:01 +00:00
Robert Watson
c1e200ffcc Add missing license line for vnet.h, correct white space nit.
Approved by:	re (kensmith) (implicit)
2009-07-15 00:56:15 +00:00
Rick Macklem
405229f913 Fix the experimental nfs client so that it does not cause a
"share->excl" panic when doing a lookup of dotdot at the root
of a server's file system. The patch avoids calling vn_lock()
for that case, since nfscl_nget() has already acquired a lock
for the vnode.

Approved by:	re (kensmith), kib (mentor)
2009-07-14 23:10:23 +00:00
Konstantin Belousov
b35687df13 Use PBDRY flag for msleep(9) in NFS and NLM when sleeping thread owns
kernel resources that block other threads, like vnode locks. The SIGSTOP
sent to such thread (process, rather) shall not stop it until thread
releases the resources.

Tested by:	pho
Reviewed by:	jhb
Approved by:	re (kensmith)
2009-07-14 22:54:29 +00:00
Konstantin Belousov
f33a947b56 Add new msleep(9) flag PBDY that shall be specified together with
PCATCH, to indicate that thread shall not be stopped upon receipt of
SIGSTOP until it reaches the kernel->usermode boundary.

Also change thread_single(SINGLE_NO_EXIT) to only stop threads at
the user boundary unconditionally.

Tested by:	pho
Reviewed by:	jhb
Approved by:	re (kensmith)
2009-07-14 22:52:46 +00:00
Konstantin Belousov
79799053a7 Move the repeated code to calculate the number of the threads in the
process that still need to be suspended or exited from thread_single
into the new function calc_remaining().

Tested by:	pho
Reviewed by:	jhb
Approved by:	re (kensmith)
2009-07-14 22:51:31 +00:00
Konstantin Belousov
c8167830f9 When wakeup(9) is going to notify swapper, assert that wait channel is not
equal to &proc0. It shall be not, since proc0 stack is not swappable, and
kick_proc0() is wakeup(&proc0).

Reviewed by:	jhb
Approved by:	re (kensmith)
2009-07-14 22:50:41 +00:00
Robert Watson
eddfbb763d Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator.  Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...).  This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack.  Virtualized global variables are
tagged at compile-time, placing the in a special linker set, which is
loaded into a contiguous region of kernel memory.  Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of a the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy.  Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address.  When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by:  bz
Reviewed by:            bz, zec
Discussed with:         gnn, jamie, jeff, jhb, julian, sam
Suggested by:           peter
Approved by:            re (kensmith)
2009-07-14 22:48:30 +00:00
John Baldwin
0fe0ed8bf8 - Change mmap() to fail requests with EINVAL that pass a length of 0. This
behavior is mandated by POSIX.
- Do not fail requests that pass a length greater than SSIZE_MAX
  (such as > 2GB on 32-bit platforms).  The 'len' parameter is actually
  an unsigned 'size_t' so negative values don't really make sense.

Submitted by:	Alexander Best  alexbestms at math.uni-muenster.de
Reviewed by:	alc
Approved by:	re (kib)
MFC after:	1 week
2009-07-14 19:45:36 +00:00
Bjoern A. Zeeb
6c19585325 Re-add opt_inet.h, as we did in r193862 and lost yet again.
Approved by:	re (kib)
2009-07-14 19:32:36 +00:00
Alexander Motin
5487ee5507 Disable MSI by default for nVidia MCP55 chipset.
It is reported to be broken in the same way as MCP51.

PR:		kern/136429
Approved by:	re (kib)
2009-07-14 19:18:31 +00:00
Ariff Abdullah
df41a638f7 - Do aggresive saturation on various polynomial interpolators.
This dramatically pushing 99.9% interpolations and quantizations
  error _below_ -180dB on 32bit dynamic range, resulting extremely
  high quality conversion.
- Use BSPLINE interpolator for filter oversampling factor greater or
  equal than 64 (log2 6).

Approved by:	re (kib)
2009-07-14 18:53:34 +00:00
Ed Maste
c25ebae365 Change xpt_scan_bus to scsi_scan_bus and xpt_scan_lun to scsi_scan_lun
in comments and printfs to match new function names after refacoring.

Approved by:	re
2009-07-14 18:44:17 +00:00
Ed Maste
399831bce6 Fix leaks in probestart, probedone, and scsi_scan_bus. Also free
page_list using the matching malloc type for the allocation.

Approved by:	re
Reviewed by:	scottl [1]
MFC after:	1 week

[1] Original patch was against xpt_cam.c, prior to the cam refactoring.
2009-07-14 17:26:37 +00:00
Lawrence Stewart
5f1ff8136a Fix a buglet that slipped into r195654. My buildworld/buildkernel sanity
check missed this because cxgb's TOM is currently commented out of the build
system.

Submitted by:	Navdeep Parhar <np at FreeBSD dot org>
Approved by:	re (kensmith), kensmith (mentor temporarily unavailable)
2009-07-14 11:53:21 +00:00
Tai-hwa Liang
3d22427cff Adding hardware ID for RTL810x PCIe found on HP Pavilion DV2-1022AX.
Reviewed by:	yongari
Approved by:	re (kib, kensmith)
2009-07-14 04:35:13 +00:00
Jung-uk Kim
e8c4d3e407 Match PCI Express root bridge _HID directly instead of
relying on _CID.

Reviewed by:	jhb
Approved by:	re (kib)
2009-07-13 21:36:31 +00:00
Alexander Motin
3ccda2f312 Fix copy-paste bug, enabling SIM PMP support, when it was not really found.
Approved by:	re (implicitly)
2009-07-13 21:21:30 +00:00
Scott Long
de985f6174 Revert the CISS driver to 64K i/o, the previous change was in error and
missing a lot of needed infrastructure.

Approved by:	re
2009-07-13 20:19:29 +00:00
Rui Paulo
aec0289d59 Fix inline function declaration and prototype.
Approved by:	re (kensmith)
2009-07-13 18:23:58 +00:00
Alan Cox
bc5c48f12f Correct an error of omission in r195649 ("Add support to the virtual memory
system for configuring machine-dependent memory attributes: ...").  In
r195649, the "vm_cache_mode_t/vm_memattr_t" parameter was removed from
vm_phys_alloc_contig().

Approved by:	re (kensmith)
2009-07-13 18:11:59 +00:00
Alexander Motin
45a30a41d2 Fix Marvel SATA controllers operation, broken by rev. 188765,
by using uninitialized variable.

Tested by:	Chris Hedley
Approved by:	re (kensmith)
2009-07-13 18:01:49 +00:00
Lawrence Stewart
91a5ebde45 Fix a race in the manipulation of the V_tcp_sack_globalholes global variable,
which is currently not protected by any type of lock. When triggered, the bug
would sometimes cause a panic when the TCP activity to an affected machine
eventually slowed during a lull. The panic only occurs if INVARIANTS is compiled
into the kernel, and has laid dormant for some time as a result of INVARIANTS
being off by default except in FreeBSD-CURRENT.

Switch to atomic operations in the locations where the variable is changed.
Reads have not been updated to be protected by atomics, so there is a
possibility of accounting errors in any given calculation where the variable is
read. This is considered unlikely to occur in the wild, and will not cause
serious harm on rare occasions where it does.

Thanks to Robert Watson for debugging help.

Reported by:	Kamigishi Rei <spambox at haruhiism dot net>
Tested by:	Kamigishi Rei <spambox at haruhiism dot net>
Reviewed by:	silby
Approved by:	re (rwatson), kensmith (mentor temporarily unavailable)
2009-07-13 11:59:38 +00:00
Lawrence Stewart
237fbe0a1c Replace struct tcpopt with a proxy toeopt struct in the TOE driver interface to
the TCP syncache. This returns struct tcpopt to being private within the TCP
implementation, thus allowing it to be modified without ABI concerns.

The patch breaks the ABI. Bump __FreeBSD_version to 800103 accordingly. The cxgb
driver is the only TOE consumer affected by this change, and needs to be
recompiled along with the kernel.

Suggested by:	rwatson
Reviewed by:	rwatson, kmacy
Approved by:	re (kensmith), kensmith (mentor temporarily unavailable)
2009-07-13 11:51:02 +00:00
Alexander Motin
f09f8e3e0b Rename ATA probe driver to "aprobe" to resolve name conflict with SCSI
and fix loading cam as module.

Approved by:	re (implicitly)
2009-07-13 06:12:21 +00:00
Alan Cox
3153e878dd Add support to the virtual memory system for configuring machine-
dependent memory attributes:

Rename vm_cache_mode_t to vm_memattr_t.  The new name reflects the
fact that there are machine-dependent memory attributes that have
nothing to do with controlling the cache's behavior.

Introduce vm_object_set_memattr() for setting the default memory
attributes that will be given to an object's pages.

Introduce and use pmap_page_{get,set}_memattr() for getting and
setting a page's machine-dependent memory attributes.  Add full
support for these functions on amd64 and i386 and stubs for them on
the other architectures.  The function pmap_page_set_memattr() is also
responsible for any other machine-dependent aspects of changing a
page's memory attributes, such as flushing the cache or updating the
direct map.  The uses include kmem_alloc_contig(), vm_page_alloc(),
and the device pager:

  kmem_alloc_contig() can now be used to allocate kernel memory with
  non-default memory attributes on amd64 and i386.

  vm_page_alloc() and the device pager will set the memory attributes
  for the real or fictitious page according to the object's default
  memory attributes.

Update the various pmap functions on amd64 and i386 that map pages to
incorporate each page's memory attributes in the mapping.

Notes: (1) Inherent to this design are safety features that prevent
the specification of inconsistent memory attributes by different
mappings on amd64 and i386.  In addition, the device pager provides a
warning when a device driver creates a fictitious page with memory
attributes that are inconsistent with the real page that the
fictitious page is an alias for. (2) Storing the machine-dependent
memory attributes for amd64 and i386 as a dedicated "int" in "struct
md_page" represents a compromise between space efficiency and the ease
of MFCing these changes to RELENG_7.

In collaboration with: jhb

Approved by:	re (kib)
2009-07-12 23:31:20 +00:00
Qing Li
05b262e264 This patch adds a host route to an interface address (that is assigned
to a non loopback/ppp link type) through the loopback interface. Prior
to the new L2/L3 rewrite, this host route was explicitly created when
processing the IPv6 address assignment. This loopback host route is
deleted when that IPv6 address is removed from the interface.

Reviewed by:	bz, gnn
Approved by:	re
2009-07-12 19:20:55 +00:00
Rick Macklem
089f366ab0 Add calls to the experimental nfs client for the case of an "intr" mount,
so that signals that aren't supposed to terminate RPCs in progress are
masked off during the RPC.

Approved by:	re (kensmith), kib (mentor)
2009-07-12 17:07:35 +00:00
Rick Macklem
ad86aef9af Fix the handling of dotdot in lookup for the experimental nfs client
in a manner analagous to the change in r195294 for the regular nfs client.

Approved by:	re (kensmith), kib (mentor)
2009-07-12 17:02:17 +00:00
Marcel Moolenaar
c5e2d1885c Isochronous transfers only have 1 frame buffer, but multiple
frame lengths. The frame buffer is at index 0.

Approved by:	re (kensmith)
Obtained from:	HPS
2009-07-12 16:50:32 +00:00
Marcel Moolenaar
58f6d27320 MFp4:
USB CORE: busdma improvement

      For single segment allocations the boundary field
      of the BUSDMA tag should be zero. Currently all
      single segment allocations are less than or equal
      to 4096 bytes, so the limit does not kick in. If
      any single segment USB allocations would be greater
      than 4K, then it would be a problem.

Approved by:	re (kensmith)
Obtained from:	HPS
2009-07-12 16:46:43 +00:00
Konstantin Belousov
529ab57b9a When VM_MAP_WIRE_HOLESOK is not specified and vm_map_wire(9) encounters
non-readable and non-executable map entry, the entry is skipped from
wiring and loop is aborted. But, since MAP_ENTRY_WIRE_SKIPPED was not
set for the map entry, its wired_count is later erronously decremented.
vm_map_delete(9) for such map entry stuck in "vmmaps".

Properly set MAP_ENTRY_WIRE_SKIPPED when aborting the loop.

Reported by:	John Marshall <john.marshall riverwillow com au>
Approved by:	re (kensmith)
2009-07-12 12:37:38 +00:00
Lawrence Stewart
962ebef8c0 Pad the following TCP related structs to allow MFCs of upcoming features/fixes
back to the 8 branch:

tcp_var.h
- struct sackhint
- struct tcpcb
- struct tcpstat

The patch breaks the ABI. Bump __FreeBSD_version to 800102 accordingly. User
space tools that rely on the size of any of these structs (e.g. sockstat) need
to be recompiled.

Reviewed by:	rpaulo, sam, andre, rwatson
Approved by:	re & mentor (gnn)
2009-07-12 09:14:28 +00:00
Marcel Moolenaar
31d22003dd Rename option USBVERBOSE to USB_VERBOSE for 2 reasons:
1.  USB_VERBOSE is more consistent with USB_DEBUG,
2.  sys/dev/usb/usb_device.c uses option USB_VERBOSE and
    not USBVERBOSE.

POLA with the USBVERBOSE option as it's found in 7-STABLE
has been considered but found insignificant in the face
of the USB stack overhaul.

Approved by:	re (kensmith)
2009-07-12 04:48:47 +00:00
Nathan Whitehorn
f136543e41 Increase the size of the page table on 64-bit PowerPC machines as a
bandaid to prevent exhaustion of the primary and secondary hash groups
in the event of extreme stress on the PMAP layer (e.g. a forkbomb). This
wastes memory, and should be revised to properly handle PTEG spills instead.

Suggested by:	grehan
Approved by:	re (kensmith)
2009-07-12 04:07:52 +00:00
Marcel Moolenaar
b54b764c72 Revert rev 192323 (nfs_common.c only):
The D-cache flushing added here was to deal with I-cache
incoherency observed on ia64. However, the problem was
in the implementation of pmap_enter_object() for ia64:
it was missing I-cache coherency logic for prefaulted
pages. After this got added in rev 195625, testing showed
that no D-cache flushing was required.

The SIGILL that was observed on Book-E (see commit log
for rev 192323) ended up not being related to I-cache
incoherency, but was found to be caused by bad memory.
This discovery further undermined the need for D-cache
flushing in the NFS I/O code, triggering the reversal.

Approved by:	re (kensmith)
2009-07-12 03:53:52 +00:00
Marcel Moolenaar
dd8a461e83 In nvpair_native_embedded_array(), meaningless pointers are zeroed.
The programmer was aware that alignment was not guaranteed in the
packed structure and used bzero() to NULL out the pointers.
However, on ia64, the compiler is quite agressive in finding ILP
and calls to bzero() are often replaced by simple assignments (i.e.
stores). Especially when the width or size in question corresponds
with a store instruction (i.e. st1, st2, st4 or st8).

The problem here is not a compiler bug. The address of the memory
to zero-out was given by '&packed->nvl_priv' and given the type of
the 'packed' pointer the compiler could assume proper alignment for
the replacement of bzero() with an 8-byte wide store to be valid.
The problem is with the programmer. The programmer knew that the
address did not have the alignment guarantees needed for a regular
assignment, but failed to inform the compiler of that fact. In
fact, the programmer told the compiler the opposite: alignment is
guaranteed.

The fix is to avoid using a pointer of type "nvlist_t *" and
instead use a "char *" pointer as the basis for calculating the
address. This tells the compiler that only 1-byte alignment can
be assumed and the compiler will either keep the bzero() call
or instead replace it with a sequence of byte-wise stores. Both
are valid.

Approved by:	re (kib)
2009-07-11 22:43:20 +00:00
Colin Percival
7d845dde8d Remove build timestamps from the following files:
/boot/kernel/hptrr.ko
/etc/mail/*.cf
/lib/libcrypto.so.5
/usr/bin/ntpq
/usr/sbin/amd
/usr/sbin/iasl
/usr/sbin/ntpd
/usr/sbin/ntpdate
/usr/sbin/ntpdc

There does not appear to be any purpose to having these timestamps, and
they have the irritating consequence that the aforementioned files will
be different every time they are rebuilt.

After this commit, the only remaining build timestamps are in the kernel,
the boot loaders, /usr/include/osreldate.h (the year in the copyright
notice), and lib*.a (the timestamps on all of the included .o files).

Reviewed by:	scottl (hptrr), gshapiro (sendmail), simon (openssl),
		roberto (ntp), jkim (acpica)
Approved by:	re (kib)
2009-07-11 22:30:37 +00:00