Commit Graph

183602 Commits

Author SHA1 Message Date
Jung-uk Kim
7311dad7ee 'u_long' is consistently spelled 'unsigned long' in this file. Fix it. 2013-08-29 23:09:34 +00:00
Jung-uk Kim
346c9ecee8 Partially revert r254880. The bitmap operations actually use long type now. 2013-08-29 22:46:21 +00:00
Kenneth D. Merry
ee5bd4fc5a Bump up the default timeouts for move commands in the ch(4) driver
to 15 minutes, and 5 minutes for things like READ ELEMENT STATUS.

This is needed to account for the worst case scenarios on at least
some Spectra Logic tape libraries.

Sponsored by:	Spectra Logic
MFC after:	3 days
2013-08-29 21:25:27 +00:00
Jung-uk Kim
33e00c6789 Fix the incomplete conversion from atomic_t to long for test_bit(). 2013-08-29 20:51:12 +00:00
Jung-uk Kim
9dcd4b1293 Clarify confusions between atomic_t and bitmap. Fix bitmap operations
accordingly.
2013-08-29 20:40:45 +00:00
Justin T. Gibbs
76acc41fb7 Implement vector callback for PVHVM and unify event channel implementations
Re-structure Xen HVM support so that:
	- Xen is detected and hypercalls can be performed very
	  early in system startup.
	- Xen interrupt services are implemented using FreeBSD's native
	  interrupt delivery infrastructure.
	- the Xen interrupt service implementation is shared between PV
	  and HVM guests.
	- Xen interrupt handlers can optionally use a filter handler
	  in order to avoid the overhead of dispatch to an interrupt
	  thread.
	- interrupt load can be distributed among all available CPUs.
	- the overhead of accessing the emulated local and I/O apics
	  on HVM is removed for event channel port events.
	- a similar optimization can eventually, and fairly easily,
	  be used to optimize MSI.

Early Xen detection, HVM refactoring, PVHVM interrupt infrastructure,
and misc Xen cleanups:

Sponsored by: Spectra Logic Corporation

Unification of PV & HVM interrupt infrastructure, bug fixes,
and misc Xen cleanups:

Submitted by: Roger Pau Monné
Sponsored by: Citrix Systems R&D

sys/x86/x86/local_apic.c:
sys/amd64/include/apicvar.h:
sys/i386/include/apicvar.h:
sys/amd64/amd64/apic_vector.S:
sys/i386/i386/apic_vector.s:
sys/amd64/amd64/machdep.c:
sys/i386/i386/machdep.c:
sys/i386/xen/exception.s:
sys/x86/include/segments.h:
	Reserve IDT vector 0x93 for the Xen event channel upcall
	interrupt handler.  On Hypervisors that support the direct
	vector callback feature, we can request that this vector be
	called directly by an injected HVM interrupt event, instead
	of a simulated PCI interrupt on the Xen platform PCI device.
	This avoids all of the overhead of dealing with the emulated
	I/O APIC and local APIC.  It also means that the Hypervisor
	can inject these events on any CPU, allowing upcalls for
	different ports to be handled in parallel.

sys/amd64/amd64/mp_machdep.c:
sys/i386/i386/mp_machdep.c:
	Map Xen per-vcpu area during AP startup.

sys/amd64/include/intr_machdep.h:
sys/i386/include/intr_machdep.h:
	Increase the FreeBSD IRQ vector table to include space
	for event channel interrupt sources.

sys/amd64/include/pcpu.h:
sys/i386/include/pcpu.h:
	Remove Xen HVM per-cpu variable data.  These fields are now
	allocated via the dynamic per-cpu scheme.  See xen_intr.c
	for details.

sys/amd64/include/xen/hypercall.h:
sys/dev/xen/blkback/blkback.c:
sys/i386/include/xen/xenvar.h:
sys/i386/xen/clock.c:
sys/i386/xen/xen_machdep.c:
sys/xen/gnttab.c:
	Prefer FreeBSD primatives to Linux ones in Xen support code.

sys/amd64/include/xen/xen-os.h:
sys/i386/include/xen/xen-os.h:
sys/xen/xen-os.h:
sys/dev/xen/balloon/balloon.c:
sys/dev/xen/blkback/blkback.c:
sys/dev/xen/blkfront/blkfront.c:
sys/dev/xen/console/xencons_ring.c:
sys/dev/xen/control/control.c:
sys/dev/xen/netback/netback.c:
sys/dev/xen/netfront/netfront.c:
sys/dev/xen/xenpci/xenpci.c:
sys/i386/i386/machdep.c:
sys/i386/include/pmap.h:
sys/i386/include/xen/xenfunc.h:
sys/i386/isa/npx.c:
sys/i386/xen/clock.c:
sys/i386/xen/mp_machdep.c:
sys/i386/xen/mptable.c:
sys/i386/xen/xen_clock_util.c:
sys/i386/xen/xen_machdep.c:
sys/i386/xen/xen_rtc.c:
sys/xen/evtchn/evtchn_dev.c:
sys/xen/features.c:
sys/xen/gnttab.c:
sys/xen/gnttab.h:
sys/xen/hvm.h:
sys/xen/xenbus/xenbus.c:
sys/xen/xenbus/xenbus_if.m:
sys/xen/xenbus/xenbusb_front.c:
sys/xen/xenbus/xenbusvar.h:
sys/xen/xenstore/xenstore.c:
sys/xen/xenstore/xenstore_dev.c:
sys/xen/xenstore/xenstorevar.h:
	Pull common Xen OS support functions/settings into xen/xen-os.h.

sys/amd64/include/xen/xen-os.h:
sys/i386/include/xen/xen-os.h:
sys/xen/xen-os.h:
	Remove constants, macros, and functions unused in FreeBSD's Xen
	support.

sys/xen/xen-os.h:
sys/i386/xen/xen_machdep.c:
sys/x86/xen/hvm.c:
	Introduce new functions xen_domain(), xen_pv_domain(), and
	xen_hvm_domain().  These are used in favor of #ifdefs so that
	FreeBSD can dynamically detect and adapt to the presence of
	a hypervisor.  The goal is to have an HVM optimized GENERIC,
	but more is necessary before this is possible.

sys/amd64/amd64/machdep.c:
sys/dev/xen/xenpci/xenpcivar.h:
sys/dev/xen/xenpci/xenpci.c:
sys/x86/xen/hvm.c:
sys/sys/kernel.h:
	Refactor magic ioport, Hypercall table and Hypervisor shared
	information page setup, and move it to a dedicated HVM support
	module.

	HVM mode initialization is now triggered during the
	SI_SUB_HYPERVISOR phase of system startup.  This currently
	occurs just after the kernel VM is fully setup which is
	just enough infrastructure to allow the hypercall table
	and shared info page to be properly mapped.

sys/xen/hvm.h:
sys/x86/xen/hvm.c:
	Add definitions and a method for configuring Hypervisor event
	delievery via a direct vector callback.

sys/amd64/include/xen/xen-os.h:
sys/x86/xen/hvm.c:

sys/conf/files:
sys/conf/files.amd64:
sys/conf/files.i386:
	Adjust kernel build to reflect the refactoring of early
	Xen startup code and Xen interrupt services.

sys/dev/xen/blkback/blkback.c:
sys/dev/xen/blkfront/blkfront.c:
sys/dev/xen/blkfront/block.h:
sys/dev/xen/control/control.c:
sys/dev/xen/evtchn/evtchn_dev.c:
sys/dev/xen/netback/netback.c:
sys/dev/xen/netfront/netfront.c:
sys/xen/xenstore/xenstore.c:
sys/xen/evtchn/evtchn_dev.c:
sys/dev/xen/console/console.c:
sys/dev/xen/console/xencons_ring.c
	Adjust drivers to use new xen_intr_*() API.

sys/dev/xen/blkback/blkback.c:
	Since blkback defers all event handling to a taskqueue,
	convert this task queue to a "fast" taskqueue, and schedule
	it via an interrupt filter.  This avoids an unnecessary
	ithread context switch.

sys/xen/xenstore/xenstore.c:
	The xenstore driver is MPSAFE.  Indicate as much when
	registering its interrupt handler.

sys/xen/xenbus/xenbus.c:
sys/xen/xenbus/xenbusvar.h:
	Remove unused event channel APIs.

sys/xen/evtchn.h:
	Remove all kernel Xen interrupt service API definitions
	from this file.  It is now only used for structure and
	ioctl definitions related to the event channel userland
	device driver.

	Update the definitions in this file to match those from
	NetBSD.  Implementing this interface will be necessary for
	Dom0 support.

sys/xen/evtchn/evtchnvar.h:
	Add a header file for implemenation internal APIs related
	to managing event channels event delivery.  This is used
	to allow, for example, the event channel userland device
	driver to access low-level routines that typical kernel
	consumers of event channel services should never access.

sys/xen/interface/event_channel.h:
sys/xen/xen_intr.h:
	Standardize on the evtchn_port_t type for referring to
	an event channel port id.  In order to prevent low-level
	event channel APIs from leaking to kernel consumers who
	should not have access to this data, the type is defined
	twice: Once in the Xen provided event_channel.h, and again
	in xen/xen_intr.h.  The double declaration is protected by
	__XEN_EVTCHN_PORT_DEFINED__ to ensure it is never declared
	twice within a given compilation unit.

sys/xen/xen_intr.h:
sys/xen/evtchn/evtchn.c:
sys/x86/xen/xen_intr.c:
sys/dev/xen/xenpci/evtchn.c:
sys/dev/xen/xenpci/xenpcivar.h:
	New implementation of Xen interrupt services.  This is
	similar in many respects to the i386 PV implementation with
	the exception that events for bound to event channel ports
	(i.e. not IPI, virtual IRQ, or physical IRQ) are further
	optimized to avoid mask/unmask operations that aren't
	necessary for these edge triggered events.

	Stubs exist for supporting physical IRQ binding, but will
	need additional work before this implementation can be
	fully shared between PV and HVM.

sys/amd64/amd64/mp_machdep.c:
sys/i386/i386/mp_machdep.c:
sys/i386/xen/mp_machdep.c
sys/x86/xen/hvm.c:
	Add support for placing vcpu_info into an arbritary memory
	page instead of using HYPERVISOR_shared_info->vcpu_info.
	This allows the creation of domains with more than 32 vcpus.

sys/i386/i386/machdep.c:
sys/i386/xen/clock.c:
sys/i386/xen/xen_machdep.c:
sys/i386/xen/exception.s:
	Add support for new event channle implementation.
2013-08-29 19:52:18 +00:00
Jung-uk Kim
ea4447500d - Remove test_and_set_bit() macro. It is unused since r255037.
- Relax atomic_read() and atomic_set() macros.  Linux does not require any
memory barrier.  Also, these macros may be even reordered or optimized away
according to the API documentation:

https://www.kernel.org/doc/Documentation/atomic_ops.txt
2013-08-29 19:47:52 +00:00
Adrian Chadd
310915a45a Convert the if_lagg rwlock to an rmlock.
We've been seeing lots of cache line contention (but not lock contention!)
in our workloads between the various TX and RX threads going on.

The write lock is only grabbed when configuration changes are made - which
are infrequent.

With this patch, the contention and cycles spent waiting for updates
disappear.

Sponsored by:	Netflix, Inc.
2013-08-29 19:35:14 +00:00
Jung-uk Kim
0289d87d9c Fix atomic operations on context_flag without altering semantics. 2013-08-29 18:36:47 +00:00
Xin LI
4b0f6e5b60 Add directories that is installed as part of bsdconfig.
These are included unconditionally for now because bsdconfig
is currently installed unconditionally.

This fixes 'make -j 17 installworld' caused by a race
condition.

MFC candidate.
2013-08-29 17:45:13 +00:00
Xin LI
0cfb18af64 Add a few missing language directories for /usr. 2013-08-29 17:40:03 +00:00
Ed Maste
f357c00bf7 Update to 2013-08-29 NetBSD libexecinfo snapshot
This adds my patch to use the kern.proc.pathname sysctl instead of
relying on procfs(5).
2013-08-29 16:57:55 +00:00
Kenneth D. Merry
880e57b635 Fix some issues in change 254760 pointed out by Bruce Evans:
- Remove excessive parenthesis
 - Use KNF continuation indentation
 - Cut down on excessive continuation lines
 - More consistent style in messages
 - Use uprintf() instead of printf()

Submitted by:	bde
2013-08-29 16:41:40 +00:00
Marcel Moolenaar
4fc4997535 Work-around a timing problem with the ITE IT8513E now that the core
calls ns8250_bus_ipend() almost immediately after ns8250_bus_attach().
As it appears, a line break condition is being signalled for almost
all received characters due to this. A delay of 150ms seems enough
to allow the H/W to settle and to avoid the problem.
More analysis is needed, but for now a regression has been addressed.

Reported by: kevlo@
Tested by: kevlo@
2013-08-29 16:26:04 +00:00
John Baldwin
e289e9f2ca Don't return an error for socket timeouts that are too large. Just
cap them to INT_MAX ticks instead.

PR:		kern/181416 (r254699 really)
Requested by:	bde
MFC after:	2 weeks
2013-08-29 15:59:05 +00:00
Antoine Brodin
b9aa88b0c1 Fix after r255014 2013-08-29 15:58:20 +00:00
Alan Cox
51321f7c31 Significantly reduce the cost, i.e., run time, of calls to madvise(...,
MADV_DONTNEED) and madvise(..., MADV_FREE).  Specifically, introduce a new
pmap function, pmap_advise(), that operates on a range of virtual addresses
within the specified pmap, allowing for a more efficient implementation of
MADV_DONTNEED and MADV_FREE.  Previously, the implementation of
MADV_DONTNEED and MADV_FREE relied on per-page pmap operations, such as
pmap_clear_reference().  Intuitively, the problem with this implementation
is that the pmap-level locks are acquired and released and the page table
traversed repeatedly, once for each resident page in the range
that was specified to madvise(2).  A more subtle flaw with the previous
implementation is that pmap_clear_reference() would clear the reference bit
on all mappings to the specified page, not just the mapping in the range
specified to madvise(2).

Since our malloc(3) makes heavy use of madvise(2), this change can have a
measureable impact.  For example, the system time for completing a parallel
"buildworld" on a 6-core amd64 machine was reduced by about 1.5% to 2.0%.

Note: This change only contains pmap_advise() implementations for a subset
of our supported architectures.  I will commit implementations for the
remaining architectures after further testing.  For now, a stub function is
sufficient because of the advisory nature of pmap_advise().

Discussed with: jeff, jhb, kib
Tested by:      pho (i386), marcel (ia64)
Sponsored by:   EMC / Isilon Storage Division
2013-08-29 15:49:05 +00:00
Ed Maste
06d02944c7 Vendor import of NetBSD's libexecinfo at 2013-08-29 2013-08-29 15:20:12 +00:00
Adrian Chadd
e733e239ee Migrate iwn(4) to use the new ieee80211_tx_complete() API.
Tested:

* Intel 5100, STA mode
2013-08-29 13:56:44 +00:00
Adrian Chadd
808d6d430f Remove the duplicate LLC_MISS event and put it in the right order. 2013-08-29 13:52:51 +00:00
Luiz Otavio O Souza
973bf10594 Prevent the full restart cycle every time arge_start() is called. Only
(re)start the interface when it is down.  This change fix a race with
BOOTP where the response packet is lost because the interface is being
reset by a netmask change right after send the packet.

PR:		178318
Approved by:	adrian (mentor)
2013-08-29 12:48:12 +00:00
Andreas Tobler
e9fb2ea069 Remove GNU_PATCH leftover. 2013-08-29 11:40:45 +00:00
Navdeep Parhar
91d1b03d36 Merge r254736 from user/np/cxl_tuning.
Don't leak tags when M_NOFREE | M_PKTHDR mbufs are freed.

Reviewed by:	andre
2013-08-29 08:07:35 +00:00
Navdeep Parhar
480e603c79 Merge r254386 from user/np/cxl_tuning. Add an INET|INET6 check missing
in said revision.

r254386:
Flush inactive LRO entries periodically.
2013-08-29 06:26:22 +00:00
Pedro F. Giffuni
4b97b38825 Drop build option switch for the older GNU patch.
As promised, drop the option to make the older GNU patch
the default.

GNU patch is still being built but something drastic may
happen to it to it before Release.
2013-08-29 00:38:24 +00:00
Jung-uk Kim
8e0caa60f2 Correct atomic operations in i915. 2013-08-28 23:59:38 +00:00
Jung-uk Kim
6ac96a7e7e Fix a compiler warning and add couple of VM map types. 2013-08-28 23:43:28 +00:00
Navdeep Parhar
c93d567412 Whitespace nit. 2013-08-28 23:15:05 +00:00
Navdeep Parhar
7127e6acf0 Merge r254336 from user/np/cxl_tuning.
Add a last-modified timestamp to each LRO entry and provide an interface
to flush all inactive entries.  Drivers decide when to flush and what
the inactivity threshold should be.

Network drivers that process an rx queue to completion can enter a
livelock type situation when the rate at which packets are received
reaches equilibrium with the rate at which the rx thread is processing
them.  When this happens the final LRO flush (normally when the rx
routine is done) does not occur.  Pure ACKs and segments with total
payload < 64K can get stuck in an LRO entry.  Symptoms are that TCP
tx-mostly connections' performance falls off a cliff during heavy,
unrelated rx on the interface.

Flushing only inactive LRO entries works better than any of these
alternates that I tried:
- don't LRO pure ACKs
- flush _all_ LRO entries periodically (every 'x' microseconds or every
  'y' descriptors)
- stop rx processing in the driver periodically and schedule remaining
  work for later.

Reviewed by:	andre
2013-08-28 23:00:34 +00:00
Jung-uk Kim
2d0196c405 Fix a compiler warning. With this fix, a negative time can be converted to
a struct timeval and back to the original nanoseconds correctly.
2013-08-28 22:57:49 +00:00
Kenneth D. Merry
3b5f179d2a Support storing 7 additional file flags in tmpfs:
UF_SYSTEM, UF_SPARSE, UF_OFFLINE, UF_REPARSE, UF_ARCHIVE, UF_READONLY,
and UF_HIDDEN.

Sort the file flags tmpfs supports alphabetically.  tmpfs now
supports the same flags as UFS, with the exception of SF_SNAPSHOT.

Reported by:	bdrewery, antoine
Sponsored by:	Spectra Logic
2013-08-28 22:12:56 +00:00
Jilles Tjoelker
d1d4d95209 libutil: Use O_CLOEXEC for internal file descriptors from open(). 2013-08-28 21:10:37 +00:00
Navdeep Parhar
319a31ea18 Change t4_list_lock and t4_uld_list_lock from mutexes to sx'es.
- tom_uninit had to be reworked not to hold the adapter lock (a mutex)
  around t4_deactivate_uld, which acquires the uld_list_lock.
- the ifc_match for the interface cloner that creates the tracer ifnet
  had to be reworked as the kernel calls ifc_match with the global
  if_cloners_mtx held.
2013-08-28 20:59:22 +00:00
Navdeep Parhar
9800517691 Add hooks in base cxgbe(4) for the iWARP upper-layer driver. Update a
couple of assertions in the TOE driver as well.
2013-08-28 20:45:45 +00:00
Jung-uk Kim
10f1472e41 Reduce diff against stable/9 slightly. 2013-08-28 20:10:56 +00:00
David C Somayajulu
67ffef6b9e ql_minidump() should be performed only by port 0.
Submitted by: David C Somayajulu
2013-08-28 20:07:00 +00:00
Robert Watson
7b223d2286 Xref capsicum(4) and procdesc(4) from pdfork(2).
Suggested by:	sbruno
MFC after:	3 days
2013-08-28 20:00:25 +00:00
Robert Watson
5ea1c4a2df Add a simple procdesc(4) man page describing "options PROCDESC" and the
high-level facility, supplementing pdfork(2) and friends.  Update capsicum.4
to xref.

Suggested by:	sbruno
MFC after:	3 days
2013-08-28 19:49:32 +00:00
Jung-uk Kim
663593f19f Do not save/restore video memory if we are not using linear frame buffer.
Note this partially revert r233896.
2013-08-28 19:06:22 +00:00
Jung-uk Kim
b1bba126f0 Make sure to free stale buffer before allocating new one for safety. 2013-08-28 18:13:37 +00:00
Jung-uk Kim
7afb475f63 Avoid unnecessary signedness conversion. 2013-08-28 17:58:30 +00:00
Kirk McKusick
2ce451f089 In looking at block layouts as part of fixing filesystem block
allocations under low free-space conditions (-r254995), determine
that old block-preference search order used before -r249782 worked
a bit better. This change reverts to that block-preference search order.

MFC after:	2 weeks
2013-08-28 17:46:32 +00:00
Kirk McKusick
28702816d8 A performance problem was reported in PR kern/181226:
I have 25TB Dell PERC 6 RAID5 array. When it becomes almost
    full (10-20GB free), processes which write data to it start
    eating 100% CPU and write speed drops below 1MB/sec (normally
    to gives 400MB/sec). The revision at which it first became
    apparent was http://svnweb.freebsd.org/changeset/base/249782.

The offending change reserved an area in each cylinder group to
store metadata. The new algorithm attempts to save this area for
metadata and allows its use for non-metadata only after all the
data areas have been exhausted. The size of the reserved area
defaults to half of minfree, so the filesystem reports full before
the data area can completely fill. However, in this report, the
filesystem has had minfree reduced to 1% thus forcing the metadata
area to be used for data. As the filesystem approached full, it
had only metadata areas left to allocate. The result was that
every block allocation had to scan summary data for 30,000 cylinder
groups before falling back to searching up to 30,000 metadata areas.

The fix is to give up on saving the metadata areas once the free
space reserve drops below 2%. The effect of this change is to use
the old algorithm of just accepting the first available block that
we find. Since most filesystems use the default 5% minfree, this
will have no effect on their operation. For those that want to push
to the limit, they will get their crappy block placements quickly.

Submitted by:  Dmitry Sivachenko
Fix Tested by: Dmitry Sivachenko
PR:            kern/181226
MFC after:     2 weeks
2013-08-28 17:38:05 +00:00
Steve Kargl
e826e6be2b * Whitespace. 2013-08-28 16:59:55 +00:00
George V. Neville-Neil
06e8e46410 Add firmware for Centrino 2200-N wireless devices.
Driver software for this firmware will be updated in a following commit.
2013-08-28 15:12:51 +00:00
Gavin Atkinson
9e9a57a307 After writing a kernel core dump into /var/crash, call sync(8).
If we panic again shortly after boot (say, within 30 seconds), any core
dump we wrote out may be lost on reboot.  In this situation, we really
want to keep that core file, as it may be the only way to have the issue
resolved.  Call sync(8) after writing out the core file and running
crashinfo(8), in the hope that these will not be lost if we panic
again.  sync(8) is only called in the case where there is a core dump
to be written out, so won't be called during normal boots.

Discovered by:	Trying to debug an IPSEC panic
MFC after:	1 week
2013-08-28 15:12:15 +00:00
Luiz Otavio O Souza
c6905524b4 Fix a few typos for s25fl types.
Approved by:	adrian (mentor)
2013-08-28 14:49:36 +00:00
Luiz Otavio O Souza
b3cb2c4a93 Make ar71xx_spi attach the next free unit of spibus and not only spibus0.
Approved by:	adrian (mentor)
2013-08-28 14:46:15 +00:00
Luiz Otavio O Souza
2557c4beb1 Add the default hints to make the GPIO pins, rf led and reset switch work
out of the box on RouterStation.

PR:	177832
Submitted by:	Petko Bordjukov (bordjukov@gmail.com)
Approved by:	adrian (mentor)
2013-08-28 14:43:04 +00:00
Luiz Otavio O Souza
72405c6be0 Properly free gpiobus ivars when gpiobus_parse_pins() fails and also on
gpiobus detachment.

Suggested by:	imp
Approved by:	adrian (mentor)
2013-08-28 14:39:24 +00:00