Commit Graph

1809 Commits

Author SHA1 Message Date
David Christensen
4e4007688c Substantial rewrite of bxe(4) to add support for the BCM57712 and
BCM578XX controllers.

Approved by:	re
MFC after:	4 weeks
2013-09-20 20:18:49 +00:00
Edward Tomasz Napierala
009ea47eb2 Bring in the new iSCSI target and initiator.
Reviewed by:	ken (parts)
Approved by:	re (delphij)
Sponsored by:	FreeBSD Foundation
2013-09-14 15:29:06 +00:00
Mark Murray
9d32fc31c7 MFC 2013-09-07 07:58:29 +00:00
Cy Schubert
bfc88dcbf7 Update ipfilter 4.1.28 --> 5.1.2.
Approved by:		glebius (mentor)
BSD Licensed by:	Darren Reed <darrenr@reed.wattle.id.au> (author)
2013-09-06 23:11:19 +00:00
Mark Murray
0fbf163e60 MFC 2013-09-06 17:42:12 +00:00
Pawel Jakub Dawidek
7008be5bd7 Change the cap_rights_t type from uint64_t to a structure that we can extend
in the future in a backward compatible (API and ABI) way.

The cap_rights_t represents capability rights. We used to use one bit to
represent one right, but we are running out of spare bits. Currently the new
structure provides place for 114 rights (so 50 more than the previous
cap_rights_t), but it is possible to grow the structure to hold at least 285
rights, although we can make it even larger if 285 rights won't be enough.

The structure definition looks like this:

	struct cap_rights {
		uint64_t	cr_rights[CAP_RIGHTS_VERSION + 2];
	};

The initial CAP_RIGHTS_VERSION is 0.

The top two bits in the first element of the cr_rights[] array contain total
number of elements in the array - 2. This means if those two bits are equal to
0, we have 2 array elements.

The top two bits in all remaining array elements should be 0.
The next five bits in all array elements contain array index. Only one bit is
used and bit position in this five-bits range defines array index. This means
there can be at most five array elements in the future.

To define new right the CAPRIGHT() macro must be used. The macro takes two
arguments - an array index and a bit to set, eg.

	#define	CAP_PDKILL	CAPRIGHT(1, 0x0000000000000800ULL)

We still support aliases that combine few rights, but the rights have to belong
to the same array element, eg:

	#define	CAP_LOOKUP	CAPRIGHT(0, 0x0000000000000400ULL)
	#define	CAP_FCHMOD	CAPRIGHT(0, 0x0000000000002000ULL)

	#define	CAP_FCHMODAT	(CAP_FCHMOD | CAP_LOOKUP)

There is new API to manage the new cap_rights_t structure:

	cap_rights_t *cap_rights_init(cap_rights_t *rights, ...);
	void cap_rights_set(cap_rights_t *rights, ...);
	void cap_rights_clear(cap_rights_t *rights, ...);
	bool cap_rights_is_set(const cap_rights_t *rights, ...);

	bool cap_rights_is_valid(const cap_rights_t *rights);
	void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src);
	void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src);
	bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);

Capability rights to the cap_rights_init(), cap_rights_set(),
cap_rights_clear() and cap_rights_is_set() functions are provided by
separating them with commas, eg:

	cap_rights_t rights;

	cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);

There is no need to terminate the list of rights, as those functions are
actually macros that take care of the termination, eg:

	#define	cap_rights_set(rights, ...)				\
		__cap_rights_set((rights), __VA_ARGS__, 0ULL)
	void __cap_rights_set(cap_rights_t *rights, ...);

Thanks to using one bit as an array index we can assert in those functions that
there are no two rights belonging to different array elements provided
together. For example this is illegal and will be detected, because CAP_LOOKUP
belongs to element 0 and CAP_PDKILL to element 1:

	cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);

Providing several rights that belongs to the same array's element this way is
correct, but is not advised. It should only be used for aliases definition.

This commit also breaks compatibility with some existing Capsicum system calls,
but I see no other way to do that. This should be fine as Capsicum is still
experimental and this change is not going to 9.x.

Sponsored by:	The FreeBSD Foundation
2013-09-05 00:09:56 +00:00
Mark Murray
77de2c3f58 Separate out the Software RNG entropy harvesting queue and thread into its own files.
Submitted by:	 Arthur Mesh <arthurmesh@gmail.com>
2013-08-30 11:42:57 +00:00
Mark Murray
f27c28dc6e MFC 2013-08-30 11:38:34 +00:00
Justin T. Gibbs
9f40021f28 Introduce a new, HVM compatible, paravirtualized timer driver for Xen.
Use this new driver for both PV and HVM instances.

This driver requires a Xen hypervisor that supports vector callbacks,
VCPUOP hypercalls, and reports that it has a "safe PV clock".

New timer driver:
Submitted by: will
Sponsored by: Spectra Logic Corporation

PV port to new driver, and bug fixes:
Submitted by: Roger Pau Monné
Sponsored by: Citrix Systems R&D

sys/dev/xen/timer/timer.c:
	- Register a PV timer device driver which (currently)
	  implements device_{identify,probe,attach} and stubs
	  device_detach.  The detach routine requires functionality
	  not provided by timecounters(4).  The suspend and resume
	  routines need additional work (due to Xen requiring that
	  the hypercalls be executed on the target VCPU), and aren't
	  needed for our purposes.

	- Make sure there can only be one device instance of this
	  driver, and that it only registers one eventtimers(4) and
	  one timecounters(4) device interface.  Make both interfaces
	  use PCPU data as needed.

	- Match, with a few style cleanups & API differences, the
	  Xen versions of the "fetch time" functions.

	- Document the magic scale_delta() better for the i386 version.

	- When registering the event timer, bind a separate event
	  channel for the timer VIRQ to the device's event timer
	  interrupt handler for each active VCPU.  Describe each
	  interrupt as "xen_et:c%d", so they can be identified per
	  CPU in "vmstat -i" or "show intrcnt" in KDB.

	- When scheduling a timer into the hypervisor, try up to
	  60 times if the hypervisor rejects the time as being in
	  the past.  In the common case, this retry shouldn't happen,
	  and if it does, it should only happen once.  This is
	  because the event timer advertises a minimum period of
	  100usec, which is only less than the usual hypercall round
	  trip time about 1 out of every 100 tries.  (Unlike other
	  similar drivers, this one actually checks whether the
	  hypervisor accepted the singleshot timer set hypercall.)

	- Implement a RTC PV clock based on the hypervisor wallclock.

sys/conf/files:
	- Add dev/xen/timer/timer.c if the kernel configuration
	  includes either the XEN or XENHVM options.

sys/conf/files.i386:
sys/i386/include/xen/xen_clock_util.h:
sys/i386/xen/clock.c:
sys/i386/xen/xen_clock_util.c:
sys/i386/xen/mp_machdep.c:
sys/i386/xen/xen_rtc.c:
	- Remove previous PV timer used in i386 XEN PV kernels, the
	  new timer introduced in this change is used instead (so
	  we share the same code between PVHVM and PV).

MFC after: 2 weeks
2013-08-29 23:11:58 +00:00
Justin T. Gibbs
76acc41fb7 Implement vector callback for PVHVM and unify event channel implementations
Re-structure Xen HVM support so that:
	- Xen is detected and hypercalls can be performed very
	  early in system startup.
	- Xen interrupt services are implemented using FreeBSD's native
	  interrupt delivery infrastructure.
	- the Xen interrupt service implementation is shared between PV
	  and HVM guests.
	- Xen interrupt handlers can optionally use a filter handler
	  in order to avoid the overhead of dispatch to an interrupt
	  thread.
	- interrupt load can be distributed among all available CPUs.
	- the overhead of accessing the emulated local and I/O apics
	  on HVM is removed for event channel port events.
	- a similar optimization can eventually, and fairly easily,
	  be used to optimize MSI.

Early Xen detection, HVM refactoring, PVHVM interrupt infrastructure,
and misc Xen cleanups:

Sponsored by: Spectra Logic Corporation

Unification of PV & HVM interrupt infrastructure, bug fixes,
and misc Xen cleanups:

Submitted by: Roger Pau Monné
Sponsored by: Citrix Systems R&D

sys/x86/x86/local_apic.c:
sys/amd64/include/apicvar.h:
sys/i386/include/apicvar.h:
sys/amd64/amd64/apic_vector.S:
sys/i386/i386/apic_vector.s:
sys/amd64/amd64/machdep.c:
sys/i386/i386/machdep.c:
sys/i386/xen/exception.s:
sys/x86/include/segments.h:
	Reserve IDT vector 0x93 for the Xen event channel upcall
	interrupt handler.  On Hypervisors that support the direct
	vector callback feature, we can request that this vector be
	called directly by an injected HVM interrupt event, instead
	of a simulated PCI interrupt on the Xen platform PCI device.
	This avoids all of the overhead of dealing with the emulated
	I/O APIC and local APIC.  It also means that the Hypervisor
	can inject these events on any CPU, allowing upcalls for
	different ports to be handled in parallel.

sys/amd64/amd64/mp_machdep.c:
sys/i386/i386/mp_machdep.c:
	Map Xen per-vcpu area during AP startup.

sys/amd64/include/intr_machdep.h:
sys/i386/include/intr_machdep.h:
	Increase the FreeBSD IRQ vector table to include space
	for event channel interrupt sources.

sys/amd64/include/pcpu.h:
sys/i386/include/pcpu.h:
	Remove Xen HVM per-cpu variable data.  These fields are now
	allocated via the dynamic per-cpu scheme.  See xen_intr.c
	for details.

sys/amd64/include/xen/hypercall.h:
sys/dev/xen/blkback/blkback.c:
sys/i386/include/xen/xenvar.h:
sys/i386/xen/clock.c:
sys/i386/xen/xen_machdep.c:
sys/xen/gnttab.c:
	Prefer FreeBSD primatives to Linux ones in Xen support code.

sys/amd64/include/xen/xen-os.h:
sys/i386/include/xen/xen-os.h:
sys/xen/xen-os.h:
sys/dev/xen/balloon/balloon.c:
sys/dev/xen/blkback/blkback.c:
sys/dev/xen/blkfront/blkfront.c:
sys/dev/xen/console/xencons_ring.c:
sys/dev/xen/control/control.c:
sys/dev/xen/netback/netback.c:
sys/dev/xen/netfront/netfront.c:
sys/dev/xen/xenpci/xenpci.c:
sys/i386/i386/machdep.c:
sys/i386/include/pmap.h:
sys/i386/include/xen/xenfunc.h:
sys/i386/isa/npx.c:
sys/i386/xen/clock.c:
sys/i386/xen/mp_machdep.c:
sys/i386/xen/mptable.c:
sys/i386/xen/xen_clock_util.c:
sys/i386/xen/xen_machdep.c:
sys/i386/xen/xen_rtc.c:
sys/xen/evtchn/evtchn_dev.c:
sys/xen/features.c:
sys/xen/gnttab.c:
sys/xen/gnttab.h:
sys/xen/hvm.h:
sys/xen/xenbus/xenbus.c:
sys/xen/xenbus/xenbus_if.m:
sys/xen/xenbus/xenbusb_front.c:
sys/xen/xenbus/xenbusvar.h:
sys/xen/xenstore/xenstore.c:
sys/xen/xenstore/xenstore_dev.c:
sys/xen/xenstore/xenstorevar.h:
	Pull common Xen OS support functions/settings into xen/xen-os.h.

sys/amd64/include/xen/xen-os.h:
sys/i386/include/xen/xen-os.h:
sys/xen/xen-os.h:
	Remove constants, macros, and functions unused in FreeBSD's Xen
	support.

sys/xen/xen-os.h:
sys/i386/xen/xen_machdep.c:
sys/x86/xen/hvm.c:
	Introduce new functions xen_domain(), xen_pv_domain(), and
	xen_hvm_domain().  These are used in favor of #ifdefs so that
	FreeBSD can dynamically detect and adapt to the presence of
	a hypervisor.  The goal is to have an HVM optimized GENERIC,
	but more is necessary before this is possible.

sys/amd64/amd64/machdep.c:
sys/dev/xen/xenpci/xenpcivar.h:
sys/dev/xen/xenpci/xenpci.c:
sys/x86/xen/hvm.c:
sys/sys/kernel.h:
	Refactor magic ioport, Hypercall table and Hypervisor shared
	information page setup, and move it to a dedicated HVM support
	module.

	HVM mode initialization is now triggered during the
	SI_SUB_HYPERVISOR phase of system startup.  This currently
	occurs just after the kernel VM is fully setup which is
	just enough infrastructure to allow the hypercall table
	and shared info page to be properly mapped.

sys/xen/hvm.h:
sys/x86/xen/hvm.c:
	Add definitions and a method for configuring Hypervisor event
	delievery via a direct vector callback.

sys/amd64/include/xen/xen-os.h:
sys/x86/xen/hvm.c:

sys/conf/files:
sys/conf/files.amd64:
sys/conf/files.i386:
	Adjust kernel build to reflect the refactoring of early
	Xen startup code and Xen interrupt services.

sys/dev/xen/blkback/blkback.c:
sys/dev/xen/blkfront/blkfront.c:
sys/dev/xen/blkfront/block.h:
sys/dev/xen/control/control.c:
sys/dev/xen/evtchn/evtchn_dev.c:
sys/dev/xen/netback/netback.c:
sys/dev/xen/netfront/netfront.c:
sys/xen/xenstore/xenstore.c:
sys/xen/evtchn/evtchn_dev.c:
sys/dev/xen/console/console.c:
sys/dev/xen/console/xencons_ring.c
	Adjust drivers to use new xen_intr_*() API.

sys/dev/xen/blkback/blkback.c:
	Since blkback defers all event handling to a taskqueue,
	convert this task queue to a "fast" taskqueue, and schedule
	it via an interrupt filter.  This avoids an unnecessary
	ithread context switch.

sys/xen/xenstore/xenstore.c:
	The xenstore driver is MPSAFE.  Indicate as much when
	registering its interrupt handler.

sys/xen/xenbus/xenbus.c:
sys/xen/xenbus/xenbusvar.h:
	Remove unused event channel APIs.

sys/xen/evtchn.h:
	Remove all kernel Xen interrupt service API definitions
	from this file.  It is now only used for structure and
	ioctl definitions related to the event channel userland
	device driver.

	Update the definitions in this file to match those from
	NetBSD.  Implementing this interface will be necessary for
	Dom0 support.

sys/xen/evtchn/evtchnvar.h:
	Add a header file for implemenation internal APIs related
	to managing event channels event delivery.  This is used
	to allow, for example, the event channel userland device
	driver to access low-level routines that typical kernel
	consumers of event channel services should never access.

sys/xen/interface/event_channel.h:
sys/xen/xen_intr.h:
	Standardize on the evtchn_port_t type for referring to
	an event channel port id.  In order to prevent low-level
	event channel APIs from leaking to kernel consumers who
	should not have access to this data, the type is defined
	twice: Once in the Xen provided event_channel.h, and again
	in xen/xen_intr.h.  The double declaration is protected by
	__XEN_EVTCHN_PORT_DEFINED__ to ensure it is never declared
	twice within a given compilation unit.

sys/xen/xen_intr.h:
sys/xen/evtchn/evtchn.c:
sys/x86/xen/xen_intr.c:
sys/dev/xen/xenpci/evtchn.c:
sys/dev/xen/xenpci/xenpcivar.h:
	New implementation of Xen interrupt services.  This is
	similar in many respects to the i386 PV implementation with
	the exception that events for bound to event channel ports
	(i.e. not IPI, virtual IRQ, or physical IRQ) are further
	optimized to avoid mask/unmask operations that aren't
	necessary for these edge triggered events.

	Stubs exist for supporting physical IRQ binding, but will
	need additional work before this implementation can be
	fully shared between PV and HVM.

sys/amd64/amd64/mp_machdep.c:
sys/i386/i386/mp_machdep.c:
sys/i386/xen/mp_machdep.c
sys/x86/xen/hvm.c:
	Add support for placing vcpu_info into an arbritary memory
	page instead of using HYPERVISOR_shared_info->vcpu_info.
	This allows the creation of domains with more than 32 vcpus.

sys/i386/i386/machdep.c:
sys/i386/xen/clock.c:
sys/i386/xen/xen_machdep.c:
sys/i386/xen/exception.s:
	Add support for new event channle implementation.
2013-08-29 19:52:18 +00:00
Mark Murray
12278dbb57 MFC 2013-08-26 10:40:25 +00:00
Mark Johnston
57f6086735 Implement the ip, tcp, and udp DTrace providers. The probe definitions use
dynamic translation so that their arguments match the definitions for
these providers in Solaris and illumos. Thus, existing scripts for these
providers should work unmodified on FreeBSD.

Tested by:	gnn, hiren
MFC after:	1 month
2013-08-25 21:54:41 +00:00
Mark Murray
ddbfa6b19e 1) example (partially humorous random_adaptor, that I call "EXAMPLE")
* It's not meant to be used in a real system, it's there to show how
   the basics of how to create interfaces for random_adaptors. Perhaps
   it should belong in a manual page

2) Move probe.c's functionality in to random_adaptors.c
 * rename random_ident_hardware() to random_adaptor_choose()

3) Introduce a new way to choose (or select) random_adaptors via tunable
"rngs_want" It's a list of comma separated names of adaptors, ordered
by preferences. I.e.:
rngs_want="yarrow,rdrand"

Such setting would cause yarrow to be preferred to rdrand. If neither of
them are available (or registered), then system will default to
something reasonable (currently yarrow). If yarrow is not present, then
we fall back to the adaptor that's first on the list of registered
adaptors.

4) Introduce a way where RNGs can play a role of entropy source. This is
mostly useful for HW rngs.

The way I envision this is that every HW RNG will use this
functionality by default. Functionality to disable this is also present.
I have an example of how to use this in random_adaptor_example.c (see
modload event, and init function)

5) fix kern.random.adaptors from
kern.random.adaptors: yarrowpanicblock
to
kern.random.adaptors: yarrow,panic,block

6) add kern.random.active_adaptor to indicate currently selected
adaptor:
root@freebsd04:~ # sysctl kern.random.active_adaptor
kern.random.active_adaptor: yarrow

Submitted by:	Arthur Mesh <arthurmesh@gmail.com>
2013-08-24 13:54:56 +00:00
Edward Tomasz Napierala
9732e4fd92 Move the old iSCSI initiator source to a more appropriate place
(sys/dev/iscsi_initiator/ instead of sys/dev/iscsi/initiator/), to make
room for the new one.  This is also more logical location (kernel module
being named iscsi_initiator.ko, for example).  There is no ongoing work
on this I know of, so it shouldn't make life harder for anyone.

There are no functional changes, apart from "svn mv" and adjusting paths.
2013-08-22 14:02:34 +00:00
Pawel Jakub Dawidek
0dac22d8ea Implement 32bit versions of the cap_ioctls_limit(2) and cap_ioctls_get(2)
system calls as unsigned longs have different size on i386 and amd64.

Reported by:	jilles
Sponsored by:	The FreeBSD Foundation
2013-08-18 10:30:41 +00:00
Pedro F. Giffuni
d7511a40a7 Add read-only support for extents in ext2fs.
Basic support for extents was implemented by Zheng Liu as part
of his Google Summer of Code in 2010. This support is read-only
at this time.

In addition to extents we also support the huge_file extension
for read-only purposes. This works nicely with the additional
support for birthtime/nanosec timestamps and dir_index that
have been added lately.

The implementation may not work for all ext4 filesystems as
it doesn't support some features that are being enabled by
default on recent linux like flex_bg. Nevertheless, the feature
should be very useful for migration or simple access in
filesystems that have been converted from ext2/3 or don't use
incompatible features.

Special thanks to Zheng Liu for his dedication and continued
work to support ext2 in FreeBSD.

Submitted by:	Zheng Liu (lz@)
Reviewed by:	Mike Ma, Christoph Mallon (previous version)
Sponsored by:	Google Inc.
MFC after:	3 weeks
2013-08-12 21:34:48 +00:00
David E. O'Brien
5711939b63 * Add random_adaptors.[ch] which is basically a store of random_adaptor's.
random_adaptor is basically an adapter that plugs in to random(4).
  random_adaptor can only be plugged in to random(4) very early in bootup.
  Unplugging random_adaptor from random(4) is not supported, and is probably a
  bad idea anyway, due to potential loss of entropy pools.
  We currently have 3 random_adaptors:
  + yarrow
  + rdrand (ivy.c)
  + nehemeiah

* Remove platform dependent logic from probe.c, and move it into
  corresponding registration routines of each random_adaptor provider.
  probe.c doesn't do anything other than picking a specific random_adaptor
  from a list of registered ones.

* If the kernel doesn't have any random_adaptor adapters present then the
  creation of /dev/random is postponed until next random_adaptor is kldload'ed.

* Fix randomdev_soft.c to refer to its own random_adaptor, instead of a
  system wide one.

Submitted by: arthurmesh@gmail.com, obrien
Obtained from: Juniper Networks
Reviewed by: so (des)
2013-08-09 15:31:50 +00:00
David E. O'Brien
0e6a0799a9 Back out r253779 & r253786. 2013-07-31 17:21:18 +00:00
Rui Paulo
31d9867769 Import OpenBSD's rsu(4) WLAN driver.
Support chipsets are the Realtek RTL8188SU, RTL8191SU, and RTL8192SU.

Many thanks to Idwer Vollering for porting/writing the man page and for
testing.

Reviewed by:	adrian, hselasky
Obtained from:	OpenBSD
Tested by:	kevlo, Idwer Vollering <vidwer at gmail.com>
2013-07-30 02:07:57 +00:00
David E. O'Brien
99ff83da74 Decouple yarrow from random(4) device.
* Make Yarrow an optional kernel component -- enabled by "YARROW_RNG" option.
  The files sha2.c, hash.c, randomdev_soft.c and yarrow.c comprise yarrow.

* random(4) device doesn't really depend on rijndael-*.  Yarrow, however, does.

* Add random_adaptors.[ch] which is basically a store of random_adaptor's.
  random_adaptor is basically an adapter that plugs in to random(4).
  random_adaptor can only be plugged in to random(4) very early in bootup.
  Unplugging random_adaptor from random(4) is not supported, and is probably a
  bad idea anyway, due to potential loss of entropy pools.
  We currently have 3 random_adaptors:
  + yarrow
  + rdrand (ivy.c)
  + nehemeiah

* Remove platform dependent logic from probe.c, and move it into
  corresponding registration routines of each random_adaptor provider.
  probe.c doesn't do anything other than picking a specific random_adaptor
  from a list of registered ones.

* If the kernel doesn't have any random_adaptor adapters present then the
  creation of /dev/random is postponed until next random_adaptor is kldload'ed.

* Fix randomdev_soft.c to refer to its own random_adaptor, instead of a
  system wide one.

Submitted by: arthurmesh@gmail.com, obrien
Obtained from: Juniper Networks
Reviewed by: obrien
2013-07-29 20:26:27 +00:00
Alfred Perlstein
3d30404f83 Fix watchdog pretimeout.
The original API calls for pow2ns, however the new APIs from
Linux call for seconds.

We need to be able to convert to/from 2^Nns to seconds in both
userland and kernel to fix this and properly compare units.
2013-07-27 20:47:01 +00:00
Navdeep Parhar
caf20efcde Add support for packet-sniffing tracers to cxgbe(4). This works with
all T4 and T5 based cards and is useful for analyzing TSO, LRO, TOE, and
for general purpose monitoring without tapping any cxgbe or cxl ifnet
directly.

Tracers on the T4/T5 chips provide access to Ethernet frames exactly as
they were received from or transmitted on the wire.  On transmit, a
tracer will capture a frame after TSO segmentation, hw VLAN tag
insertion, hw L3 & L4 checksum insertion, etc.  It will also capture
frames generated by the TCP offload engine (TOE traffic is normally
invisible to the kernel).  On receive, a tracer will capture a frame
before hw VLAN extraction, runt filtering, other badness filtering,
before the steering/drop/L2-rewrite filters or the TOE have had a go at
it, and of course before sw LRO in the driver.

There are 4 tracers on a chip.  A tracer can trace only in one direction
(tx or rx).  For now cxgbetool will set up tracers to capture the first
128B of every transmitted or received frame on a given port.  This is a
small subset of what the hardware can do.  A pseudo ifnet with the same
name as the nexus driver (t4nex0 or t5nex0) will be created for tracing.
The data delivered to this ifnet is an additional copy made inside the
chip.  Normal delivery to cxgbe<n> or cxl<n> will be made as usual.

/* watch cxl0, which is the first port hanging off t5nex0. */
# cxgbetool t5nex0 tracer 0 tx0  (watch what cxl0 is transmitting)
# cxgbetool t5nex0 tracer 1 rx0  (watch what cxl0 is receiving)
# cxgbetool t5nex0 tracer list
# tcpdump -i t5nex0   <== all that cxl0 sees and puts on the wire

If you were doing TSO, a tcpdump on cxl0 may have shown you ~64K
"frames" with no L3/L4 checksum but this will show you the frames that
were actually transmitted.

/* all done */
# cxgbetool t5nex0 tracer 0 disable
# cxgbetool t5nex0 tracer 1 disable
# cxgbetool t5nex0 tracer list
# ifconfig t5nex0 destroy
2013-07-26 22:04:11 +00:00
Luiz Otavio O Souza
b9f07b864b Add the support for 802.1q and port based vlans for arswitch.
Tested on: RB450G (standalone ar8316), RSPRO (standalone ar8316) and
TPLink MR-3220 (ar724x integrated switch).

Approved by:	adrian (mentor)
Obtained from:	zrouter
2013-07-23 14:24:22 +00:00
Rui Paulo
ae230e21f9 Fix the urtwnfw definitions. We can now use urtwnfw in kernel config files. 2013-07-13 07:17:18 +00:00
Andre Oppermann
81d392a09d Improve SYN cookies by encoding the MSS, WSCALE (window scaling) and SACK
information into the ISN (initial sequence number) without the additional
use of timestamp bits and switching to the very fast and cryptographically
strong SipHash-2-4 MAC hash algorithm to protect the SYN cookie against
forgeries.

The purpose of SYN cookies is to encode all necessary session state in
the 32 bits of our initial sequence number to avoid storing any information
locally in memory.  This is especially important when under heavy spoofed
SYN attacks where we would either run out of memory or the syncache would
fill with bogus connection attempts swamping out legitimate connections.

The original SYN cookies method only stored an indexed MSS values in the
cookie.  This isn't sufficient anymore and breaks down in the presence of
WSCALE information which is only exchanged during SYN and SYN-ACK.  If we
can't keep track of it then we may severely underestimate the available
send or receive window. This is compounded with large windows whose size
information on the TCP segment header is even lower numerically.  A number
of years back SYN cookies were extended to store the additional state in
the TCP timestamp fields, if available on a connection.  While timestamps
are common among the BSD, Linux and other *nix systems Windows never enabled
them by default and thus are not present for the vast majority of clients
seen on the Internet.

The common parameters used on TCP sessions have changed quite a bit since
SYN cookies very invented some 17 years ago.  Today we have a lot more
bandwidth available making the use window scaling almost mandatory.  Also
SACK has become standard making recovering from packet loss much more
efficient.

This change moves all necessary information into the ISS removing the need
for timestamps.  Both the MSS (16 bits) and send WSCALE (4 bits) are stored
in 3 bit indexed form together with a single bit for SACK.  While this is
significantly less than the original range, it is sufficient to encode all
common values with minimal rounding.

The MSS depends on the MTU of the path and with the dominance of ethernet
the main value seen is around 1460 bytes.  Encapsulations for DSL lines
and some other overheads reduce it by a few more bytes for many connections
seen.  Rounding down to the next lower value in some cases isn't a problem
as we send only slightly more packets for the same amount of data.

The send WSCALE index is bit more tricky as rounding down under-estimates
the available send space available towards the remote host, however a small
number values dominate and are carefully selected again.

The receive WSCALE isn't encoded at all but recalculated based on the local
receive socket buffer size when a valid SYN cookie returns.  A listen socket
buffer size is unlikely to change while active.

The index values for MSS and WSCALE are selected for minimal rounding errors
based on large traffic surveys.  These values have to be periodically
validated against newer traffic surveys adjusting the arrays tcp_sc_msstab[]
and tcp_sc_wstab[] if necessary.

In addition the hash MAC to protect the SYN cookies is changed from MD5
to SipHash-2-4, a much faster and cryptographically secure algorithm.

Reviewed by:	dwmalone
Tested by:	Fabian Keil <fk@fabiankeil.de>
2013-07-11 15:29:25 +00:00
Hiren Panchasara
3c9d5a037d Adding urtwn(4) firmware and related changes.
Reviewed by:	rpaulo
Approved by:	sbruno (mentor)
2013-07-10 08:21:09 +00:00
Pedro F. Giffuni
5f5458322a Add files related to ext2 HTree implementation
These should've been added along with r252890

Reported by:	gonzo
PointyHat:	pfg
MFC after:	1 week
2013-07-07 01:12:29 +00:00
Navdeep Parhar
f72b68a1bf - Include the T5 firmware with the driver.
- Update the T4 firmware to the latest.
- Minor reorganization and updates to the version macros, etc.

Obtained from:	Chelsio
MFC after:	1 day
2013-07-03 23:52:15 +00:00
Peter Wemm
6c6f1f0185 Add an entry for filemon. 2013-07-03 20:22:12 +00:00
Davide Italiano
237abf0c56 - Trim an unused and bogus Makefile for mount_smbfs.
- Reconnect with some minor modifications, in particular now selsocket()
internals are adapted to use sbintime units after recent'ish calloutng
switch.
2013-06-28 21:00:08 +00:00
Jeff Roberson
5f51836645 - Add a general purpose resource allocator, vmem, from NetBSD. It was
originally inspired by the Solaris vmem detailed in the proceedings
   of usenix 2001.  The NetBSD version was heavily refactored for bugs
   and simplicity.
 - Use this resource allocator to allocate the buffer and transient maps.
   Buffer cache defrags are reduced by 25% when used by filesystems with
   mixed block sizes.  Ultimately this may permit dynamic buffer cache
   sizing on low KVA machines.

Discussed with:	alc, kib, attilio
Tested by:	pho
Sponsored by:	EMC / Isilon Storage Division
2013-06-28 03:51:20 +00:00
Oleksandr Tymoshenko
db5815641c Rename run(4) firmware file from runfw to run.fw. Previous name was the
same as top-level target name for "device runfw" kernel option and
caused cyclic dependancy that lead to kernel build breakage

Module change is not strictly required and done for name unification sake

PR:		conf/175751
Submitted by:	    Issei <i10a at herbmint.jp>
2013-06-21 18:16:54 +00:00
Jack F Vogel
fd75b91d13 Add quad port probe support, this gives the admin proper information about the slot
(which should be a PCIE Gen 3 slot for this adapter) by looking back thru the PCI
parent devices to the slot device.

The fix above also corrects the bandwidth display to GT/s rather than the
incorrect Gb/s

Next, allow the use of ALTQ if you select the compile option IXGBE_LEGACY_TX.

Allow the use of 'unsupported' optic modules by a compile option as well.

Add a phy reset capability into the stop code, this is so a static configured
driver will still behave properly when taken down (not being able to unload it).

This revision synchronizes the shared code with Intel internal current code,
and note that it now includes DCB supporting code, this was necessitated by
some internal changes with the code, but it also will provide the opportunity
to develop this feature in the core driver down the road.

I have edited the README to get rid of some of the worse anachronisms in it
as well, its by no means as robust as I might wish at this point however.

Oh, I also have included some conditional stuff in the code so it will be
compatible in both the 9.X and 10 environments.

Performance has been a focus in recent changes and I believe this revision
driver will perform very well in most workloads.

MFC after: 2 weeks
2013-06-18 21:28:19 +00:00
Scott Long
c8789c34fd This is an addendum to r251837.
Missed adding the new references to cam_compat.c to the various makefiles.

Obtained from:	Netflix
2013-06-17 10:21:38 +00:00
Adrian Chadd
216ca2346f Migrate the LNA mixing diversity machinery from the AR9285 HAL to the driver.
The AR9485 chip and AR933x SoC both implement LNA diversity.
There are a few extra things that need to happen before this can be
flipped on for those chips (mostly to do with setting up the different
bias values and LNA1/LNA2 RSSI differences) but the first stage is
putting this code into the driver layer so it can be reused.

This has the added benefit of making it easier to expose configuration
options and diagnostic information via the ioctl API.  That's not yet
being done but it sure would be nice to do so.

Tested:

* AR9285, with LNA diversity enabled
* AR9285, with LNA diversity disabled in EEPROM
2013-06-12 14:52:57 +00:00
Rui Paulo
c2c2fc4d86 Import Kevin Lo's port of urtwn(4) from OpenBSD. urtwn(4) is a driver for the
Realtek RTL8188CU/RTL8192CU USB IEEE 802.11b/g/n wireless cards.
This driver requires microcode which is available in FreeBSD ports:
net/urtwn-firmware-kmod.

Hiren ported the urtwn(4) man page from OpenBSD and Glen just commited a port
for the firmware.

TODO:
- 802.11n support
- Stability fixes - the driver can sustain lots of traffic but has trouble
coping with simultaneous iperf sessions.
- fix debugging

MFC after:	2 months
Tested by:	kevlo, hiren, gjb
2013-06-08 16:02:31 +00:00
Adrian Chadd
b70f530bc7 Bring over the initial static bluetooth coexistence configuration
for the WB195 combo NIC - an AR9285 w/ an AR3011 USB bluetooth NIC.

The AR3011 is wired up using a 3-wire coexistence scheme to the AR9285.

The code in if_ath_btcoex.c sets up the initial hardware mapping
and coexistence configuration.  There's nothing special about it -
it's static; it doesn't try to configure bluetooth / MAC traffic priorities
or try to figure out what's actually going on.  It's enough to stop basic
bluetooth traffic from causing traffic stalls and diassociation from
the wireless network.

To use this code, you must have the above NIC.  No, it won't work
for the AR9287+AR3012, nor the AR9485, AR9462 or AR955x combo cards.

Then you set a kernel hint before boot or before kldload, where 'X'
is the unit number of your AR9285 NIC:

# kenv hint.ath.X.btcoex_profile=wb195

This will then appear in your boot messages:

[100482] athX: Enabling WB195 BTCOEX

This code is going to evolve pretty quickly (well, depending upon my
spare time) so don't assume the btcoex API is going to stay stable.

In order to use the bluetooth side, you must also load in firmware using
ath3kfw and the binary firmware file (ath3k-1.fw in my case.)

Tested:

* AR9280, no interference
* WB195 - AR9285 + AR3011 combo; STA mode; basic bluetooth inquiries
  were enough to cause traffic stalls and disassociations.  This has
  stopped with the btcoex profile code.

TODO:

* Importantly - the AR9285 needs ASPM disabled if bluetooth coexistence
  is enabled.  No, I don't know why.  It's likely some kind of bug to do
  with the AR3011 sending bluetooth coexistence signals whilst the device
  is asleep.  Since we don't actually sleep the MAC just yet, it shouldn't
  be a problem.  That said, to be totally correct:

  + ASPM should be disabled - upon attach and wakeup
  + The PCIe powersave HAL code should never be called

  Look at what the ath9k driver does for inspiration.

* Add WB197 (AR9287+AR3012) support
* Add support for the AR9485, which is another combo like the AR9285
* The later NICs have a different signaling mechanism between the MAC
  and the bluetooth device; I haven't even begun to experiment with
  making that HAL code work.  But it should be a lot more automatic.

* The hardware can do much more interesting traffic weighting with
  bluetooth and wifi traffic.  None of this is currently used.
  Ideally someone would code up something to watch the bluetooth traffic
  GPIO (via an interrupt) and then watch it go high/low; then figure out
  what the bluetooth traffic is and adjust things appropriately.

* If I get the time I may add in some code to at least track this stuff
  and expose statistics.  But it's up to someone else to experiment with
  the bluetooth coexistence support and add the interesting stuff (like
  "real" detection of bulk, audio, etc bluetooth traffic patterns and
  change wifi parameters appropriately - eg, maximum aggregate length,
  transmit power, using quiet time to control TX duty cycle, etc.)
2013-06-07 09:02:02 +00:00
Achim Leubner
dce93cd06d Driver 'aacraid' added. Supports Adaptec by PMC RAID controller families Series 6, 7, 8 and upcoming products. Older Adaptec RAID controller families are supported by the 'aac' driver.
Approved by:	scottl (mentor)
2013-05-24 09:22:43 +00:00
Marcel Moolenaar
cb34ed4434 Add basic support for FDT to i386 & amd64. This change includes:
1.  Common headers for fdt.h and ofw_machdep.h under x86/include
    with indirections under i386/include and amd64/include.
2.  New modinfo for loader provided FDT blob.
3.  Common x86_init_fdt() called from hammer_time() on amd64 and
    init386() on i386.
4.  Split-off FDT specific low-level console functions from FDT
    bus methods for the uart(4) driver. The low-level console
    logic has been moved to uart_cpu_fdt.c and is used for arm,
    mips & powerpc only. The FDT bus methods are shared across
    all architectures.
5.  Add dev/fdt/fdt_x86.c to hold the fdt_fixup_table[] and the
    fdt_pic_table[] arrays. Both are empty right now.

FDT addresses are I/O ports on x86. Since the core FDT code does
not handle different address spaces, adding support for both I/O
ports and memory addresses requires some thought and discussion.
It may be better to use a compile-time option that controls this.

Obtained from:	Juniper Networks, Inc.
2013-05-21 03:05:49 +00:00
Jung-uk Kim
a9d8d09c46 Merge ACPICA 20130517. 2013-05-20 23:52:49 +00:00
Jeff Roberson
f2cc1285c2 - Add a new general purpose path-compressed radix trie which can be used
with any structure containing a uint64_t index.  The tree code
   auto-generates type safe wrappers.
 - Eliminate the buf splay and replace it with pctrie.  This is not only
   significantly faster with large files but also allows for the possibility
   of shared locking.

Reviewed by:    alc, attilio
Sponsored by:   EMC / Isilon Storage Division
2013-05-12 04:05:01 +00:00
Adrian Chadd
248dd6039d Bring in a basic ethernet switch driver for the IP17x series of
switches.

These are notably found on some AR71xx based Mikrotik boards.

Submitted by:	Luiz Otavio O Souza <loos.br@gmail.com>
Reviewed by:	ray
2013-05-08 20:58:41 +00:00
Adrian Chadd
0e468be195 Add the AR9300 HAL into the kernel and module builds.
Tested:

* make universe (honest!)
2013-05-02 07:05:34 +00:00
Brooks Davis
b3caab66ad MFP4 changes 222065 and 222068:
Add a simplebus attachment for cfi(4)'s FDT support and move
cfi_bus_fdt.c to sys/conf/files so non-ppc architectures are supported.

Sponsored by:	DARPA, AFRL
2013-04-30 18:33:29 +00:00
Jung-uk Kim
895f26a936 Merge ACPICA 20130418. 2013-04-19 23:49:34 +00:00
Adrian Chadd
7b796c4039 Implement a very basic multi-PHY aware switch device.
This is intended to be used as a stop-gap for switch devices
which expose multiple ethernet PHYs but we don't have a driver
for - here, etherswitchcfg and the general switch configuration
API can be used to interface to said PHYs.

Submitted by:	Luiz Otavio O Souza <loos.br@gmail.com>
2013-04-19 17:50:38 +00:00
Kenneth D. Merry
adb974068b Move the NFS FHA (File Handle Affinity) code from sys/nfsserver to
sys/nfs, since it is now shared by the two NFS servers.

Suggested by:	rmacklem
Sponsored by:	Spectra Logic
MFC after:	2 weeks
2013-04-17 22:42:43 +00:00
Kenneth D. Merry
d96b98a360 Revamp the old NFS server's File Handle Affinity (FHA) code so that
it will work with either the old or new server.

The FHA code keeps a cache of currently active file handles for
NFSv2 and v3 requests, so that read and write requests for the same
file are directed to the same group of threads (reads) or thread
(writes).  It does not currently work for NFSv4 requests.  They are
more complex, and will take more work to support.

This improves read-ahead performance, especially with ZFS, if the
FHA tuning parameters are configured appropriately.  Without the
FHA code, concurrent reads that are part of a sequential read from
a file will be directed to separate NFS threads.  This has the
effect of confusing the ZFS zfetch (prefetch) code and makes
sequential reads significantly slower with clients like Linux that
do a lot of prefetching.

The FHA code has also been updated to direct write requests to nearby
file offsets to the same thread in the same way it batches reads,
and the FHA code will now also send writes to multiple threads when
needed.

This improves sequential write performance in ZFS, because writes
to a file are now more ordered.  Since NFS writes (generally
less than 64K) are smaller than the typical ZFS record size
(usually 128K), out of order NFS writes to the same block can
trigger a read in ZFS.  Sending them down the same thread increases
the odds of their being in order.

In order for multiple write threads per file in the FHA code to be
useful, writes in the NFS server have been changed to use a LK_SHARED
vnode lock, and upgrade that to LK_EXCLUSIVE if the filesystem
doesn't allow multiple writers to a file at once.  ZFS is currently
the only filesystem that allows multiple writers to a file, because
it has internal file range locking.  This change does not affect the
NFSv4 code.

This improves random write performance to a single file in ZFS, since
we can now have multiple writers inside ZFS at one time.

I have changed the default tuning parameters to a 22 bit (4MB)
window size (from 256K) and unlimited commands per thread as a
result of my benchmarking with ZFS.

The FHA code has been updated to allow configuring the tuning
parameters from loader tunable variables in addition to sysctl
variables.  The read offset window calculation has been slightly
modified as well.  Instead of having separate bins, each file
handle has a rolling window of bin_shift size.  This minimizes
glitches in throughput when shifting from one bin to another.

sys/conf/files:
	Add nfs_fha_new.c and nfs_fha_old.c.  Compile nfs_fha.c
	when either the old or the new NFS server is built.

sys/fs/nfs/nfsport.h,
sys/fs/nfs/nfs_commonport.c:
	Bring in changes from Rick Macklem to newnfs_realign that
	allow it to operate in blocking (M_WAITOK) or non-blocking
	(M_NOWAIT) mode.

sys/fs/nfs/nfs_commonsubs.c,
sys/fs/nfs/nfs_var.h:
	Bring in a change from Rick Macklem to allow telling
	nfsm_dissect() whether or not to wait for mallocs.

sys/fs/nfs/nfsm_subs.h:
	Bring in changes from Rick Macklem to create a new
	nfsm_dissect_nonblock() inline function and
	NFSM_DISSECT_NONBLOCK() macro.

sys/fs/nfs/nfs_commonkrpc.c,
sys/fs/nfsclient/nfs_clkrpc.c:
	Add the malloc wait flag to a newnfs_realign() call.

sys/fs/nfsserver/nfs_nfsdkrpc.c:
	Setup the new NFS server's RPC thread pool so that it will
	call the FHA code.

	Add the malloc flag argument to newnfs_realign().

	Unstaticize newnfs_nfsv3_procid[] so that we can use it in
	the FHA code.

sys/fs/nfsserver/nfs_nfsdsocket.c:
	In nfsrvd_dorpc(), add NFSPROC_WRITE to the list of RPC types
	that use the LK_SHARED lock type.

sys/fs/nfsserver/nfs_nfsdport.c:
	In nfsd_fhtovp(), if we're starting a write, check to see
	whether the underlying filesystem supports shared writes.
	If not, upgrade the lock type from LK_SHARED to LK_EXCLUSIVE.

sys/nfsserver/nfs_fha.c:
	Remove all code that is specific to the NFS server
	implementation.  Anything that is server-specific is now
	accessed through a callback supplied by that server's FHA
	shim in the new softc.

	There are now separate sysctls and tunables for the FHA
	implementations for the old and new NFS servers.  The new
	NFS server has its tunables under vfs.nfsd.fha, the old
	NFS server's tunables are under vfs.nfsrv.fha as before.

	In fha_extract_info(), use callouts for all server-specific
	code.  Getting file handles and offsets is now done in the
	individual server's shim module.

	In fha_hash_entry_choose_thread(), change the way we decide
	whether two reads are in proximity to each other.
	Previously, the calculation was a simple shift operation to
	see whether the offsets were in the same power of 2 bucket.
	The issue was that there would be a bucket (and therefore
	thread) transition, even if the reads were in close
	proximity.  When there is a thread transition, reads wind
	up going somewhat out of order, and ZFS gets confused.

	The new calculation simply tries to see whether the offsets
	are within 1 << bin_shift of each other.  If they are, the
	reads will be sent to the same thread.

	The effect of this change is that for sequential reads, if
	the client doesn't exceed the max_reqs_per_nfsd parameter
	and the bin_shift is set to a reasonable value (22, or
	4MB works well in my tests), the reads in any sequential
	stream will largely be confined to a single thread.

	Change fha_assign() so that it takes a softc argument.  It
	is now called from the individual server's shim code, which
	will pass in the softc.

	Change fhe_stats_sysctl() so that it takes a softc
	parameter.  It is now called from the individual server's
	shim code.  Add the current offset to the list of things
	printed out about each active thread.

	Change the num_reads and num_writes counters in the
	fha_hash_entry structure to 32-bit values, and rename them
	num_rw and num_exclusive, respectively, to reflect their
	changed usage.

	Add an enable sysctl and tunable that allows the user to
	disable the FHA code (when vfs.XXX.fha.enable = 0).  This
	is useful for before/after performance comparisons.

nfs_fha.h:
	Move most structure definitions out of nfs_fha.c and into
	the header file, so that the individual server shims can
	see them.

	Change the default bin_shift to 22 (4MB) instead of 18
	(256K).  Allow unlimited commands per thread.

sys/nfsserver/nfs_fha_old.c,
sys/nfsserver/nfs_fha_old.h,
sys/fs/nfsserver/nfs_fha_new.c,
sys/fs/nfsserver/nfs_fha_new.h:
	Add shims for the old and new NFS servers to interface with
	the FHA code, and callbacks for the

	The shims contain all of the code and definitions that are
	specific to the NFS servers.

	They setup the server-specific callbacks and set the server
	name for the sysctl and loader tunable variables.

sys/nfsserver/nfs_srvkrpc.c:
	Configure the RPC code to call fhaold_assign() instead of
	fha_assign().

sys/modules/nfsd/Makefile:
	Add nfs_fha.c and nfs_fha_new.c.

sys/modules/nfsserver/Makefile:
	Add nfs_fha_old.c.

Reviewed by:	rmacklem
Sponsored by:	Spectra Logic
MFC after:	2 weeks
2013-04-17 21:00:22 +00:00
Ivan Voras
c072011223 Introduce glabel labels based on GEOM ident attributes. In this initial
implementation, error on the side of conservatism and only create labels
for GEOMs of classes DISK and MULTIPATH.

Discussed with:	trasz
Approved by:	silence from freebsd-geom@
2013-04-15 16:09:24 +00:00
Gleb Smirnoff
4e76af6a41 Merge from projects/counters: counter(9).
Introduce counter(9) API, that implements fast and raceless counters,
provided (but not limited to) for gathering of statistical data.

See http://lists.freebsd.org/pipermail/freebsd-arch/2013-April/014204.html
for more details.

In collaboration with:	kib
Reviewed by:		luigi
Tested by:		ae, ray
Sponsored by:		Nginx, Inc.
2013-04-08 19:40:53 +00:00