Commit Graph

100576 Commits

Author SHA1 Message Date
Alexander V. Chernikov
a13a821641 Merge projects/ipfw to HEAD.
Main user-visible changes are related to tables:

* Tables are now identified by names, not numbers.
 There can be up to 65k tables with up to 63-byte long names.
* Tables are now set-aware (default off), so you can switch/move
 them atomically with rules.
* More functionality is supported (swap, lock, limits, user-level lookup,
 batched add/del) by generic table code.
* New table types are added (flow) so you can match multiple packet fields at once.
* Ability to add different type of lookup algorithms for particular
 table type has been added.
* New table algorithms are added (cidr:hash, iface:array, number:array and
 flow:hash) to make certain types of lookup more effective.
* Table value are now capable of holding multiple data fields for
  different tablearg users

Performance changes:
* Main ipfw lock was converted to rmlock
* Rule counters were separated from rule itself and made per-cpu.
* Radix table entries fits into 128 bytes
* struct ip_fw is now more compact so more rules will fit into 64 bytes
* interface tables uses array of existing ifindexes for faster match

ABI changes:
All functionality supported by old ipfw(8) remains functional.
 Old & new binaries can work together with the following restrictions:
* Tables named other than ^\d+$ are shown as table(65535) in
 ruleset in old binaries

Internal changes:.
Changing table ids to numbers resulted in format modification for
 most sockopt codes. Old sopt format was compact, but very hard to
 extend (no versioning, inability to add more opcodes), so
* All relevant opcodes were converted to TLV-based versioned IP_FW3-based codes.
* The remaining opcodes were also converted to be able to eliminate
 all older opcodes at once
* All IP_FW3 handlers uses special API instead of calling sooptcopy*
 directly to ease adding another communication methods
* struct ip_fw is now different for kernel and userland
* tablearg value has been changed to 0 to ease future extensions
* table "values" are now indexes in special value array which
 holds extended data for given index
* Batched add/delete has been added to tables code
* Most changes has been done to permit batched rule addition.
* interface tracking API has been added (started on demand)
 to permit effective interface tables operations
* O(1) skipto cache, currently turned off by default at
 compile-time (eats 512K).

* Several steps has been made towards making libipfw:
  * most of new functions were separated into "parse/prepare/show
    and actuall-do-stuff" pieces (already merged).
  * there are separate functions for parsing text string into "struct ip_fw"
    and printing "struct ip_fw" to supplied buffer (already merged).
* Probably some more less significant/forgotten features

MFC after:	1 month
Sponsored by:	Yandex LLC
2014-10-09 19:32:35 +00:00
Neel Natu
5295c3e61d Support Intel-specific MSRs that are accessed when booting up a linux in bhyve:
- MSR_PLATFORM_INFO
- MSR_TURBO_RATIO_LIMITx
- MSR_RAPL_POWER_UNIT

Reviewed by:	grehan
MFC after:	1 week
2014-10-09 19:13:33 +00:00
Edward Tomasz Napierala
1609230854 Remove remnants of some cleanup; no functional changes.
Sponsored by:	The FreeBSD Foundation
2014-10-09 18:49:58 +00:00
Alexander V. Chernikov
3e22054698 Merge HEAD@r272834 2014-10-09 18:03:12 +00:00
Baptiste Daroussin
16b2833f90 Backout r272825 every useland usage of ufs/ufs/dir.h are now broken with that change 2014-10-09 17:26:29 +00:00
Adrian Chadd
88c94345f2 Shuffle things.
Suggested by:	jhb

Differential Revision:	D906
Sponsored by:	Norse Corp
2014-10-09 16:48:42 +00:00
Warner Losh
f5bb55249a When building with a newer GCC, suppress some warnings for the
moment. The kernel isn't ready for them without a lot of work.
2014-10-09 16:39:10 +00:00
Alexander V. Chernikov
f9ab623bf2 Bump ipfw module version. 2014-10-09 16:12:01 +00:00
Alexander V. Chernikov
779b53d008 Sync to HEAD@r272825. 2014-10-09 15:35:28 +00:00
Baptiste Daroussin
92a1d420ba Use offsetof() from sys/types.h instead of a custom one
This fixes build with recent gcc versions
2014-10-09 15:26:22 +00:00
Marcel Moolenaar
2e7634503e Regenerate after r272823:
Move the SCTP syscalls to netinet with the rest of the SCTP code.

Submitted by:	Steve Kiernan <stevek@juniper.net>
Reviewed by:	tuexen, rrs
Obtained from:	Juniper Networks, Inc.
2014-10-09 15:19:35 +00:00
Marcel Moolenaar
80b47aefa1 Move the SCTP syscalls to netinet with the rest of the SCTP code. The
syscalls themselves are tightly coupled with the network stack and
therefore should not be in the generic socket code.

The following four syscalls have been marked as NOSTD so they can be
dynamically registered in sctp_syscalls_init() function:
  sys_sctp_peeloff
  sys_sctp_generic_sendmsg
  sys_sctp_generic_sendmsg_iov
  sys_sctp_generic_recvmsg

The syscalls are also set up to be dynamically registered when COMPAT32
option is configured.

As a side effect of moving the SCTP syscalls, getsock_cap needs to be
made available outside of the uipc_syscalls.c source file.  A proper
prototype has been added to the sys/socketvar.h header file.

API tests from the SCTP reference implementation have been run to ensure
compatibility. (http://code.google.com/p/sctp-refimpl/source/checkout)

Submitted by:	Steve Kiernan <stevek@juniper.net>
Reviewed by:	tuexen, rrs
Obtained from:	Juniper Networks, Inc.
2014-10-09 15:16:52 +00:00
Hans Petter Selasky
77d68af5df Add sysctl knob to disable port power on a specific USB HUB. You need
to reset the USB HUB using "usbconfig -d X.Y reset" or boot having the
setting in /boot/loader.conf before it activates.
2014-10-09 14:43:43 +00:00
Alexander V. Chernikov
4c060d851c Fix core on table destroy inroduced by table values code.
Rename @ti array copy to 'ti_copy'.
2014-10-09 14:33:20 +00:00
Alexander V. Chernikov
ce575f539f * Wire large user buffer before processing GET request.
* Fix incorrect size calculation for IP_FW_XGET request.
2014-10-09 12:37:53 +00:00
Baptiste Daroussin
72b2ef06e2 Only catch the line from the compiler output where 'version' is a word
This allows to build the kernel with gcc 4.9.1 from ports
2014-10-09 12:35:17 +00:00
Alexander Motin
259f12da87 Make iSCSI connection close somewhat less aggressive.
It allows to push out some final data from the send queue to the socket
before its close.  In particular, it increases chances for logout response
to be delivered to the initiator.
2014-10-09 09:12:08 +00:00
Xin LI
eba15cf463 MFV r272804:
Refactor the code and stop restore_object from creating two transactions.

Illumos issue:
    3693 restore_object uses at least two transactions to restore an object

MFC after:	2 weeks
2014-10-09 07:52:51 +00:00
Xin LI
ce44f14b41 MFV r272803:
Illumos issue:
    5175 implement dmu_read_uio_dbuf() to improve cached read performance

MFC after:	2 weeks
2014-10-09 07:18:40 +00:00
Hans Petter Selasky
f80ccb40c7 Refine support for disabling USB enumeration to allow device detach
and suspend and resume of existing devices.

MFC after:	2 weeks
2014-10-09 06:24:06 +00:00
Alexander Motin
840a5dd447 Use proper variable when looping through periphs with CAM_PERIPH_FREE.
PR:		194256
Submitted by:	Scott M. Ferris <smferris@gmail.com>
MFC after:	3 days
Sponsored by:	EMC/Isilon Storage Division
2014-10-09 05:53:58 +00:00
Bryan Venteicher
514929b193 Move the calls to u_tun_func() into udp6_append()
A similar cleanup for UDPv4 was performed in r220620.

Phabricator:	https://reviews.freebsd.org/D383
Reviewed by:	gnn
MFC after:	1 month
2014-10-09 05:42:07 +00:00
Adrian Chadd
fc4f524a6e Missing from previous commit - keep the VM domain -> PXM mapping
array and use it to map PXM -> VM domain when needed.

Differential Revision:	D906
Reviewed by:	jhb
2014-10-09 05:34:28 +00:00
Adrian Chadd
ffcf962dab Add a bus method to fetch the VM domain for the given device/bus.
* Add a bus_if.m method - get_domain() - returning the VM domain or
  ENOENT if the device isn't in a VM domain;
* Add bus methods to print out the domain of the device if appropriate;
* Add code in srat.c to save the PXM -> VM domain mapping that's done and
  expose a function to translate VM domain -> PXM;
* Add ACPI and ACPI PCI methods to check if the bus has a _PXM attribute
  and if so map it to the VM domain;
* (.. yes, this works recursively.)
* Have the pci bus glue print out the device VM domain if present.

Note: this is just the plumbing to start enumerating information -
it doesn't at all modify behaviour.

Differential Revision:	D906
Reviewed by:	jhb
Sponsored by:	Norse Corp
2014-10-09 05:33:25 +00:00
Bryan Venteicher
c19f98eb74 Check for mbuf copy failure when there are multiple multicast sockets
This partitular case is the only path where the mbuf could be NULL.
udp_append() checked for a NULL mbuf only after invoking the tunneling
callback. Our only in tree tunneling callback - SCTP - assumed a non
NULL mbuf, and it is a bit odd to make the callbacks responsible for
checking this condition.

This also reduces the differences between the IPv4 and IPv6 code.

MFC after:	1 month
2014-10-09 05:17:47 +00:00
Bryan Venteicher
1298fd9af4 Add M_FLOWID to M_COPYFLAGS
The M_FLOWID flag should be propagated to the new mbuf pkthdr in
m_move_pkthdr() and m_dup_pkthdr(). The new mbuf already got the
existing flowid value, but would be ignored since the flag was
not set.

Phabricator:	https://reviews.freebsd.org/D914
Reviewed by:	adrian
Obtained from:	NetApp
MFC after:	1 week
2014-10-09 04:40:19 +00:00
Marcel Moolenaar
383f423be1 Fix draining in ttydev_leave():
1.  ERESTART is not only returned when the revoke count changed. It
    is also returned when a signal is received. While a change in
    the revoke count should be ignored, a signal should not.
2.  Waiting until the output queue is entirely drained can cause a
    hang when the underlying device is stuck or broken.

Have tty_drain() take care of this by telling it when we're leaving.
When leaving, tty_drain() will use a timed wait to address point 2
above and it will check the revoke count to handle point 1 above.
The timeout is set to 1 second, which is arbitrary and long enough
to expect a change in the output queue.

Discussed with: jilles@
Reported by: Yamagi Burmeister <lists@yamagi.org>
2014-10-09 02:30:38 +00:00
Marcel Moolenaar
75c2b79df8 Apply r269126 to tty_timedwait():
Don't return ERESTART when the device is gone.
2014-10-09 01:59:25 +00:00
Marcel Moolenaar
511037bd43 Properly NUL-terminate the on-stack buffer for reading /boot.config
or /boot/config. In qemu, on a warm boot, the stack is not all zeroes
and we parse beyond the file's contents.

Obtained from:	Juniper Networks, Inc.
2014-10-09 01:54:32 +00:00
Andrey V. Elsukov
5b7a43f546 When tunneling interface is going to insert mbuf into netisr queue after stripping
outer header, consider it as new packet and clear the protocols flags.

This fixes problems when IPSEC traffic goes through various tunnels and router
doesn't send ICMP/ICMPv6 errors.

PR:		174602
Obtained from:	Yandex LLC
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2014-10-08 21:23:34 +00:00
Mark Johnston
5eaae1411f Pass up the error status of minidumpsys() to its callers.
PR:		193761
Submitted by:	Conrad Meyer <conrad.meyer@isilon.com>
Sponsored by:	EMC / Isilon Storage Division
2014-10-08 20:25:21 +00:00
Alexander Motin
3b4ae4621f Remove one second wait for threads exit from icl_conn_close().
Switch it from polling with pause() to using cv_wait()/cv_signal().
2014-10-08 19:54:42 +00:00
Gavin Atkinson
c5dcd62615 It looks like an entry for the R215 is not required in cdce(4) after all. 2014-10-08 19:49:10 +00:00
Konstantin Belousov
07a92f34d6 Add an argument to the x86 pmap_invalidate_cache_range() to request
forced invalidation of the cache range regardless of the presence of
self-snoop feature.  Some recent Intel GPUs in some modes are not
coherent, and dirty lines in CPU cache must be flushed before the
pages are transferred to GPU domain.

Reviewed by:	alc (previous version)
Tested by:	pho (amd64)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-10-08 16:48:03 +00:00
John Baldwin
232e8b52b0 Add schedgraph traces for callout handlers. Specifically, a callwheel logs
a running event each time it executes a callout function.  The event
includes the function pointer, argument, and whether or not it was run from
hardware interrupt context.  The callwheel is marked idle when each handler
completes.  This effectively logs the duration of each callout routine in
the graph.
2014-10-08 16:22:59 +00:00
Alexander Motin
82315915e0 Properly report 12Gbps connection rate.
Reviewed by:	kadesai, slm
MFC after:	1 week
2014-10-08 16:22:26 +00:00
Michael Tuexen
9ba6106020 Ensure that the list of streams sent in a stream reset parameter fits
in an mbuf-cluster.
Thanks to Peter Bostroem for drawing my attention to this part of the code.
2014-10-08 15:30:59 +00:00
Michael Tuexen
e29127de2e Ensure that the number of stream reported in srs_number_streams is
consistent with the amount of data provided in the SCTP_RESET_STREAMS
socket option.
Thanks to Peter Bostroem from Google for drawing my attention to
this part of the code.
2014-10-08 15:29:49 +00:00
Andrey V. Elsukov
5c6af4bbf5 Fix comment.
MFC after:	1 week
2014-10-08 12:33:31 +00:00
Alexander Motin
5396f4d279 Implement software (mode page) and hardware (config) write protection. 2014-10-08 12:24:24 +00:00
Andrey V. Elsukov
0478dc0c16 Add an ability to set dumpdev via loader(8) tunable.
MFC after:	3 weeks
2014-10-08 12:18:16 +00:00
Alexander V. Chernikov
be8bc45790 Add IP_FW_DUMP_SOPTCODES sopt to be able to determine
which opcodes are currently available in kernel.
2014-10-08 11:12:14 +00:00
Kashyap D Desai
8e72737119 No logical code change in this pathc.
Only Style 9 changes for mrsas driver.

Reviewed by:	ambrisko
MFC after:	2 weeks
Sponsored by:	AVAGO Technologies
2014-10-08 10:14:37 +00:00
Edward Tomasz Napierala
5d28b9ed32 Simplify; no functional changes.
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2014-10-08 09:44:02 +00:00
Kashyap D Desai
7efff25b10 Driver version upgrade.
Bring head mrsas same as internal Phase 6.5.

Reviewed by:	ambrisko
MFC after:	2 weeks
Sponsored by:	AVAGO Technologies
2014-10-08 09:39:18 +00:00
Kashyap D Desai
839ee02531 In the passthru IOCTL path, the mfi command pool was freely accessible N times
where as there are limited number(32) of mfi commands in the pool.
The mfi command pool is now restricted to 27 simultaneous accesses by using
a counting semaphore while calling the passthru function.

In the mrsas_cam.c source file there was a same function name mrsas_poll(),
which was same as the mrsas_poll() implemented in the mrsas.c file for the
polling interface.
To clearly distinguish the functionality by usage we have renamed the former
as mrsas_cam_poll().

In the passthru function let's say it has got an mfi command from the pool
but it has failed in one of the DMA function call which will lead to leak
an mfi command because in the ERROR case it directly returns and not freeing up
the occupied mfi command.

Reviewed by:	ambrisko
MFC after:	2 weeks
Sponsored by:	AVAGO Technologies
2014-10-08 09:37:47 +00:00
Kashyap D Desai
da01111344 d_poll() callback function is the entry point for poll system call for the application.
It is meant to notify the applications which will be waiting for some
controller events to be occured.

Reviewed by:	ambrisko
MFC after:	2 weeks
Sponsored by:	AVAGO Technologies
2014-10-08 09:35:52 +00:00
Kashyap D Desai
d18d1b472d Extended MSI-x vectors support for Invader and Fury(12Gb/s HBA).
This Driver will create multiple MSI-x vector depending upon what FW expose.
As of now 12 Gbp/s MR controller (Invader and Fury) expose 96 msix vector.
As of now 6 Gbp/s MR controller (Thunderbolt) expose 16 msix vector.

Reviewed by:	ambrisko
MFC after:		2 weeks
Sponsored by:	AVAGO Technologies
2014-10-08 09:34:25 +00:00
Kashyap D Desai
90012054d6 Fix the minor svn add issue. $FreeBSD$ expands at the time of
snv add, so I have added $FreeBSD$ as comment.

This commit is contininous of last mrsas commit, so that compilation
does not break.

Obtained from:	AVAGO Technologies
MFC after: 2 weeks
2014-10-08 09:30:35 +00:00
Kashyap D Desai
536094dc79 This is a feature provided to run 32-bit linux binaries on FreeBSD 64bit
machine, for which 32bit compatibilty code has been added.
As in linux there is only one device entry that is used to fire IOCTL commands,
a new device entry megaraid_sas_ioctl_node is added for solely this
purpose.

From one dev node i.e mrgaraid_sa_ioctl_node we have to find out the
controller instance in case of multicontroller, for which one management info
structure has been added.

Reviewed by:	ambrisko
MFC after:		2 weeks
Sponsored by:	AVAGO Technologies
2014-10-08 09:19:35 +00:00
Ruslan Bukin
fdbf76c383 Always wait 'command done' interrupt status bit before proceeding next command.
Sponsored by:	DARPA, AFRL
2014-10-08 08:51:05 +00:00
Kashyap D Desai
4799d48568 Current MegaRAID firmware and hence the driver only supported 64VDs.
E.g: If the user wants to create more than 64VD on a controller,
    it is not possible on current firmware/driver.

New feature and requirement to support upto 256VD, firmware/driver/apps need changes.
In addition to that, there must be a backward compatibility of the new driver with the
older firmware and vice versa.

RAID map is the interface between Driver and FW to fetch all required
fields(attributes) for each Virtual Drives.
In the earlier design driver was using the FW copy of RAID map where as
in the new design the Driver will keep the RAID map copy of its own; on which
it will operate for any raid map access in fast path.

Local driver raid map copy will provide ease of access through out the code
and provide generic interface for future FW raid map changes.

For the backward compatibility driver will notify FW that it supports 256VD
to the FW in driver capability field.
Based on the controller properly returned by the FW, the Driver will know
whether it supports 256VD or not and will copy the RAID map accordingly.

At any given time, driver will always have old or new Raid map.

Reviewed by	:	ambrisko
MFC after	:	2 weeks
Sponsored by:	AVAGO Technologies
2014-10-08 08:48:18 +00:00
Alexander Motin
8a41675372 Add support for WRITE ATOMIC (16) command and report SBC-4 compliance.
Atomic writes are only supported for ZVOLs in "dev" mode.  In other cases
atomicity can not be guarantied and so the command is blocked.
2014-10-08 07:48:36 +00:00
Hans Petter Selasky
c38aa2537b Add support for disabling USB enumeration in general or on selected
USB HUBs.

MFC after:	2 weeks
2014-10-08 07:00:50 +00:00
Pyun YongHyeon
dd6d49aad2 Oops, fix typo made in r272729. 2014-10-08 05:53:04 +00:00
Pyun YongHyeon
b624ef0a8e Add support for QAC AR816x/AR817x Gigabit/Fast Ethernet controllers.
These controllers seem to have the same feature of AR813x/AR815x and
improved RSS support(4 TX queues and 8 RX queues).  alc(4) supports
all hardware features except RSS.  I didn't implement RX checksum
offloading for AR816x/AR817x just because I couldn't get
confirmation from the Vendor whether AR816x/AR817x corrected its
predecessor's RX checksum offloading bug on fragmented packets.
This change adds supports for the following controllers.
 o AR8161 PCIe Gigabit Ethernet controller
 o AR8162 PCIe Fast Ethernet controller
 o AR8171 PCIe Gigabit Ethernet controller
 o AR8172 PCIe Fast Ethernet controller
 o Killer E2200 Gigabit Ethernet controller

Tested by:	Many
Relnotes:	yes
MFC after:	2 weeks
HW donated by:	Qualcomm Atheros Communications, Inc.
2014-10-08 05:47:01 +00:00
Pyun YongHyeon
b19487dff8 Add new quirk PCI_QUIRK_MSI_INTX_BUG to pci(4).
QAC AR816x/E2200 controller has a silicon bug that MSI interrupt
does not assert if PCIM_CMD_INTxDIS bit of command register is set.

Reviewed by:	jhb
2014-10-08 05:34:39 +00:00
Pyun YongHyeon
0999f75a2f Fix a long standing bug in MAC statistics register access. One
additional register was erroneously added in the MAC register set
such that 7 TX statistics counters were wrong.
2014-10-08 01:03:32 +00:00
Sean Bruno
f6f6703f27 Implement PLPMTUD blackhole detection (RFC 4821), inspired by code
from xnu sources.  If we encounter a network where ICMP is blocked
the Needs Frag indicator may not propagate back to us.  Attempt to
downshift the mss once to a preconfigured value.

Default this feature to off for now while we do not have a full PLPMTUD
implementation in our stack.

Adds the following new sysctl's for control:
net.inet.tcp.pmtud_blackhole_detection -- turns on/off this feature
net.inet.tcp.pmtud_blackhole_mss       -- mss to try for ipv4
net.inet.tcp.v6pmtud_blackhole_mss     -- mss to try for ipv6

Adds the following new sysctl's for monitoring:
-- Number of times the code was activated to attempt a mss downshift
net.inet.tcp.pmtud_blackhole_activated
-- Number of times the blackhole mss was used in an attempt to downshift
net.inet.tcp.pmtud_blackhole_min_activated
-- Number of times that we failed to connect after we downshifted the mss
net.inet.tcp.pmtud_blackhole_failed

Phabricator:	https://reviews.freebsd.org/D506
Reviewed by:	rpaulo bz
MFC after:	2 weeks
Relnotes:	yes
Sponsored by:	Limelight Networks
2014-10-07 21:50:28 +00:00
Navdeep Parhar
19abdd0654 cxgbe/tom: don't leak resources tied to an active open request that
cannot be sent to the chip because a prerequisite L2 resolution
failed.

Submitted by:	Hariprasad at chelsio dot com (original version)
MFC after:	2 weeks.
2014-10-07 21:26:22 +00:00
Jung-uk Kim
37417245bf Make kern.nswbuf tunable from loader.
MFC after:	1 week
2014-10-07 20:13:47 +00:00
Gavin Atkinson
4c62148d7b Support the Vodafone R215 LET USB dongle, which is apparently a rebadged
E5372 with different product IDs.

Interestingly, the standard E5372 IDs (12d1:1506) are currently listed in
u3g.c and are the same as the E3131.  However, the R215/E5372 is an NCM
device and works well with cdce(4) whereas the E3131 isn't.  More work
may be needed to better identify the other device IDs.

MFC after:	1 week
2014-10-07 19:07:50 +00:00
Aleksandr Rybalko
c32f462f28 Allow vt(4) to disable terminal bell with sysctl kern.vt.bell_enable=0,
similar as syscons(4) do.

Submitted by:	Tiwei Bie <btw@mail.ustc.edu.cn>
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2014-10-07 18:47:53 +00:00
Alexander V. Chernikov
eadf3b965c Fix possible crash when old value pointer is not updated after array resize. 2014-10-07 18:22:05 +00:00
Bjoern A. Zeeb
c896236487 Since introducing the extra mapping in r250103 for architectural performance
events we have actually counted 'Branch Instruction Retired' when people
asked for 'Unhalted core cycles' using the 'unhalted-core-cycles' event mask
mnemonic.

Reviewed by:	jimharris
Discussed with:	gnn, rwatson
MFC after:	3 days
Sponsored by:	DARPA/AFRL
2014-10-07 18:00:34 +00:00
Ruslan Bukin
41709d23c4 Add driver for Synopsys DesignWare Mobile Storage Host Controller.
Sponsored by:	DARPA, AFRL
2014-10-07 17:39:30 +00:00
Alexander V. Chernikov
79e86902e9 Notify table algo aboute runtime data change on table flush. 2014-10-07 16:46:11 +00:00
Andriy Gapon
c3d1d2e104 l2arc_write_buffers: reduce headroom value
FreeBSD has ARC_BUFC_NUMMETADATALISTS metadata lists and ARC_BUFC_NUMDATALISTS
data lists (currently both are 16) while illumos has just a single list
of each kind.

headroom determines how much data is scanned on a single list
during each run of the l2arc feed thread.
Because FreeBSD has more lists we proportionally decrease the limit.

Reviewed by:	Brendan Gregg (earlier version)
MFC after:	2 weeks
Sponsored by:	HybridCluster
2014-10-07 16:08:21 +00:00
Andriy Gapon
9f96723ec5 revert r272702: wrong (earlier) change was committed 2014-10-07 16:06:10 +00:00
Michael Tuexen
5558cc334d Fix a bug introduced in
https://svnweb.freebsd.org/base?view=revision&revision=272347

MFC after: 3 days
2014-10-07 16:01:17 +00:00
Andriy Gapon
4c3b02bfce reduce L2ARC_WRITE_SIZE on FreeBSD
FreeBSD has ARC_BUFC_NUMMETADATALISTS metadata lists and ARC_BUFC_NUMDATALISTS
data lists (currently both are 16) while illumos has just a single list
of each kind.

L2ARC_WRITE_SIZE determines the default value of l2arc_write_max which
defines limits on how much data is scanned and written to a cache device
during each run of the l2arc feed thread.  The limits are applied on the
per buffer list basis.
Because FreeBSD has more lists we proportionally reduce the limits.

Reviewed by:	Brendan Gregg (earlier version)
MFC after:	2 weeks
Sponsored by:	HybridCluster
2014-10-07 14:30:24 +00:00
Andriy Gapon
ab26525af2 make userland __assfail from opensolaris compat honor 'aok' variable
This should allow zdb -A option to actually make difference.

MFC after:	2 weeks
2014-10-07 14:15:50 +00:00
Andrey V. Elsukov
9ef268219a Our packet filters use mbuf's rcvif pointer to determine incoming interface.
Change mbuf's rcvif to enc0 and restore it after pfil processing.

PR:		110959
Sponsored by:	Yandex LLC
2014-10-07 13:31:04 +00:00
Alexander V. Chernikov
8ebca97f5e * Fix crash in interface tracker due to using old "linked" field.
* Ensure we're flushing entries without any locks held.
* Free memory in (rare) case when interface tracker fails to register ifp.
* Add KASSERT on table values refcounts.
2014-10-07 10:54:53 +00:00
Hans Petter Selasky
15983afb04 Fix compile warning when compiling with GCC.
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2014-10-07 10:04:25 +00:00
Xin LI
0ef99c7017 Bump __FreeBSD_version for the addition of explicit_bzero(3). 2014-10-07 04:54:47 +00:00
Xin LI
78b59024b5 Add explicit_bzero(3) and its kernel counterpart.
Obtained from:	OpenBSD
MFC after:	2 weeks
2014-10-07 04:54:11 +00:00
Neel Natu
65145c7f50 Inject #UD into the guest when it executes either 'MONITOR' or 'MWAIT'.
The hypervisor hides the MONITOR/MWAIT capability by unconditionally setting
CPUID.01H:ECX[3] to 0 so the guest should not expect these instructions to
be present anyways.

Discussed with:	grehan
2014-10-06 20:48:01 +00:00
John Baldwin
a2f67d9087 Fix build for i386 kernels with out 'I686_CPU'.
PR:		193660
Submitted by:	holger@freyther.de
2014-10-06 18:11:05 +00:00
John Baldwin
4a872f3d4f Call initializecpucache() on the BSP for i386 in the !XEN case. This was
my bug in r271409 that I noticed while reviewing r272492.
2014-10-06 15:43:57 +00:00
Alexander Motin
204ca472bf Set CAM_SIM_QUEUED flag before calling ctl_queue() to avoid race.
PR:		194128
Submitted by:	Scott M. Ferris <smferris@gmail.com>
MFC after:	3 days
Sponsored by:	EMC/Isilon Storage Division
2014-10-06 14:52:04 +00:00
Alexander V. Chernikov
bbd5a84297 Improve r272609 (O_TCPOPTS).
MFC after:	3 dayes
2014-10-06 12:29:06 +00:00
Alexander Motin
04f10b6c96 Add support for MaxBurstLength and Expected Data transfer Length parameters.
Before this change target could send R2T request for write transfer of any
size, that could violate iSCSI RFC, which allows initiator to limit maximum
R2T size by negotiating MaxBurstLength connection parameter.

Also report an error in case of write underflow, when initiator provides
less data than initiator expects.  Previously in such case our target
sent R2T request for non-existing data, violating the RFC, and confusing
some initiators.  SCSI specs don't explicitly define how write underflows
should be handled and there are different oppinions, but reporting error
is hopefully better then violating iSCSI RFC with unpredictable results.

MFC after:	2 weeks
2014-10-06 12:20:46 +00:00
Alexander V. Chernikov
a5fedf11fc Sync to HEAD@r272609. 2014-10-06 11:29:50 +00:00
Alexander V. Chernikov
3615981425 Fix O_TCPOPTS processing.
Obtained from:	luigi
2014-10-06 11:15:11 +00:00
Alexander Motin
88971a900d Use r271207 optimization only for MSI-enabled HBAs.
It was found that VirtualBox' AHCI does not allow nterrupt to be cleared
before the interrupt status register is read, causing interrupt storm.

AHCI specification allows to skip this register use when multi-vector MSI
is enabled and so interrupting port is known.  For single-vector MSI that
is not stated explicitly, but if the port is only one, it is obviously
known too.
2014-10-06 10:58:54 +00:00
Andrew Turner
9677c48e30 Disable generating vfp and NEON instructions in the arm kernel. 2014-10-06 09:52:28 +00:00
Ganbold Tsagaankhuu
7f29c69aee Use documented compat string for msm uart.
Whilst here use tab instead of spaces.

Approved by:    stas (mentor)
2014-10-06 09:00:53 +00:00
Xin LI
1b5bcb8425 MFV r272591:
Use loaned ARC buffer for zfs receive to avoid copy.

Illumos issue:
    5162 zfs recv should use loaned arc buffer to avoid copy

MFC after:	2 weeks
2014-10-06 07:29:17 +00:00
Mateusz Guzik
3a222fe000 devfs: tidy up after 272596
This moves a var to an if statement, no functional changes.

MFC after:	1 week
2014-10-06 07:22:48 +00:00
Xin LI
8fb26f5aef MFV r272585:
Split the godfather zio into CPU number's to reduce lock
contention.

Illumos issue:
    5176 lock contention on godfather zio

MFC after:	2 weeks
2014-10-06 07:03:17 +00:00
Alexander Motin
5554eb9bfc Fix length of Extended INQUIRY Data VPD page.
MFC after:	3 days
2014-10-06 07:01:32 +00:00
Mateusz Guzik
1d1b55fbba devfs: don't take proctree_lock unconditionally in devfs_close
MFC after:	1 week
2014-10-06 06:20:35 +00:00
Hans Petter Selasky
b228e6bf57 Minor code styling.
Suggested by:	glebius @
2014-10-06 06:19:54 +00:00
Xin LI
dcb20006f0 MFV r272501:
Illumos issue:
    5177 remove dead code from dsl_scan.c

MFC after:	2 weeks
2014-10-06 05:46:51 +00:00
Xin LI
00769ce74d MFV r272500:
Don't inherit flags other than DS_FLAG_CI_DATASET and DS_FLAG_INCONSISTENT
when cloning.  This prevents DS_FLAG_DEFER_DESTROY being inherited from a
clone that is marked for deferred destroy, which causes snapshots of the
clone being destroyed when getting a hold or clone.

Illumos issue:
    5150 zfs clone of a defer_destroy snapshot causes strangeness

MFC after:	1 week
2014-10-06 05:42:20 +00:00
Mateusz Guzik
dd2390be68 Convert racct stubs to inline functions.
This saves some symbols and function calls for kernel without RACCT.

MFC after:	1 week
2014-10-06 02:31:33 +00:00
Mateusz Guzik
7775dfac2f seq_t needs to be visible to userspace
Pointy hat to:	mjg
Reported by: bz
X-MFC:	with r272567
2014-10-05 21:39:50 +00:00
Bryan Venteicher
111fbcd5ed Change the UMA mutex into a rwlock
Acquire the lock in read mode when just needed to ensure the stability
of the keg list. The UMA lock may be held for a long time (relatively
speaking) in uma_reclaim() on machines with lots of zones/kegs. If the
uma_timeout() would fire during that period, subsequent callouts on that
CPU may be significantly delayed.

Reviewed by:	jhb
2014-10-05 21:34:56 +00:00
Hiroki Sato
3b4b7de506 Virtualize if_edsc(4). 2014-10-05 21:27:26 +00:00
Michael Tuexen
041353aba4 Remove unused MC_ALIGN macro as suggested by Robert.
MFC after: 1 week
2014-10-05 20:30:49 +00:00
Hiroki Sato
d6f59204ef Virtualize if_disc(4) cloner. 2014-10-05 19:46:52 +00:00
Mateusz Guzik
0837bbe4c1 Keep struct filedescent comments within 80-char limit. 2014-10-05 19:44:40 +00:00
Hiroki Sato
c51275260b Virtualize if_bridge(4) cloner. 2014-10-05 19:43:37 +00:00
Mateusz Guzik
2b4a2528d7 filedesc: fix up breakage introduced in 272505
Include sequence counter supports incoditionally [1]. This fixes reprted build
problems with e.g. nvidia driver due to missing opt_capsicum.h.

Replace fishy looking sizeof with offsetof. Make fde_seq the last member in
order to simplify calculations.

Suggested by:	kib [1]
X-MFC:		with 272505
2014-10-05 19:40:29 +00:00
Konstantin Belousov
57c2505e65 On error, sbuf_bcat() returns -1. Some callers returned this -1 to
the upper layers, which interpret it as errno value, which happens to
be ERESTART.  The result was spurious restarts of the sysctls in loop,
e.g. kern.proc.proc, instead of returning ENOMEM to caller.

Convert -1 from sbuf_bcat() to ENOMEM, when returning to the callers
expecting errno.

In collaboration with:	pho
Sponsored by:	The FreeBSD Foundation (kib)
MFC after:	1 week
2014-10-05 17:35:59 +00:00
Yoshihiro Takahashi
513798cc9c - Refactor defining variables.
- Merge common modules both i386 and amd64 into one if-endif.
- Sort.
- There are no functional changes.
2014-10-05 07:27:05 +00:00
Mateusz Guzik
bad2520a2b Avoid unnecessary ppeers_lock acquisition in exit1.
MFC after:	1 week
2014-10-05 07:21:41 +00:00
Robert Watson
6c572040c6 Eliminate use of M_EXT in IP6_EXTHDR_CHECK() by trimming a redundant
'if'/'else' case: it matches the simple 'else' case that follows.

This reduces awareness of external-storage mechanics outside of the
mbuf allocator.

Reviewed by:	bz
MFC after:	3 days
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D900
2014-10-05 06:28:53 +00:00
Andrey V. Elsukov
4118113fc1 Rework bootparttest to use more code from sys/boot.
Use disk_open() call to emulate loader behavior.
2014-10-05 06:04:47 +00:00
Andrey V. Elsukov
34f50f4c34 Add a bit more debug messages. 2014-10-05 06:00:22 +00:00
Cy Schubert
2777bfabc0 ipfilter bug #537 NAT rules with sticky have incorrect hostmap IP address.
This fixes when an IP address mapping is put in the hostmap table for
sticky NAT rules, it ends up having the wrong byte order.

Obtained from:	ipfilter CVS repo (r1.102), NetBSD CVS repo (r1.12)
2014-10-05 03:58:30 +00:00
Cy Schubert
3a77b75120 ipfilter bug #534 destination list hashing not endian neutral
Obtained from:	ipfilter CVS repo (r1.26), NetBSD CVS repo (r1.8)
2014-10-05 03:52:09 +00:00
Cy Schubert
685545cdc7 ipfilter bug #538 ipf_p_dns_del should return void
Obtained from:	ipfilter cvs repo (r1.8)
2014-10-05 03:48:09 +00:00
Cy Schubert
a7bd2acdab ipfilter bug #554 Determining why a ipf rule matches is hard -- replace
ipfilter rule compare with new ipf_rule_compare() function.

Obtained from:	ipfilter CVS rep (r1.129)
2014-10-05 03:45:19 +00:00
Cy Schubert
7db9f2ba58 ipfiler bug #550 filter rule list corrupted with inserted rules
Obtained from:	ipfilter CVS repo (r1.128); NetBSD CVS repo (r1.15)
2014-10-05 03:41:47 +00:00
Bryan Venteicher
6e5254e0d7 Remove stray uma_mtx lock/unlock in zone_drain_wait()
Callers of zone_drain_wait(M_WAITOK) do not need to hold (and were not)
the uma_mtx, but we would attempt to unlock and relock the mutex if we
had to sleep because the zone was already draining. The M_NOWAIT callers
may hold the uma_mtx, but we do not sleep in that case.

Reviewed by:	jhb
MFC after:	3 days
2014-10-05 03:18:30 +00:00
Hiroki Sato
7eb756fab1 Use printb() for boolean flags in ro_opts and actor_state for LACP. 2014-10-05 02:37:01 +00:00
Hiroki Sato
6d47816791 - Move L2 addr configuration for the primary port to a taskqueue. This fixes
LOR of softc rmlock in iflladdr_event handlers.

- Call if_delmulti_ifma() after LACP_UNLOCK().  This fixes another LOR.

- Fix a panic in lacp_transit_expire().

- Fix a panic in lagg_input() upon shutting down a port.
2014-10-05 02:34:21 +00:00
Mateusz Guzik
25108069ec Get rid of crshared. 2014-10-05 02:16:53 +00:00
Konstantin Belousov
4142462eeb Slightly reword comment. Move code, which is described by the
comment, after it.

Discussed with:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-10-04 18:51:55 +00:00
Jean-Sébastien Pédron
0d06446812 vt(4): Don't recalculate buffer size if we don't know screen size
When the screen size is unknown, it's set to 0x0. We can't use that as
the buffer size, otherwise, functions such as vtbuf_fill() will fail.

This fixes a panic on RaspberryPi, where there's no vt(4) backend
configured early in boot.

PR:		193981
Tested by:	danilo@
MFC after:	3 days
2014-10-04 18:40:40 +00:00
Konstantin Belousov
b76278407d Add kernel option KSTACK_USAGE_PROF to sample the stack depth on
interrupts and report the largest value seen as sysctl
debug.max_kstack_used.  Useful to estimate how close the kernel stack
size is to overflow.

In collaboration with:	Larry Baird <lab@gta.com>
Sponsored by:	The FreeBSD Foundation (kib)
MFC after:	1 week
2014-10-04 18:38:14 +00:00
Konstantin Belousov
539c9eef12 Fixes for i/o during coredumping:
- Do not dump into system files.
- Do not acquire write reference to the mount point where img.core is
  written, in the coredump().  The vn_rdwr() calls from ELF imgact
  request the write ref from vn_rdwr().  Recursive acqusition of the
  write ref deadlocks with the unmount.
- Instead, take the range lock for the whole core file.  This prevents
  parallel dumping from two processes executing the same image,
  converting the useless interleaved dump into sequential dumping,
  with second core overwriting the first.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-10-04 18:35:00 +00:00
Konstantin Belousov
e3d6feceb1 Add IO_RANGELOCKED flag for vn_rdwr(9), which specifies that vnode is
not locked, but range is.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-10-04 18:28:27 +00:00
Ian Lepore
41e8f7efbe Make kevent(2) periodic timer events more reliably periodic. The event
callout is now scheduled using the C_ABSOLUTE flag, and the absolute time
of each event is calculated as the time the previous event was scheduled
for plus the interval.  This ensures that latency in processing a given
event doesn't perturb the arrival time of any subsequent events.

Reviewed by:	jhb
2014-10-04 15:59:15 +00:00
Xin LI
4bb264ae15 Don't make nested definition for range_seg_cache.
Reported by:	ian
MFC after:	1 week
X-MFC-With:	r272506
2014-10-04 15:42:52 +00:00
Bjoern A. Zeeb
92720216a9 Put and #ifdef _KERNEL around the #include for opt_capsicum.h to
hopefully allow the build to finish after r272505.
2014-10-04 14:17:30 +00:00
Alexander V. Chernikov
d4e1b51578 Fix build with gcc. 2014-10-04 13:57:14 +00:00
Alexander V. Chernikov
e530ca7333 Please GCC by specifying proper cast. 2014-10-04 13:46:10 +00:00
Alexander V. Chernikov
e3cadfdb32 Bump max rule size to 512 opcodes. 2014-10-04 12:46:26 +00:00
Alexander V. Chernikov
1ce4b35740 Sync to HEAD@r272516. 2014-10-04 12:42:37 +00:00
Alexander V. Chernikov
60805b89df Add "ipfw_ctl3" FEATURE to indicate presence of new ipfw interface. 2014-10-04 12:10:32 +00:00
Alexander V. Chernikov
ccba94b8fc Switch ipfw to use rmlock for runtime locking. 2014-10-04 11:40:35 +00:00
Alexander V. Chernikov
be3cc1b567 Bump max rule size to 512 opcodes. 2014-10-04 10:15:49 +00:00
Edward Tomasz Napierala
d19c297e5f Make autofs use shared vnode locks.
Reviewed by:	kib
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2014-10-04 09:37:40 +00:00
Xin LI
4750c382a9 MFV r272499:
Illumos issue:
    5174 add sdt probe for blocked read in dbuf_read()

MFC after:	2 weeks
2014-10-04 08:55:08 +00:00
Xin LI
eb0b70068c Add a new sysctl, vfs.zfs.vol.unmap_enabled, which allows the system
administrator to toggle whether ZFS should ignore UNMAP requests.

Illumos issue:
    5149 zvols need a way to ignore DKIOCFREE

MFC after:	2 weeks
2014-10-04 08:51:57 +00:00
Xin LI
2d36d67c72 Diff reduction with upstream. The code change is not really applicable
to FreeBSD.

Illumos issue:
    5148 zvol's DKIOCFREE holds zfsdev_state_lock too long

MFC after:	1 month
2014-10-04 08:41:23 +00:00
Xin LI
523b4c7fdf MFV r272496:
Add tunable for number of metaslabs per vdev
(vfs.zfs.vdev.metaslabs_per_vdev).  The default remains
at 200.

Illumos issue:
    5161 add tunable for number of metaslabs per vdev

MFC after:	2 weeks
2014-10-04 08:29:48 +00:00
Xin LI
a8d7512709 MFV r272495:
In arc_kmem_reap_now(), reap range_seg_cache too to reclaim memory in
response of memory pressure.

Illumos issue:
    5163 arc should reap range_seg_cache

MFC after:	1 week
2014-10-04 08:14:10 +00:00
Mateusz Guzik
ee3fd7bbb1 Plug capability races.
fp and appropriate capability lookups were not atomic, which could result in
improper capabilities being checked.

This could result either in protection bypass or in a spurious ENOTCAPABLE.

Make fp + capability check atomic with the help of sequence counters.

Reviewed by:	kib
MFC after:	3 weeks
2014-10-04 08:08:56 +00:00
Xin LI
8c20e2ff11 MFV r272494:
Make space_map_truncate() always do space_map_reallocate().  Without
this, setting space_map_max_blksz would cause panic for existing pool,
as dmu_objset_set_blocksize would fail if the object have multiple blocks.

Illumos issues:
   5164 space_map_max_blksz causes panic, does not work
   5165 zdb fails assertion when run on pool with recently-enabled
	spacemap_histogram feature

MFC after:	2 weeks
2014-10-04 08:05:39 +00:00
Mateusz Guzik
6cd8796a84 Add sequence counters with memory barriers.
Current implementation is somewhat simplistic and hackish,
will be improved later after possible memory barrier overhaul.

Reviewed by:	kib
MFC after:	3 weeks
2014-10-04 08:03:52 +00:00
Yoshihiro Takahashi
9407825cc6 Merge pc98's machdep.c into i386/i386/machdep.c. 2014-10-04 06:01:30 +00:00
Yoshihiro Takahashi
986eedef26 Reduce diffs against i386. 2014-10-04 05:03:39 +00:00
Yoshihiro Takahashi
a5eb7f4fba - MFi386: Add compile-with option for tau32-ddk.c.
- Whitespace change.
- Remove duplicate line.
2014-10-04 05:01:57 +00:00
Andrey V. Elsukov
f24c6fe478 Add GUID of FreeBSD slice to GPT scheme.
MFC after:	1 week
2014-10-03 21:46:07 +00:00
Steven Hartland
14a0d74ea8 Refactor ZFS ARC reclaim checks and limits
Remove previously added kmem methods in favour of defines which
allow diff minimisation between upstream code base.

Rebalance ARC free target to be vm_pageout_wakeup_thresh by default
which eliminates issue where ARC gets minimised instead of balancing
with VM pageout. The restores the target point prior to r270759.

Bring in missing upstream only changes which move unused code to
further eliminate code differences.

Add additional DTRACE probe to aid monitoring of ARC behaviour.

Enable upstream i386 code paths on platforms which don't define
UMA_MD_SMALL_ALLOC.

Fix mixture of byte an page values in arc_memory_throttle i386 code
path value assignment of available_memory.

PR:		187594
Review:		D702
Reviewed by:	avg
MFC after:	1 week
X-MFC-With:	r270759 & r270861
Sponsored by:	Multiplay
2014-10-03 20:34:55 +00:00
Hans Petter Selasky
b06d477b3c When we fail to get a USB reference we should just return, because
there are no more references held.

MFC after:	3 days
2014-10-03 16:09:46 +00:00