98994 Commits

Author SHA1 Message Date
Xin LI
7079d5877c MFV r268714:
Improve extreme rewind import.

When doing an "extreme rewind" import ("zpool import -XF"), we attempt
to verify all data in the pool, essentially scrubbing the entire pool.
The problem is that spa_load_verify_cb() issues an unbounded number of
concurrent scrub i/os.  This can lead to all of memory being used for
these zio's, wedging the system. Like normal scrub, we need to put a
cap on the number of outstanding i/os, and have the traverse thread
block when we reach this cap.

For this purpose the cap can be very large (10,000) to optimize the
elevator algorithm.  Three kernel tunables have been added:

	vfs.zfs.spa_load_verify_maxinflight
	vfs.zfs.spa_load_verify_metadata
	vfs.zfs.spa_load_verify_data

The latter two tunables controls whether metadata and/or user data
when doing extreme rewind.

Make 'zpool import -T' imply scrub.

Make zpool import -T <txg> accept hexadecimal values for the txg when
prefixed with 0x.

Skip txg's for which there is no uberblock when doing extreme rewind.

Skip reading all user data twice by skipping prefetches when doing
extreme rewinds as we do not access via the ARC.

Illumos issues:
  4970 need controls on i/o issued by zpool import -XF
  4971 zpool import -T should accept hex values
  4972 zpool import -T implies extreme rewind, and thus a scrub
  4973 spa_load_retry retries the same txg
  4974 spa_load_verify() reads all data twice

MFC after:	2 weeks
2014-07-15 22:44:04 +00:00
Xin LI
eb75155228 MFV r268702:
Add missing *_destroy() calls in various places with ZFS.

Illumos issue:
  4975 missing mutex_destroy() calls in zfs

MFC after:	2 weeks
2014-07-15 20:32:23 +00:00
Konstantin Belousov
a62eb1398a Followup to r268466.
- Move the code to calculate resident count into separate function.
  It reduces the indent level and makes the operation of
  vmmap_skip_res_cnt tunable more clear.
- Optimize the calculation of the resident page count for map entry.
  Skip directly to the next lowest available index and page among the
  whole shadow chain.
- Restore the use of pmap_incore(9), only to verify that current
  mapping is indeed superpage.
- Note the issue with the invalid pages.

Suggested and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-07-15 19:57:03 +00:00
Konstantin Belousov
3760e341ca Change the calculation of the kinfo_vmentry field kve_private_resident
to reflect its name.

Noted and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-07-15 19:49:00 +00:00
Navdeep Parhar
bae4e5af99 cxgbe(4): Display CF facility correctly in the device log.
MFC after:	3 days
2014-07-15 18:24:41 +00:00
Neel Natu
f7a9f1784f Add support for operand size and address size override prefixes in bhyve's
instruction emulation [1].

Fix bug in emulation of opcode 0x8A where the destination is a legacy high
byte register and the guest vcpu is in 32-bit mode. Prior to this change
instead of modifying %ah, %bh, %ch or %dh the emulation would end up
modifying %spl, %bpl, %sil or %dil instead.

Add support for moffsets by treating it as a 2, 4 or 8 byte immediate value
during instruction decoding.

Fix bug in verify_gla() where the linear address computed after decoding
the instruction was not being truncated to the effective address size [2].

Tested by:	Leon Dang [1]
Reported by:	Peter Grehan [2]
Sponsored by:	Nahanni Systems
2014-07-15 17:37:17 +00:00
Alan Cox
87dd8ef960 Actually set the "no execute" bit on 1 MB page mappings in pmap_protect().
Previously, the "no execute" bit was being set directly in the PTE, instead
of the local variable in which the new PTE value is being constructed.  So,
when the local variable was finally assigned to the PTE, the "no execute"
bit setting was lost.
2014-07-15 17:16:06 +00:00
John Baldwin
fae9277339 Fix build with SMP disabled.
CR:		https://phabric.freebsd.org/D407
Reviewed by:	royger
2014-07-15 15:40:33 +00:00
Konstantin Belousov
5e351014e0 Make amd64 pmap_copy_pages() functional for pages not mapped by DMAP.
Requested and reviewed by:	royger
Tested by:	pho, royger
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-07-15 09:30:43 +00:00
Alan Cox
c3c820296f Eliminate repeated calculation of next_bucket in pmap_protect() and
pmap_remove().  Eliminate an unnecessary variable from pmap_remove() and
pmap_advise().
2014-07-15 05:34:27 +00:00
Navdeep Parhar
44eb893659 Allow multi-byte reads in the private CHELSIO_T4_GET_I2C ioctl. The
firmware allows up to 48B to be read this way but the driver limits
itself to 8B at a time to remain compatible with old cxgbetool
binaries.

MFC after:	1 week
2014-07-15 01:03:29 +00:00
Mateusz Guzik
965d08605f Plug p_pptr null test in do_execve. It is always true. 2014-07-14 22:40:46 +00:00
Mateusz Guzik
c959c23740 Manage struct sigacts refcnt with atomics instead of a mutex.
MFC after:	1 week
2014-07-14 21:12:59 +00:00
Ian Lepore
0f822edead Fix the Zedboard/Zynq ethernet driver to handle media speed changes so
that it can connect to switches at speeds other than 1gb.

This requires changing the reference clock speed.  Since we still don't
have a general clock API that lets a SoC-independant driver manipulate its
own clocks, this change includes a weak reference to a routine named
cgem_set_ref_clk().  The default implementation is a no-op; SoC-specific
code can provide an implementation that actually changes the speed.

Submitted by:	Thomas Skibo <ThomasSkibo@sbcglobal.net>
2014-07-14 20:58:57 +00:00
Nathan Whitehorn
b85beee188 On my Lenovo laptop, the firmware maps the EFI framebuffer with MTRRs set
to uncacheable. This leads to execrable console performance. Once PMAP is
up, remap the framebuffer as write-combining. This reduces boot time on my
laptop by 60% when booting with EFI.

MFC after:	2 weeks
2014-07-14 17:42:22 +00:00
Alan Cox
db3ddfd672 Eliminate dead code. There is no direct map. This code was cut-and-pasted
from amd64.
2014-07-14 17:16:09 +00:00
Konstantin Belousov
4cda7f7ece Rework the tmpfs unmount.
- Suspend filesystem for unmount.  This prevents new tmpfs nodes from
  instantiating, and also ensures that only unmount thread can destroy
  nodes.

- Do not start tmpfs node deletion until all vnodes are reclaimed,
  which guarantees that no thread can access tmpfs data.  For this,
  call vflush() in the loop, until the mnt_nvnodelistsize is non-zero.
  Note that after mnt_nvnodelistsize becomes 0, insmntque() blocks
  insertion of a vnode germ into the mount list of vnodes.

- Fail node allocation when the filesystem is being unmounted.  This
  is race-free due to the vflush() call in loop.  This is mostly
  cosmetic, avoiding some more work which might be done until
  suspension in unmount is started.

Note that there is currently no way to prevent new vnode instantiation
from readers during the unmount.  Due to this, forced unmount might
live-lock if vflush() loop cannot get to the zero vnode count due to
races with readers.  The unmount would proceed after the load is
lifted.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-07-14 09:52:33 +00:00
Konstantin Belousov
b5b3326191 Change forgotten in r268615. Set the OBJ_TMPFS_NODE flag for
vm_object of VREG tmpfs node.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-07-14 09:35:14 +00:00
Konstantin Belousov
f08f7dca40 The OBJ_TMPFS flag of vm_object means that there is unreclaimed tmpfs
vnode for the tmpfs node owning this object.  The flag is currently
used for two purposes.  First, it allows to correctly handle VV_TEXT
for tmpfs vnode when the ref count on the object is decremented to 1,
similar to vnode_pager_dealloc() for regular filesystems.  Second, it
prevents some operations, which are done on OBJT_SWAP vm objects
backing user anonymous memory, but are incorrect for the object owned
by tmpfs node.

The second kind of use of the OBJ_TMPFS flag is incorrect, since the
vnode might be reclaimed, which clears the flag, but vm object
operations must still be disallowed.

Introduce one more flag, OBJ_TMPFS_NODE, which is permanently set on
the object for VREG tmpfs node, and used instead of OBJ_TMPFS to test
whether vm object collapse and similar actions should be disabled.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-07-14 09:30:37 +00:00
Konstantin Belousov
eb2c06b63a Use tmpfs_vn_get_ino_gen() to handle the races with reclaim in tmpfs
dotdot lookup.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-07-14 09:16:55 +00:00
Konstantin Belousov
fd63693dcf Style. Add comment about lock mode.
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-07-14 09:13:56 +00:00
Konstantin Belousov
895b3782c6 Extract the code to put a filesystem into the suspended state (at the
unmount time) in the helper vfs_write_suspend_umnt().  Use it instead
of two inline copies in FFS.

Fix the bug in the FFS unmount, when suspension failed, the ufs
extattrs were not reinitialized.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-07-14 09:10:00 +00:00
Konstantin Belousov
7a41bc2f41 In tmpfs_alloc_file(), code after the 'out' label does only 'return
error;'.  Replace goto's with the return.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-07-14 09:02:40 +00:00
Konstantin Belousov
d2ca06cdd2 Add convenience macro to assert tmpfs node lock.
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-07-14 08:59:25 +00:00
Konstantin Belousov
55781cb922 Add some assertions for the code handling vm_object for tmpfs vnode.
In particular, vnode must be exclusively locked when the tmpfs vnode
and object are divorced.  When the vnode is opened, the object must be
still alive, since only live vnode can be opened, and the tmpfs node
owns a reference on the object.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-07-14 08:55:02 +00:00
Konstantin Belousov
706f80801d The tmpfs_link() must not dereference the filesystem-specific data for
a vnode until it is verified that the vnode indeed belongs to tmpfs
mount.  Otherwise, it might access random memory, at least in the
debug kernel.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-07-14 08:45:29 +00:00
Konstantin Belousov
57ef02ff0f In kern_linkat(), avoid passing doomed vnode to the VOP.
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-07-14 08:41:13 +00:00
Konstantin Belousov
a69452162a Generalize vn_get_ino() to allow filesystems to use custom vnode
producer, instead of hard-coding VFS_VGET().  New function, which
takes callback, is called vn_get_ino_gen(), standard callback for
vn_get_ino() is provided.

Convert inline copies of vn_get_ino() in msdosfs and cd9660 into the
uses of vn_get_ino_gen().

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-07-14 08:34:54 +00:00
Konstantin Belousov
fca015d301 Remove code separator lines which do not conform to style(9).
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-07-14 08:17:11 +00:00
Kevin Lo
cb7df69b7e Make bind(2) and connect(2) return EAFNOSUPPORT for AF_UNIX on wrong
address family.

See https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=191586 for the
original discussion.

Reviewed by:	terry
2014-07-14 06:00:01 +00:00
Mark Johnston
291624fdf6 Invoke the DTrace trap handler before calling trap() on amd64. This matches
the upstream implementation and helps ensure that a trap induced by tracing
fbt::trap:entry is handled without recursively generating another trap.

This makes it possible to run most (but not all) of the DTrace tests under
common/safety/ without triggering a kernel panic.

Submitted by:	Anton Rang <anton.rang@isilon.com> (original version)
Phabric:	D95
2014-07-14 04:38:17 +00:00
Alan Cox
f26bcf99e0 Eliminate an unused variable. Refresh two comments. 2014-07-13 17:52:07 +00:00
Alan Cox
a844c68fc2 Implement pmap_unwire(). See r268327 for the motivation behind this change. 2014-07-13 16:27:57 +00:00
Mark Johnston
05929f8bd0 Add a headphone redirection quirk for the Lenovo G580.
MFC after:	1 week
2014-07-13 10:31:29 +00:00
Hans Petter Selasky
817a8cac2e Turn off blinking device leds at attach.
MFC after:	3 days
PR:		183735
2014-07-13 09:34:59 +00:00
Hans Petter Selasky
948d799e27 Fix performance problems with AXGE network adapter in RX direction:
- Remove 4 extra bytes from the ethernet payload.
- The maximum RX buffer was incorrectly set. Increase it to 64K for
now, until the exact limit is understood.
- Enable hardware checksumming again.
- Make hardware data structure packed.

MFC after:	3 days
2014-07-13 07:39:28 +00:00
Alexander Motin
4d877c4148 Merge several equal serialization indexes. 2014-07-13 06:01:23 +00:00
Mateusz Guzik
8bedd5d782 Clear nonblock and async on devctl close instaed of open.
This is a purely cosmetic change.
2014-07-12 15:35:04 +00:00
Rui Paulo
efce3748f3 Revert r268543.
We should probably fix sys/gpio.h instead.
2014-07-12 06:23:42 +00:00
Adrian Chadd
c7c0d94874 Add IPv6 flowid, bindmulti and RSS awareness. 2014-07-12 05:46:33 +00:00
Adrian Chadd
a8a2d8003a Add INP_RSS_BUCKET_SET awareness for IPv6 pcbgroup entries.
This ensures that a listen socket with INP_RSS_BUCKET_SET set will use
the pre-determined PCBGROUP rather than what the hashing path chooses.
2014-07-12 05:45:53 +00:00
Adrian Chadd
6e4405cee1 Add the IPv6 versions of the multi-bind, hash/hash type and RSS options. 2014-07-12 05:44:16 +00:00
Adrian Chadd
e989b65f79 Add RSS hashing awareness for IPv6 and TCP IPv6 hash types. 2014-07-12 05:43:43 +00:00
Adrian Chadd
76e63232b6 Add some hash types for UDP RSS for both IPv4 and IPv6.
Nothing is yet using this but I'd like to reserve these values.
2014-07-12 05:42:57 +00:00
Adrian Chadd
d5bb8bd315 Expose in_pcbbind_check_bindmulti() so the upcoming IPv6 RSS changes
can be made to use it.
2014-07-12 05:40:13 +00:00
Rui Paulo
bd08cbb81a Move iic.h to sys/ so that it's automatically installed in /usr/include/sys.
This lets us call iic(4) ioctls without needing the kernel source code
and follows the same model of GPIO.

MFC after:	3 weeks
2014-07-12 01:04:10 +00:00
Rui Paulo
0f912c5a5d Remove _DTRACE_VERSION from sdt.h. It will now come from the command line
(bsd.dep.mk).

MFC after:	3 weeks
2014-07-12 00:57:00 +00:00
Michael Tuexen
0c8682e8ad Whitespace changes.
MFC after: 1 week
2014-07-11 21:15:40 +00:00
Navdeep Parhar
30f337891d cxgbe(4): Add an iSCSI softc to the adapter structure. 2014-07-11 21:02:54 +00:00
Gleb Smirnoff
1fbe6a82f4 Improve reference counting of EXT_SFBUF pages attached to mbufs.
o Do not use UMA refcount zone. The problem with this zone is that
  several refcounting words (16 on amd64) share the same cache line,
  and issueing atomic(9) updates on them creates cache line contention.
  Also, allocating and freeing them is extra CPU cycles.
  Instead, refcount the page directly via vm_page_wire() and the sfbuf
  via sf_buf_alloc(sf_buf_page(sf)) [1].

o Call refcounting/freeing function for EXT_SFBUF via direct function
  call, instead of function pointer. This removes barrier for CPU
  branch predictor.

o Do not cleanup the mbuf to be freed in mb_free_ext(), merely to
  satisfy assertion in mb_dtor_mbuf(). Remove the assertion from
  mb_dtor_mbuf(). Use bcopy() instead of manual assignments to
  copy m_ext in mb_dupcl().

[1] This has some problems for now. Using sf_buf_alloc() merely to
    increase refcount is expensive, and is broken on sparc64. To be
    fixed.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-07-11 19:40:50 +00:00