Commit Graph

98945 Commits

Author SHA1 Message Date
Gleb Smirnoff
1fbe6a82f4 Improve reference counting of EXT_SFBUF pages attached to mbufs.
o Do not use UMA refcount zone. The problem with this zone is that
  several refcounting words (16 on amd64) share the same cache line,
  and issueing atomic(9) updates on them creates cache line contention.
  Also, allocating and freeing them is extra CPU cycles.
  Instead, refcount the page directly via vm_page_wire() and the sfbuf
  via sf_buf_alloc(sf_buf_page(sf)) [1].

o Call refcounting/freeing function for EXT_SFBUF via direct function
  call, instead of function pointer. This removes barrier for CPU
  branch predictor.

o Do not cleanup the mbuf to be freed in mb_free_ext(), merely to
  satisfy assertion in mb_dtor_mbuf(). Remove the assertion from
  mb_dtor_mbuf(). Use bcopy() instead of manual assignments to
  copy m_ext in mb_dupcl().

[1] This has some problems for now. Using sf_buf_alloc() merely to
    increase refcount is expensive, and is broken on sparc64. To be
    fixed.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-07-11 19:40:50 +00:00
Michael Tuexen
f64a0b069a Bugfix: When a remote address was added to an endpoint,
a source address was selected and cached, but it was not
stored that is was cached. This resulted in selecting
different source addresses for the INIT-ACK and COOKIE-ACK
when possible.
Thanks to Niu Zhixiong for reporting the issue.

MFC after: 1 week
2014-07-11 17:31:40 +00:00
Cy Schubert
42834773b3 Remove redundant USE_INET6 test that enables INET6 in the ipfilter userland
regardless of the setting in make.conf.

PR:		190964
Approved by:	glebius (mentor)
MFC after:	1 week
2014-07-11 16:26:51 +00:00
Gleb Smirnoff
fcc34a238c Fix style bug: rename the refcount field of m_ext to ext_cnt, to match
other members.

Sponsored by:	Nginx, Inc.
2014-07-11 14:34:29 +00:00
Gleb Smirnoff
15c28f87b8 All mbuf external free functions never fail, so let them be void.
Sponsored by:	Nginx, Inc.
2014-07-11 13:58:48 +00:00
Michael Tuexen
4474d71a7b Integrate upstream changes.
MFC after: 1 week
2014-07-11 06:52:48 +00:00
Andrey V. Elsukov
ff899182ec Fix condition.
Sponsored by:	Yandex LLC
2014-07-11 06:34:15 +00:00
Neel Natu
3ada6e07ac Use the correct offset when converting a logical address (segment:offset)
to a linear address.
2014-07-11 01:23:38 +00:00
Mateusz Guzik
88f98985aa Eliminate plim and vtmp local vars in exit1.
No functional changes.

MFC after:	1 week
2014-07-10 22:54:38 +00:00
Mateusz Guzik
30d58d6b39 Don't make a temporary copy of fixed sysctl strings. 2014-07-10 21:46:57 +00:00
Mateusz Guzik
b23c40d7b1 Don't zero fd_nfiles during fdp destruction.
Code trying to take a look has to check fd_refcnt and it is 0 by that time.

This is a follow up to r268505, without this the code would leak memory for
tables bigger than the default.

MFC after:	1 week
2014-07-10 21:05:45 +00:00
Mateusz Guzik
e518baf8f9 Avoid relocking filedesc lock when closing fds during fdp destruction.
Don't call bzero nor fdunused from fdfree for such cases. It would do
unnecessary work and complain that the lock is not taken.

MFC after:	1 week
2014-07-10 20:59:54 +00:00
Alan Cox
bfc30490a7 Correct the accounting code for wired mappings. The wrong field of the PVO
entry was being tested.  We were incrementing and decrementing the pmap's
wired mapping count based on whether the physical page being mapped or
unmapped was cache coherent, not whether it was a wired mapping.

Reviewed by:	nwhitehorn
2014-07-10 20:55:38 +00:00
Mark Johnston
58e6549541 Correct the setting of the VID in transmit descriptors when hardware VLAN
tagging is enabled. This was broken in r266978.

Reported by:	gjb
Tested by:	gjb
2014-07-10 16:46:46 +00:00
Ian Lepore
8d99c2a062 Pending interrupt status is cleared by writing to the ISR, not the data reg.
MFC after:	1 week
2014-07-10 14:06:18 +00:00
Pietro Cerutti
7150b86bfe Implement Short/Small String Optimization in SBUF(9) and change lengths and
positions in the API from ssize_t and int to size_t.

CR:		D388
Approved by:	des, bapt
2014-07-10 13:08:51 +00:00
Gleb Smirnoff
8ff2bd98d6 On machines with strict alignment copy pfsync_state_key from packet
on stack to avoid unaligned access.

PR:		187381
Submitted by:	Lytochkin Boris <lytboris gmail.com>
2014-07-10 12:41:58 +00:00
Konstantin Belousov
479fcb4e32 Unconditionally initialize addr to handle the case of changed map
timestamp while the map is unlocked.

Reported by:	bz
Sponsored by:	The FreeBSD Foundation
MFC after:	6 days
2014-07-10 11:20:24 +00:00
Kevin Lo
b11ce478cf Enable 8051 before downloading firmware.
Tested by:	Carlos Jacobo Puga Medina <cpm at fbsd dot es>
2014-07-10 09:42:34 +00:00
Bryan Venteicher
32487a8973 Rework when the Tx queue completion interrupt is enabled
The Tx interrupt is now kept disabled in the common case, only
enabled when the number of free descriptors in the queue falls
below a threshold. Transmitted frames are cleared from the VQ
before subsequent transmit, or in the watchdog timer.

This was a very big performance improvement for an experimental
Netmap bhyve backend.

MFC after:	1 month
2014-07-10 05:36:04 +00:00
Bryan Venteicher
4b59668f0e Add accessor to get the number of free descriptors in the virtqueue
MFC after:	1 month
2014-07-10 05:26:01 +00:00
Adrian Chadd
0a100a6f1e Implement the first stage of multi-bind listen sockets and RSS socket
awareness.

* Introduce IP_BINDMULTI - indicating that it's okay to bind multiple
  sockets on the same bind details.

  Although the PCB code has been taught about this (see below) this patch
  doesn't introduce the rest of the PCB changes necessary to distribute
  lookups among multiple PCB entries in the global wildcard table.

* Introduce IP_RSS_LISTEN_BUCKET - placing an listen socket into the
  given RSS bucket (and thus a single PCBGROUP hash.)

* Modify the PCB add path to be aware of IP_BINDMULTI:
  + Only allow further PCB entries to be added if the owner credentials
    and IP_BINDMULTI has been specified.  Ie, only allow further
    IP_BINDMULTI sockets to appear if the first bind() was IP_BINDMULTI.

* Teach the PCBGROUP code about IP_RSS_LISTE_BUCKET marked PCB entries.
  Instead of using the wildcard logic and hashing, these sockets are
  simply placed into the PCBGROUP and _not_ in the wildcard hash.

* When doing a PCBGROUP lookup, also do a wildcard match as well.
  This allows for an RSS bucket PCB entry to appear in a PCBGROUP
  rather than having to exist in the wildcard list.

Tested:

* TCP IPv4 server testing with igb(4)
* TCP IPv4 server testing with ix(4)

TODO:

* The pcbgroup lookup code duplicated the wildcard and wildcard-PCB
  logic.  This could be refactored into a single function.

* This doesn't yet work for IPv6 (The PCBGROUP code in netinet6/ doesn't
  yet know about this); nor does it yet fully work for UDP.
2014-07-10 03:10:56 +00:00
Warner Losh
aa0b5651c1 Compile boot2 with clang on pc98. 2014-07-10 00:15:50 +00:00
Warner Losh
53dda6a8d5 Make SERIAL support optional again. Enable it for i386 because a huge
percentage of machines has a 16550. Disable it for pc98 since only a
tiny fraction of them have one. These changes save 293 bytes when
building with clang, but preserves the ability to build with serial if
you really want.  We now have 92 bytes free (412 with the in-tree gcc).
2014-07-10 00:15:42 +00:00
Warner Losh
522d68a17f Merge the clang support from i386. Don't move to clang yet. 2014-07-10 00:15:38 +00:00
Xin LI
1b174fa1eb MFV r268455:
Use reserved space for ZFS administrative commands.

We reserve 1/2^spa_slop_shift = 1/32 or 3.125% of pool space (or 32MB at
least) for system use.  Most ZPL operations, e.g. write(2), creat(2), will
fail with ENOSPC if we fall below this.

Certain operations, e.g. file removal and most administrative actions,
still permitted until half of the slop space is used.  This would allow
users to use these operations to free up space in the pool when pool is
close to full but half of slop space is still free.

A very restricted set of operations that frees up space or change quota
are always permitted, regardless of the amount of free space.

MFC after:	 2 weeks
2014-07-09 23:14:59 +00:00
Aleksandr Rybalko
79b647995d Should check fb_read method presence instead of double check for fb_write.
Pointed by:     emaste

Sponsored by:	The FreeBSD Foundation
2014-07-09 21:55:34 +00:00
Konstantin Belousov
fd815c0b8d For safety, ensure that any consumer of the set_regs() and
ptrace_set_pc() use the correct return to userspace using iret.

The signal return, PT_CONTINUE (which in fact uses signal return path)
set the pcb flag already.  The setcontext(2) enforces iret return when
%rip is incorrect.  Due to this, the change is redundand, but is made
to ensure that no path which modifies context, forgets to set
PCB_FULL_IRET.

Inspired by:	CVE-2014-4699
Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-07-09 21:39:40 +00:00
Konstantin Belousov
a91831a261 Current code in sysctl proc.vmmap, which intent is to calculate the
amount of resident pages, in fact calculates the amount of installed
pte entries in the region.  Resident pages which were not soft-faulted
yet are not counted.

Calculate the amount of resident pages by looking in the objects chain
backing the region.

Add a knob to disable the residency calculation at all.  For large
sparce regions, either previous or updated algorithm runs for too long
time, while several introspection tools do not need the (advisory) RSS
value at all.

PR:	kern/188911
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-07-09 19:11:57 +00:00
Xin LI
fdc0ee2cf5 MFV r268452:
Explicitly mark file removal transactions as "presumed to result
in a net free of space" so they will not fail with ENOSPC.

Illumos issue:	4950 files sometimes can't be removed from a full
		filesystem
MFC after:	2 weeks
2014-07-09 18:32:40 +00:00
Aleksandr Rybalko
97f3c4e8a4 Fix inconsistent token parameters for kbd_allocate() and kbd_release() in vt(4).
PR:		191306
Submitted by:	jau789@gmail.com
Sponsored by:	The FreeBSD Foundation
2014-07-09 14:36:03 +00:00
Roger Pau Monné
38d6b2dcb2 vm_phys: remove limitation on number of fictitious regions
The number of vm fictitious regions was limited to 8 by default, but
Xen will make heavy usage of those kind of regions in order to map
memory from foreign domains, so instead of increasing the default
number, change the implementation to use a red-black tree to track vm
fictitious ranges.

The public interface remains the same.

Sponsored by: Citrix Systems R&D
Reviewed by: kib, alc
Approved by: gibbs

vm/vm_phys.c:
 - Replace the vm fictitious static array with a red-black tree.
 - Use a rwlock instead of a mutex, since now we also need to take the
   lock in vm_phys_fictitious_to_vm_page, and it can be shared.
2014-07-09 08:12:58 +00:00
Gleb Smirnoff
fe82cbe85c In several cases in ip_output() we obtain reference on ifa. Do not
leak it.

Together with:	asomers, np
Sponsored by:	Nginx, Inc.
2014-07-09 07:48:05 +00:00
Alexander Motin
409a3c1383 Add LUN options to specify 64-bit EUI and NAA identifiers. 2014-07-09 04:37:50 +00:00
Peter Wemm
ba8cd08ba9 Bump __FreeBSD_version after last SA-14:17.kmem so we have something
to test against in the freebsd.org cluster.
2014-07-09 00:12:05 +00:00
Xin LI
e432298ade Initialize SCTP cmsg's and notification's buffer before copying out
to userland.

Submitted by:	tuexen
Security:	CVE-2014-3953
Security:	FreeBSD-SA-14:17.kmem
2014-07-08 21:54:27 +00:00
Xin LI
2827952eb4 Don't leave the padding between the msg header and the cmsg data,
and the padding after the cmsg data un-initialized.

Submitted by:	tuexen
Security:	CVE-2014-3952
Security:	FreeBSD-SA-14:17.kmem
2014-07-08 21:54:23 +00:00
Neel Natu
b301b9e28f Accurately identify the vcpu's operating mode as 64-bit, compatibility,
protected or real.
2014-07-08 21:48:57 +00:00
Neel Natu
3527963b26 Invalidate guest TLB mappings as a side-effect of its CR3 being updated.
This is a pre-requisite for task switch emulation since the CR3 is loaded
from the new TSS.
2014-07-08 20:51:03 +00:00
Alexander Motin
3120a49e50 Remove status setting from datamove() path. Leave that to other places. 2014-07-08 18:51:03 +00:00
Alexander Motin
e327a057a7 Remove IO_SYNC flag when writing extended file attributes on ZFS.
While it is possible to create and write file, modify its permissions, etc.
without ever doing sync, it looks odd that it is required for setting
extended file attributes on ZFS.  UFS does not do sync there too.

Samba uses those extended attributes to store some its data, and doing it
synchronously by many times reduces file creation performance for systems
without SLOG device.

Reviewed by:	delphij, jpaetzel, silence on fs@
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2014-07-08 17:26:08 +00:00
Alexander Motin
ad3cd840f2 Fix use-after-free on XPT_RESET_BUS.
That command is not queued, so does not use later status update.
2014-07-08 16:56:21 +00:00
Alexander Motin
b33b96e352 Enable TAS feature: notify initiator if its command was aborted by other.
That should make operation more kind to multi-initiator environment.
Without this, other initiators may find out that something bad happened
to their commands only via command timeout.
2014-07-08 16:38:05 +00:00
Ian Lepore
1e3d53c687 Use named constant rather than '0' to access the reset controller register. 2014-07-08 14:35:09 +00:00
Alexander Motin
d6205772da Fix typo in r267873. 2014-07-08 13:28:37 +00:00
Alexander Motin
950b6e126b Pass correct command that should be aborted to ISPCTL_ABORT_CMD.
This makes XPT_ABORT to work for me on initiator side of isp(4).
Previous code was trying to abort the XPT_ABORT itself and failed.

MFC after:	1 week
2014-07-08 13:01:36 +00:00
Alexander Motin
6c7a99be9f Do not return statuses for aborted iSCSI commands. 2014-07-08 12:16:28 +00:00
Alexander Motin
f5ffef352f Return task management requests to queued execution, but differently.
Testing shown that both original queued design with separate task queue,
and recent direct execution design had significant flaw: If abort request
arrives just after the victim, the last one may not be in the ooa_queue
yet, and so invisible for the task management function.

Unlike original queued implementation, use same queue for all SCSI and
TASK requests from the same initiator. That avoids races between them:
task functions are always executed in proper time, relatively to other
requests.
2014-07-08 12:15:15 +00:00
Alexander Motin
aa75114c57 Add XPT_ABORT support to iSCSI initiator.
While CAM does not use it normally, it is useful for targets testing.

MFC after:	2 weeks
2014-07-08 09:37:41 +00:00
Alexander Motin
fdfc6c8ebd Fix task management functions status: task not found is not an error,
while not implemented function is.
2014-07-08 08:34:34 +00:00