vnode for the tmpfs node owning this object. The flag is currently
used for two purposes. First, it allows to correctly handle VV_TEXT
for tmpfs vnode when the ref count on the object is decremented to 1,
similar to vnode_pager_dealloc() for regular filesystems. Second, it
prevents some operations, which are done on OBJT_SWAP vm objects
backing user anonymous memory, but are incorrect for the object owned
by tmpfs node.
The second kind of use of the OBJ_TMPFS flag is incorrect, since the
vnode might be reclaimed, which clears the flag, but vm object
operations must still be disallowed.
Introduce one more flag, OBJ_TMPFS_NODE, which is permanently set on
the object for VREG tmpfs node, and used instead of OBJ_TMPFS to test
whether vm object collapse and similar actions should be disabled.
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
unmount time) in the helper vfs_write_suspend_umnt(). Use it instead
of two inline copies in FFS.
Fix the bug in the FFS unmount, when suspension failed, the ufs
extattrs were not reinitialized.
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
In particular, vnode must be exclusively locked when the tmpfs vnode
and object are divorced. When the vnode is opened, the object must be
still alive, since only live vnode can be opened, and the tmpfs node
owns a reference on the object.
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
a vnode until it is verified that the vnode indeed belongs to tmpfs
mount. Otherwise, it might access random memory, at least in the
debug kernel.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
producer, instead of hard-coding VFS_VGET(). New function, which
takes callback, is called vn_get_ino_gen(), standard callback for
vn_get_ino() is provided.
Convert inline copies of vn_get_ino() in msdosfs and cd9660 into the
uses of vn_get_ino_gen().
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
the upstream implementation and helps ensure that a trap induced by tracing
fbt::trap:entry is handled without recursively generating another trap.
This makes it possible to run most (but not all) of the DTrace tests under
common/safety/ without triggering a kernel panic.
Submitted by: Anton Rang <anton.rang@isilon.com> (original version)
Phabric: D95
- Remove 4 extra bytes from the ethernet payload.
- The maximum RX buffer was incorrectly set. Increase it to 64K for
now, until the exact limit is understood.
- Enable hardware checksumming again.
- Make hardware data structure packed.
MFC after: 3 days
o Do not use UMA refcount zone. The problem with this zone is that
several refcounting words (16 on amd64) share the same cache line,
and issueing atomic(9) updates on them creates cache line contention.
Also, allocating and freeing them is extra CPU cycles.
Instead, refcount the page directly via vm_page_wire() and the sfbuf
via sf_buf_alloc(sf_buf_page(sf)) [1].
o Call refcounting/freeing function for EXT_SFBUF via direct function
call, instead of function pointer. This removes barrier for CPU
branch predictor.
o Do not cleanup the mbuf to be freed in mb_free_ext(), merely to
satisfy assertion in mb_dtor_mbuf(). Remove the assertion from
mb_dtor_mbuf(). Use bcopy() instead of manual assignments to
copy m_ext in mb_dupcl().
[1] This has some problems for now. Using sf_buf_alloc() merely to
increase refcount is expensive, and is broken on sparc64. To be
fixed.
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
a source address was selected and cached, but it was not
stored that is was cached. This resulted in selecting
different source addresses for the INIT-ACK and COOKIE-ACK
when possible.
Thanks to Niu Zhixiong for reporting the issue.
MFC after: 1 week
Code trying to take a look has to check fd_refcnt and it is 0 by that time.
This is a follow up to r268505, without this the code would leak memory for
tables bigger than the default.
MFC after: 1 week
entry was being tested. We were incrementing and decrementing the pmap's
wired mapping count based on whether the physical page being mapped or
unmapped was cache coherent, not whether it was a wired mapping.
Reviewed by: nwhitehorn
The Tx interrupt is now kept disabled in the common case, only
enabled when the number of free descriptors in the queue falls
below a threshold. Transmitted frames are cleared from the VQ
before subsequent transmit, or in the watchdog timer.
This was a very big performance improvement for an experimental
Netmap bhyve backend.
MFC after: 1 month
awareness.
* Introduce IP_BINDMULTI - indicating that it's okay to bind multiple
sockets on the same bind details.
Although the PCB code has been taught about this (see below) this patch
doesn't introduce the rest of the PCB changes necessary to distribute
lookups among multiple PCB entries in the global wildcard table.
* Introduce IP_RSS_LISTEN_BUCKET - placing an listen socket into the
given RSS bucket (and thus a single PCBGROUP hash.)
* Modify the PCB add path to be aware of IP_BINDMULTI:
+ Only allow further PCB entries to be added if the owner credentials
and IP_BINDMULTI has been specified. Ie, only allow further
IP_BINDMULTI sockets to appear if the first bind() was IP_BINDMULTI.
* Teach the PCBGROUP code about IP_RSS_LISTE_BUCKET marked PCB entries.
Instead of using the wildcard logic and hashing, these sockets are
simply placed into the PCBGROUP and _not_ in the wildcard hash.
* When doing a PCBGROUP lookup, also do a wildcard match as well.
This allows for an RSS bucket PCB entry to appear in a PCBGROUP
rather than having to exist in the wildcard list.
Tested:
* TCP IPv4 server testing with igb(4)
* TCP IPv4 server testing with ix(4)
TODO:
* The pcbgroup lookup code duplicated the wildcard and wildcard-PCB
logic. This could be refactored into a single function.
* This doesn't yet work for IPv6 (The PCBGROUP code in netinet6/ doesn't
yet know about this); nor does it yet fully work for UDP.
percentage of machines has a 16550. Disable it for pc98 since only a
tiny fraction of them have one. These changes save 293 bytes when
building with clang, but preserves the ability to build with serial if
you really want. We now have 92 bytes free (412 with the in-tree gcc).
Use reserved space for ZFS administrative commands.
We reserve 1/2^spa_slop_shift = 1/32 or 3.125% of pool space (or 32MB at
least) for system use. Most ZPL operations, e.g. write(2), creat(2), will
fail with ENOSPC if we fall below this.
Certain operations, e.g. file removal and most administrative actions,
still permitted until half of the slop space is used. This would allow
users to use these operations to free up space in the pool when pool is
close to full but half of slop space is still free.
A very restricted set of operations that frees up space or change quota
are always permitted, regardless of the amount of free space.
MFC after: 2 weeks
ptrace_set_pc() use the correct return to userspace using iret.
The signal return, PT_CONTINUE (which in fact uses signal return path)
set the pcb flag already. The setcontext(2) enforces iret return when
%rip is incorrect. Due to this, the change is redundand, but is made
to ensure that no path which modifies context, forgets to set
PCB_FULL_IRET.
Inspired by: CVE-2014-4699
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
amount of resident pages, in fact calculates the amount of installed
pte entries in the region. Resident pages which were not soft-faulted
yet are not counted.
Calculate the amount of resident pages by looking in the objects chain
backing the region.
Add a knob to disable the residency calculation at all. For large
sparce regions, either previous or updated algorithm runs for too long
time, while several introspection tools do not need the (advisory) RSS
value at all.
PR: kern/188911
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Explicitly mark file removal transactions as "presumed to result
in a net free of space" so they will not fail with ENOSPC.
Illumos issue: 4950 files sometimes can't be removed from a full
filesystem
MFC after: 2 weeks
The number of vm fictitious regions was limited to 8 by default, but
Xen will make heavy usage of those kind of regions in order to map
memory from foreign domains, so instead of increasing the default
number, change the implementation to use a red-black tree to track vm
fictitious ranges.
The public interface remains the same.
Sponsored by: Citrix Systems R&D
Reviewed by: kib, alc
Approved by: gibbs
vm/vm_phys.c:
- Replace the vm fictitious static array with a red-black tree.
- Use a rwlock instead of a mutex, since now we also need to take the
lock in vm_phys_fictitious_to_vm_page, and it can be shared.
While it is possible to create and write file, modify its permissions, etc.
without ever doing sync, it looks odd that it is required for setting
extended file attributes on ZFS. UFS does not do sync there too.
Samba uses those extended attributes to store some its data, and doing it
synchronously by many times reduces file creation performance for systems
without SLOG device.
Reviewed by: delphij, jpaetzel, silence on fs@
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
That should make operation more kind to multi-initiator environment.
Without this, other initiators may find out that something bad happened
to their commands only via command timeout.
Testing shown that both original queued design with separate task queue,
and recent direct execution design had significant flaw: If abort request
arrives just after the victim, the last one may not be in the ooa_queue
yet, and so invisible for the task management function.
Unlike original queued implementation, use same queue for all SCSI and
TASK requests from the same initiator. That avoids races between them:
task functions are always executed in proper time, relatively to other
requests.
tools/regression/file/flock/flock.c, which completes the fix in
r192685. When the lock was stolen from us, retry the whole lock
sequence in kernel, instead of returning EINTR to usermode and hoping
that application would handle it correctly by restarting the lock
acquire.
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
If port passed negative IID value, the function will try to allocate IID
from the pool of unused, based on passed wwpn or name arguments. It does
all its best to make IID unique and persistent across reconnects.
This makes persistent reservation properly work for iSCSI. Previously,
in case of reconnects, reservation could be unexpectedly lost, or even
migrate between intiators.
- add missing rcvif in mbuf
- add missing ipacket stat
- remove uncessary mbuf copy on output path
- fix deadlock of the TX engine in case of error
Obtained from: NETASQ
MFC after: 2 weeks
This includes:
o All directories named *ia64*
o All files named *ia64*
o All ia64-specific code guarded by __ia64__
o All ia64-specific makefile logic
o Mention of ia64 in comments and documentation
This excludes:
o Everything under contrib/
o Everything under crypto/
o sys/xen/interface
o sys/sys/elf_common.h
Discussed at: BSDcan
teardown, and new port creation during `service ctld restart`.
Close it by returning iSCSI port internal state, that allows to identify
dying ports, which should not be counted as existing, from really alive.
several reasons for this change:
pmap_change_wiring() has never (in my memory) been used to set the wired
attribute on a virtual page. We have always used pmap_enter() to do that.
Moreover, it is not really safe to use pmap_change_wiring() to set the wired
attribute on a virtual page. The description of pmap_change_wiring() says
that it assumes the existence of a mapping in the pmap. However, non-wired
mappings may be reclaimed by the pmap at any time. (See pmap_collect().)
Many implementations of pmap_change_wiring() will crash if the mapping does
not exist.
pmap_unwire() accepts a range of virtual addresses, whereas
pmap_change_wiring() acts upon a single virtual page. Since we are
typically unwiring a range of virtual addresses, pmap_unwire() will be more
efficient. Moreover, pmap_unwire() allows us to unwire superpage mappings.
Previously, we were forced to demote the superpage mapping, because
pmap_change_wiring() only allowed us to express the unwiring of a single
base page mapping at a time. This added to the overhead of unwiring for
large ranges of addresses, including the implicit unwiring that occurs at
process termination.
Implementations for arm and powerpc will follow.
Discussed with: jeff, marcel
Reviewed by: kib
Sponsored by: EMC / Isilon Storage Division
Previously ISID was changed every time, that made impossible correct
persistent reservation, because reconnected session was identified as
completely new one.
Reviewed by: trasz
MFC after: 1 week
prior to starting "/sbin/init" which will run all the "/etc/rc.d/xxx"
scripts. Else there can be a race configuring the interfaces via
"/etc/rc.conf".
MFC after: 4 weeks
Sponsored by: Mellanox Technologies
Instead make ports provide wanted port and target IDs, and LUNs provide
wanted LUN IDs. After that core Device ID VPD code only had to link all
of them together and add relative port and port group numbers.
LUN ID for iSCSI LUNs no longer created by CTL, but by ctld, and passed
to CTL as "scsiname" LUN option. This makes LUNs to report the same set
of IDs, independently from the port through which it is accessed, as
required by SCSI specifications.
Having single port for all iSCSI connections makes problematic implementing
some more advanced SCSI functionality in CTL, that require proper ports
enumeration and identification.
This change extends CTL iSCSI API, making ctld daemon to control list of
iSCSI ports in CTL. When new target is defined in config fine, ctld will
create respective port in CTL. When target is removed -- port will be
also removed after all active commands through that port properly aborted.
This change require ctld to be rebuilt to match the kernel.
As a minor side effect, this allows to have iSCSI targets without LUNs.
While that may look odd and not very useful, that is not incorrect.
6679140 asymmetric alloc/dealloc activity can induce dynamic variable drops
6679193 dtrace_dynvar walker produces flood of dtrace_dynhash_sink
This finishes a set of merges from the older OpenSolaris releases.
Still the FreeBSD port has many differences that are difficult to
account for but that seems normal given that the kernels are different.
MFC after: 1 week
getenv_xxx() functions instead of strtoq(), because the getenv_xxx()
functions include wrappers for various postfixes like G/M/K, which
strtoq() doesn't do.
the reply to ReaddirPlus when the server failed within the loop
that calls VFS_VGET(). This failure is most likely an error
return from VFS_VGET() caused by a bogus d_fileno that was
truncated to 32bits.
This patch fixes the server so that it will return directory postop
attributes for the failure. It does not fix the underlying issue caused
by d_fileno being uint32_t when a file system like ZFS generates
a fileno that is greater than 32bits.
Reported by: jpaetzel
Reviewed by: jpaetzel
MFC after: 1 month
Before iSCSI implementation CTL had no knowledge about frontend drivers,
it had only frontends, which really were ports (alike to LUNs, if comparing
to backends). But iSCSI added there ioctl() method, which does not belong
to frontend as a port, but belongs to a frontend driver.
partitions of types other than "freebsd-boot" (in particular, "efi").
This allows the removal of some nasty hacks for supporting PowerPC systems,
in particular aliasing freebsd-boot to apple-boot on APM and an IBM-specific
code on MBR.
This changes the installer to use the correct names, which also breaks a
degeneracy in the meaning of "freebsd-boot" that allows the addition
of support for some newer IBM systems that can boot from GPT in addition to
MBR. Since I have no idea how to detect which those systems are, leave
the default on IBM PPC systems as MBR for now.
console's ability to enter the debugger.... rwatson forgot to document
this when he changed it back in 2011... There is more docs to write
about this, but at least fix this for now...
Reviewed by: emaste
MFC after: 1 week
camcontrol(8) now supports a new 'persist' subcommand that allows users to
issue SCSI PERSISTENT RESERVE IN / OUT commands.
sbin/camcontrol/Makefile:
Add persist.c.
sbin/camcontrol/persist.c:
New persistent reservation support for camcontrol(8).
We have support for all known operation modes for PERSISTENT RESERVE
IN and PERSISTENT RESERVE OUT.
exceptions noted above.
sbin/camcontrol/camcontrol.8:
Document the new 'persist' subcommand.
In the section on the Transport ID (-I) option, explain what
Transport IDs for each protocol should look like. At some point
some of this information could probably get moved off in a
separate man page, either on Transport IDs alone or a man page
documenting the Transport ID parsing code.
Add a number of examples of persistent reservation commands.
Persistent Reservations are complex enough that the average user
probably won't be able to get the commands exactly right by just
reading the man page. These examples show a few basic and
advanced examples of how to use persistent reservations.
sbin/camcontrol/camcontrol.h:
Move the definition for camcontrol_optret here, so we can use it
for the persistent reservation code.
Add a definition for the new scsipersist() function.
sbin/camcontrol/camcontrol.c:
Add 'persist' to the list of subcommands.
Document 'persist' in the help text.
sys/cam/scsi/scsi_all.c:
Add the scsi_persistent_reserve_in() and
scsi_persistent_reserve_out() CCB building functions.
Add a new function, scsi_transportid_sbuf(). This takes a
SCSI Transport ID (documented in SPC-4), and prints it to
an sbuf(9). There are some transports (like ATA, USB, and
SSA) for which there is no transport defined. We need to
come up with a reasonable thing to do if we're presented
with a Transport ID that claims to be for one of those
protocols.
Add new routines scsi_get_nv() and scsi_nv_to_str().
These functions do a table lookup to go between a string and an
integer. There are lots of table lookups needed in the
persistent reservation code in camcontrol(8).
Add a new function, scsi_parse_transportid(), along with leaf node
functions to parse:
FC, 1394 and SAS (scsi_parse_transportid_64bit())
iSCSI (scsi_parse_transportid_iscsi())
SPI (scsi_parse_transportid_spi())
RDMA (scsi_parse_transportid_rdma())
PCIe (scsi_parse_transportid_sop())
Transport IDs. Given a string with the general form proto,id these
functions create a SCSI Transport ID structure.
sys/cam/scsi/scsi_all.h:
Update the various persistent reservation data structures to
SPC4r36l, but also rename some fields that were previously
obsolete with the proper names from older SCSI specs. This
allows using older, obsolete persistent reservation types when
desired.
Add function prototypes for the new persistent reservation CCB
building functions.
Add a data strucure for the READ FULL STATUS service action
of the PERSISTENT RESERVE IN command.
Add Transport ID structures for all protocols described in SPC-4.
Add a new series of SCSI_PROTO_XXX definitions, and
redefine other defines in terms of these new definitions.
Add a prototype for scsi_transportid_sbuf().
Change a couple of "obsolete" persistent reservation data
structure fields into something more meaningful, based on
what the field was called when it was defined in the spec.
(e.g. SPC, SPC-2, etc.)
Create a new define, SPRI_MAX_LEN, for the maximum allocation
length allowed for the PERSISTENT RESERVE IN command.
Add data structures and enumerations for the new name/value
translation functions.
Add data structures for SCSI over PCIe Routing IDs.
Bring the PERSISTENT RESERVE OUT Register and Move parameter list
structure (struct scsi_per_res_out_parms) up to date with SPC-4.
Add a data structure for the transport IDs that can optionally be
appended to the basic PERSISTENT RESERVE OUT parameter list.
Move SCSI protocol macro definitions out of the VPD page 0x83
definition and combine them with the more up to date protocol
definitions higher in the file.
Add function prototypes for scsi_nv_to_str(), scsi_get_nv(),
scsi_parse_transportid_64bit(), scsi_parse_transportid_spi(),
scsi_parse_transportid_rdma(), scsi_parse_transportid_iscsi(),
scsi_parse_transportid_sop(), and scsi_parse_transportid().
Sponsored by: Spectra Logic Corporation
MFC after: 1 week
handle packets up to 1536 bytes)
This fixes the need to frag that could happen when using vlans on top of
if_arge (which is a common case for the use the switch ports as individual
NICs).
Previously to this commit any vlan setup with if_arge as parent would have
the MTU of the parent interface reduced by the size of dot1q header
(4 bytes).
Tested on TP-Link 1043ND (where the WAN port is just a switch port setup to
tag packets in a different VLAN than the LAN ports).
Reported and tested by: Harm Weites (harm at weites.com)
Update some comments on code, specifying the correct vlans used on switch
setup.
Advertise the proper switch operation mode (the rtl8366rb only support
dot1q vlans).
This fixes the breakage that i introduced on r249752 and make the rtl8366rb
switch works again with etherswitchcfg(8).
Tested on TP-Link 1043ND.
Tested by: me, Harm Weites (harm at weites.com)
6851093 system drops to kmdb with anonymous dtrace probes + kmdb
This has no effect on FreeBSD (code is ifdef'ed) but is useful as
reference for future merges.
MFC after: 1 week
The EFI framebuffer produces corrupted output on certain systems. For
now display the framebuffer parameters (address, dimensions, etc.) on
boot to aid in tracking down these issues.
Sponsored by: The FreeBSD Foundation
Mark cpu_search_lowest/cpu_search_highest/cpu_search_both as noinline,
while cpu_search() gets always_inline. With the attributes set,
cpu_search() is inlined in wrappers, and if()s with constant
conditionals are optimized.
On some tests on many-core machine, the hwpmc reported samples for
cpu_search*() are reduced from 25% total to 9%.
Submitted by: "Rang, Anton" <anton.rang@isilon.com>
MFC after: 1 week
- Don't discard frames if the dropped or error flag is set.
- Don't remove the last 4-bytes of every packet.
- Add extra range check for data position offset when receiving data.
MFC after: 1 day
PR: 191432
The array index for the callchain is getting double-incremented -- both in the
loop and the storing. It should only be incremented in one location.
Also, constrain the stack pointer range check.
MFC after: 2 weeks
requests on the trim_queue, even for the CFA ERASE. This allows us, in
the future, to collapse adjacent requests. Since CFA ERASE is only for
CF cards, and it is so restrictive in what it can do, the collapse
code is not presently here. This also brings the ada driver more in
line with the da driver's treatment of BIO_DELETEs.
Reviewed by: mav@