On FreeBSD VFS_HOLD/VN_RELE were mapped to MNT_REF/MNT_REL that
manipulate mnt_ref. But the job of properly maintaining the reference
count is already automatically performed by insmntque(9) and
delmntque(9). So, in effect all ZFS vnodes referenced the corresponding
mountpoint twice.
That was completely harmless, but we want to be very explicit about what
FreeBSD VFS APIs are used, because illumos VFS_HOLD and FreeBSD MNT_REF
provide quite different guarantees with respect to the held vfs_t /
mountpoint. On illumos VFS_HOLD is sufficient to guarantee that
vfs_t.vfs_data stays valid. On the other hand, on FreeBSD MNT_REF does
*not* provide the same guarantee about mnt_data. We have to use
vfs_busy() to get that guarantee.
Thus, the calls to VFS_HOLD/VFS_RELE on vnode init and fini are removed.
VFS_HOLD calls are replaced with vfs_busy in the ioctl handlers.
And because vfs_busy has a richer interface that can not be dumbed down
in all cases it's better to explicitly use it rather than trying to mask
it behind VFS_HOLD.
This change fixes a panic that could result from a race between
zfs_umount() and zfs_ioc_rollback(). We observed a case where
zfsvfs_free() tried to destroy data that zfsvfs_teardown() was still
using. That happened because there was nothing to prevent unmounting of
a ZFS filesystem that was in between zfs_suspend_fs() and
zfs_resume_fs().
Reviewed by: kib, smh
MFC after: 3 weeks
Sponsored by: ClusterHQ
Differential Revision: https://reviews.freebsd.org/D2794
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Eli Rosenthal <eli.rosenthal@delphix.com>
illumos/illumos-gate@c20404ff77
Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author: Alex Wilson <alex.wilson@joyent.com>
illumos/illumos-gate@d09e4475f6
separate driver. Add support for activating clock and hwreset resources
for these devices when the EXT_RESOURCES option is present.
Reviewed by: andrew, mmel, Emmanuel Vadot <manu@bidouilliste.com>
Approved by: adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D5749
Previously, freebsd32 binaries could submit read/write requests with lengths
greater than INT_MAX that a native kernel would have rejected.
Reviewed by: kib
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D5788
not getting a proper bounds check.
Thanks to CTurt for pointing at this with a big red blinking neon sign.
PR: 206761
Submitted by: sson
Reviewed by: cturt@hardenedbsd.org
MFC after: 3 days
PowerPC has real Open Firmware and does not necessarily need FDT.
Make ofwpci.c only PCI dependent.
Pointed out by: emaste
Reviewed by: nwhitehorn
Obtained from: Semihalf
This is kinda critical to the performance when the CPU is slow and
network bandwidth is high, e.g. in the hypervisor.
Reviewed by: rrs, gallatin, Dexuan Cui <decui microsoft com>
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5765
And factor out tcp_lro_rx_done, which deduplicates the same logic with
netinet/tcp_lro.c
Reviewed by: gallatin (1st version), hps, zbb, np, Dexuan Cui <decui microsoft com>
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5725
The i8254 simulation in Hyper-V is kinda broken and is not available
in Generation 2 Hyper-V VMs, so Hyper-V timer must be registered early
enough so that it can be used to do the TSC freq calibration.
This fixes the notorious warning like this:
calcru: runtime went backwards from 50 usec to 25 usec for pid 0 (kernel)
Submitted by: Dexuan Cui <decui microsoft com>
Reviewed by: kib, sephe
Tested by: kib, sephe
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5778
- Set BI_CAN_EXEC_DYN, so we can execute ET_DYN ELF files in addition to
regular ET_EXECs.
- Provide an AT_BASE entry in the auxiliary vector, so the executable
knows at which address it got loaded and can apply relocations.
Some time ago I made a change to merge together the memory scope
definitions used by mmap (MAP_{PRIVATE,SHARED}) and lock objects
(PTHREAD_PROCESS_{PRIVATE,SHARED}). Though that sounded pretty smart
back then, it's backfiring. In the case of mmap it's used with other
flags in a bitmask, but for locking it's an enumeration. As our plan is
to automatically generate bindings for other languages, that looks a bit
sloppy.
Change all of the locking functions to use separate flags instead.
Obtained from: https://github.com/NuxiNL/cloudabi
This provides a constant ABI and layout for these structures (especially
struct adapter) avoiding some foot shooting.
Discussed with: np
Sponsored by: Chelsio Communications
Previously, calls to *sleep() and cv_*wait*() immediately returned during
early boot. Instead, permit threads that request a sleep without a
timeout to sleep as wakeup() works during early boot. Sleeps with
timeouts are harder to emulate without working timers, so just punt and
panic explicitly if any thread tries to use those before timers are
working. Any threads that depend on timeouts should either wait until
SI_SUB_KICK_SCHEDULER to start or they should use DELAY() until timers
are available.
Until APs are started earlier this should be a no-op as other kthreads
shouldn't get a chance to start running until after timers are working
regardless of when they were created.
Reviewed by: kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D5724
- Move some blocks around to reduce the number of 'if (unmap)' checks.
- Use 'pbuf == NULL' instead of 'unmap'.
- Use nitems.
- Pull an assignment out of an if expression.
Reviewed by: kib
Sponsored by: Chelsio Communications
nic->num_vf_en is set based on the number of the enabled LMACs.
This number should not be overwritten later by any routine.
Instead it should fail PCI_IOV_ADD_VF() so that available VFs
with the corresponding LMACs will attach whereas other, disabled
VFs will fail with the proper error code.
Error signaling (due to improper number of VFs requested) is also moved
from PCI_IOV_INIT() to PCI_IOV_ADD_VF().
This will be reworked when multiple queue sets are enabled but for
now this is the correct behavior of the driver.
Obtained from: Semihalf
Sponsored by: Cavium
If the driver is not active or link is down the packet could remain
non-writeable. This commit makes all mbufs enqueued to the driver's
ring buffer to have correct attributes.
Pointed out by: wma
Reviewed by: wma
Obtained from: Semihalf
Sponsored by: Cavium
Differential Revision: https://reviews.freebsd.org/D5800
The FDT description is as follows:
- phy-handle, reg, qlm-mode, mac-address are under nodes in bgx0/1 node
- phy nodes (pointed by phy-handle) are under MDIO even though they may
not be connected through to MDIO. In those nodes they do not contain
MAC address or etc.
This commit changes parsing of the FDT nodes for BGX so that it can
obtain correct MAC address for a given PHY.
Reviewed by: wma
Obtained from: Semihalf
Sponsored by: Cavium
Differential Revision: https://reviews.freebsd.org/D5781
- Avoid memory leak when nicvf_tx_mbuf_locked() fails
- Introduce nicvf_xmit_locked() routine that uses drbr_peek(),
drbr_advance() or drbr_putback() for a specific ifnet.
This gives more clear and efficient design as well as
prevents from dropping mbufs that where not sent due to temporary
lack of descriptors.
- Add missing ETHER_BPF_MTAP() hook
Pointed out by: yongari
Reviewed by: wma
Obtained from: Semihalf
Sponsored by: Cavium
Differential Revision: https://reviews.freebsd.org/D5534
Changes introduced to AHCI code adding support for MSI-x
caused interrupt storm on Alpine boards.
This is unintended behaviour so added quirk to omit this functionality.
Reviewed by: mav
Submitted by: Marcin Mazurek <mma@semihalf.com>
Obtained from: Semihalf
Sponsored by: Annapurna Labs
Differential Revision: https://reviews.freebsd.org/D4301
increased to 256TiB. The kernel address space can also be increased to be
the same size, but this will be performed in a later change.
To help work with an extra level of page tables two new functions have
been added, one to file the lowest level table entry, and one to find the
block/page level. Both of these find the entry for a given pmap and virtual
address.
This has been tested with a combination of buildworld, stress2 tests, and
by using sort to consume a large amount of memory by sorting /dev/zero. No
new issues are known to be present from this change.
Reviewed by: kib
Obtained from: ABT Systems Ltd
Relnotes: yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D5720
Make it compile only for i386/amd64 for now as it's been tested there.
It's quite possible it'll show up elsewhere and we can enable it
for other architectures later.
Tested:
* PC Engines APU1C4
Submitted by: Daniel Wyatt <daniel@dewyatt.com>
Reviewed by: adrian, loos
Differential Revision: https://reviews.freebsd.org/D5389