New Stratus bnxt devices require support for short HWRM commands for VFs
to function. Enable their use, but only use them if it's both supported
and required... prefer the long HWRM commands when possible.
Submitted by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Sponsored by: Broadcom Limited
Differential Revision: https://reviews.freebsd.org/D13269?id=36180
Add a new TMPFS_LINK_MAX to use in place of LINK_MAX for link overflow
checks and pathconf() reporting. Rather than storing a full 64-bit
link count, just use a plain int and use INT_MAX as TMPFS_LINK_MAX.
Discussed with: bde
Reviewed by: kib (part of a larger patch)
Sponsored by: Chelsio Communications
This uses NANDFS_LINK_MAX instead of LINK_MAX for link overflow checks
and the value reported by pathconf() / fpathconf().
Sponsored by: Chelsio Communications
Having all filesystems fall through to default values isn't always correct
and these values can vary for different filesystem implementations. Most
of these changes just use the existing default values with a few exceptions:
- Don't report CHOWN_RESTRICTED for ZFS since it doesn't do the exact
permissions check this claims for chown().
- Use NANDFS_NAME_LEN for NAME_MAX for nandfs.
- Don't report a LINK_MAX of 0 on smbfs. Now fail with EINVAL to
indicate hard links aren't supported.
Requested by: bde (though perhaps not this exact implementation)
Reviewed by: kib (earlier version)
MFC after: 1 month
Sponsored by: Chelsio Communications
The script originally supported embedding an mfs into ELF files or any
other type of file, because it searched for magic strings to mark the
beginning and end of the embeddable section. It was later modified to
read the section offset and length via readelf, which made it work for
ELF only. Restore the ability to update arbitrary file types by using
the readelf technique for ELF, and the magic string technique for all
others (including PE/COFF files like loader.efi).
Submitted by: Zakary Nafziger <worldofzak@gmail.com>
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D12746
- Define a NFS_LINK_MAX as UINT32_MAX to match the wire protocol.
- Use NFS_LINK_MAX instead of LINK_MAX as the fallback value reported
for a PATHCONF RPC by the NFS server.
- Use NFS_LINK_MAX instead of LINK_MAX as the default value reported
by the NFS client pathconf() if not overridden by the NFS server.
- When reading the link count out of an RPC reply, read the full 32
bits instead of the lower 16 bits.
Reviewed by: rmacklem (earlier version)
Sponsored by: Chelsio Communications
This method handles _PC_FILESIZEBITS, _PC_SYMLINK_MAX, and _PC_NO_TRUNC.
For other values it defers to vop_stdpathconf().
MFC after: 1 month
Sponsored by: Chelsio Communications
- Define a ZFS_LINK_MAX as the ZFS version of LINK_MAX which is set to
UINT64_MAX to match the on-disk format.
- Enable the currently #if 0'd code to check for link overflows and
return EMLINK.
- Don't clamp the link count reported in stat() to LINK_MAX as that is
still the 16-bit limit, but report the full link counts. Also,
avoid possibly overflowing the reported link count to 0 when adjusting
the link count to account for ".snapshot".
- Update the LINK_MAX reported by pathconf() to report ZFS_LINK_MAX
rather than LINK_MAX (but clamped to LONG_MAX for 32-bit systems).
Reviewed by: avg (earlier version)
Sponsored by: Chelsio Communications
The method handles NAME_MAX and LINK_MAX explicitly. For all other
pathconf variables, the method passes the request down to the underlying
file descriptor. This requires splitting a kern_fpathconf() syscallsubr
routine out of sys_fpathconf(). Also, to avoid lock order reversals with
vnode locks, the fdescfs vnode is unlocked around the call to
kern_fpathconf(), but with the usecount of the vnode bumped.
MFC after: 1 month
Sponsored by: Chelsio Communications
The firmware is always in little endian, use htole*() for all request fields
larger than one byte.
Submitted by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Sponsored by: Broadcom Limited
Email address has changed, uses consistent name (Matthew, not Matt)
Reported by: Matthew Macy <mmacy@mattmacy.io>
Differential Revision: https://reviews.freebsd.org/D13537
gmirror does not perform any sorting of I/O requests, so the bioq API
doesn't provide any advantages over plain TAILQs. The API also does not
provide operations needed by an upcoming change.
No functional change intended. The diff shrinks the geom_mirror.ko
text and the gmirror softc slightly.
Tested by: pho (part of a larger patch)
MFC after: 1 week
Sponsored by: Dell EMC Isilon
the 32-bit cookie can be sign-extended on its way out of the loader and
through Open Firmware. If sign-extended, the in-kernel check of its value
would fail on 64-bit systems, resulting in a mountroot prompt. Solve this
by telling the kernel to ignore the high-order bits.
PR: kern/224437
Submitted by: Gustavo Romero
applied relocation offset in link_elf.c works as intended. We may want to
revisit how that works in future, for example by having elf_reloc_self()
actually store the numbers it is using rather than computing them later,
but this fixes symbol lookup after r326203.
Reported by: andreast@
Pointy hat to: me
The IA32 memory model guarantees that all writes are seen in the program
order. Also, any access to the uncacheable memory flushes the store
buffers. As the consequence, SFENCE instruction is (almost) never needed,
in particular, it is not needed to ensure the correct order of updates as
seen by a PCIe device.
Use atomic_thread_fence_rel() instead of wb() to only emit compiler barriers
on x86 there. Other architectures get the right barrier instruction as
well.
Reviewed by: hselasky
Sponsored by: Mellanox Technologies
MFC after: 1 week
In this case volatile qualifiers enusre that a compiler does not
optimize the accesses out.
Reviewed by: alc, jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D13534
They provide relaxed-ordered atomic access semantic. Due to the
FreeBSD memory model, the operations are syntaxical wrappers around
the volatile accesses. The volatile qualifier is used to ensure that
the access not optimized out and in turn depends on the volatile
semantic as implemented by supported compilers.
The motivation for adding the operation is to help people coming from
other systems or knowing the C11/C++ standards where atomics have
special type and require use of the special access operations. It is
still the case that FreeBSD requires plain load and stores of aligned
integer types to be atomic.
Suggested by: jhb
Reviewed by: alc, jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D13534
changed worked to capture dumps for me. However, the test for
SCHEDULER_STOPPED() isn't right. We can also call the dump routine
from ddb, in which case the scheduler is still running. This leads to
an assertion panic that we're sleeping when we shouldn't. Instead, use
the proper test for dumping or not. This brings us in line with other
places that do special things while we're doing polled I/O like this.
Noticed by: pho@
Differential Revision: https://reviews.freebsd.org/D13531
By the ACPI standard (ACPI 5 chapter 8.4 Declaring Processors) Processors
can be implemented in 2 distinct ways:
* Through a Processor object type (which provides P_BLK)
* Through a Device object type
Prior to this change, the FreeBSD driver only supported the former. AMD
Epyc / Poweredge systems we are testing both implement the latter only. Add
the missing support.
Because P_BLK is not defined in the device object case, C-states entering
must be completely controlled via _CST methods rather than P_LVL2/3.
John Baldwin points out that ACPI 6.0 formally deprecates the Processor
keyword, so eventually processors will only be enumerated as Device objects.
Submitted by: attilio
Reviewed by: jhb, markj, Anton Rang <rang AT acm.org>
Relnotes: maybe
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D13457
bug that requires 'hands off' for a period of time (2.3s) before we
check the RDY bit. Sicne this is a very odd quirk for a very limited
selection of drives, do this as a quirk. This prevented a successful
reset of the card when the card wedged.
Also, make sure that we comply with the advice from section 3.1.5 of
the 1.3 spec says that transitioning CC.EN from 0 to 1 when CSTS.RDY
is 1 or transitioning CC.EN from 1 to 0 when CSTS.RDY is 0 "has
undefined results". Short circuit when EN == RDY == desired state.
Finally, fail the reset if the disable fails. This will lead to a
failed device, which is what we want. (note: nda device needs
work for coping with a failed device).
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D13389
dtrace_gethrtime() may be called outside of probe context, and in
particular, from the DTRACEIOC_BUFSNAP handler.
Disable interrupts rather than using sched_pin() to help ensure that
we don't call any external functions when in probe context.
PR: 218452
MFC after: 1 week
of low physical memory:
Update the comment about leaving the permanent mapping in place. This
also improves the wording of the comment. PTD 0 is still left alone
because it is fairly important that it was unmapped earlier, and the
comment now describes the unmapping of the other low PTDs that the code
actually does.
Reviewed by: kib
it by a transient double mapping for the one instruction in ACPI wakeup
where it is needed (and for many surrounding instructions in ACPI resume).
Invalidate the TLB as soon as convenient after undoing the transient
mapping. ACPI resume already has the strict ordering needed for this.
This fixes the non-trapping of null pointers and other garbage pointers
below NBPDR (except transiently). NBPDR is quite large (4MB, or 2MB for
PAE).
This fixes spurious traps at the first instruction in VM86 bioscalls.
The traps are for transiently missing read permission in the first
VM86 page (physical page 0) which was just written to at KERNBASE in
the kernel. The mechanism is unknown (it is not simply PG_G).
locore uses a similar but larger transient double mapping and needs
it for 2 instructions instead of 1. Unmap the first PDE in it after
the 2 instructions to detect most garbage pointers while bootstrapping.
pmap_bootstrap() finishes the unmapping.
Remove the avoidance of the double mapping for a recently fixed special
case. ACPI resume could use this avoidance (made non-special) to avoid
any problems with the transient double mapping, but no such problems
are known.
Update comments in locore. Many were for old versions of FreeBSD which
tried to map low memory r/o except for special cases, or might have
allowed access to low memory via physical offsets. Now all kernel
maps are r/w, and removal of of the double map disallows use of physical
offsets again.
when KERNLOAD is smaller than NBPDR (not the default) and PG_G is
enabled (the default if the CPU supports it). This case has relatively
minor problems with coherency of the permanent double mapping, but the
fix in r167869 to improve coherency creates page tables with 3 different
errors so never worked.
The permanent double mapping is fundamentally broken and will be removed
soon. It fundamentally breaks trapping for null pointers and requires
complications to avoid cache coherency bugs. It is currently used for
only a single instruction in ACPI resume,
Many fixes VM86 and/or ACPI and/or the double map were attempted near
r1200000. r167869 attempted to fix cache coherency bugs in an unusual
case, but the bugs were unreachable because older errors in page tables
caused a crash first.
This commit just makes r167869 work as intended. Part 1 of these fixes
fixed the other errors, but also stopped mapping the PDE for KERNBASE
as a large page, so double mapping of this PDE only causes the same
problems as when KERNLOAD is the default. Except for the problem of
trapping null pointers, r167869 could be used to fix these problems,
but it is inactive in usual cases. The only known other problem is
that incoherent permissions for page 0 cause spurious traps in VM86
BIOS calls.
Reviewed by: kib
when KERNLOAD is not a multiple of NBPDR (not the default) and PSE is
enabled (the default if the CPU supports it). Addresses in PDEs must
be a multiple of NBPDR in the PSE case, but were not so in the crashing
case.
KERNLOAD defaults to NBPDR. NBPDR is 4 MB for !PAE and 2 MB for PAE.
The default can be changed by editing i386/include/vmparam.h or using
makeoptions. It can be changed to less than NBPDR to save real and
virtual memory at a small cost in time, or to more than NBPDR to waste
real and virtual memory. It must be larger than 1 MB and a multiple of
PAGE_SIZE. When it is less than NBPDR, it is necessarily not a multiple
of NBPDR. This case has much larger bugs which will be fixed in part 2.
The fix is to only use PSE for physical addresses above <KERNLOAD
rounded _up_ to an NBPDR boundary>. When the rounding is non-null,
this leaves part of the kernel not using large pages. Rounding down
would avoid this pessimization, but would break setting of PAT bits
on i/o pages if it goes below 1MB. Since rounding down always goes
below 1MB when KERNLOAD < NBPDR and the KERNLOAD > NBPDR case is not
useful, never round down.
Fix related style bugs (e.g., wrong literal values for NBPDR in comments).
Reviewed by: kib
A comment in bcm_bsc_fill_tx_fifo() even lists sc_totlen > 0 as a
precondition for calling the routine. I apparently forgot to make the
code do what my comment said.
Otherwise a poorly timed lowmem event may attempt to acquire a destroyed
lock. Unregister the handler before destroying the ARC reclaim thread.
Reported by: gjb
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D13480
vxlan_ftable entries are sorted in ascending order, due to wrong arguments
order it is possible to stop search before existing element will be found.
Then new element will be allocated in vxlan_ftable_update_locked() and can
be inserted in the list second time or trigger MPASS() assertion with
enabled INVARIANTS.
PR: 224371
MFC after: 1 week
that had the IPv6 fragmentation header:
o Neighbor Solicitation
o Neighbor Advertisement
o Router Solicitation
o Router Advertisement
o Redirect
Introduce M_FRAGMENTED mbuf flag, and set it after IPv6 fragment reassembly
is completed. Then check the presence of this flag in correspondig ND6
handling routines.
PR: 224247
MFC after: 2 weeks
- Fix reference of uninitialized error value in bhndb_generic_resume() if
the dynamic window count is 0.
- Fix incorrect bhnd_pmu(4) UPTME_MASK and PLL0_PC2_WILD_INT_MASK
constants.
- Variable definitions referenced by our generated SPROM layouts will never
be NULL, but add explicit asserts to make that clear.
- Add missing variable initialization in bhnd_nvram_sprom_ident().
- Fix leak of driver array in bhnd_erom_probe_driver_classes().
- Fix zero-length memset() in bhndb_pci_eio_init().
- Fix an off-by-one error and potential invalid OOBSEL bit shift operation
in bcma_dinfo_init_intrs().
- Remove dead code in siba_suspend_hw().
- Fix duplicate call to bhnd_pmu_enable_regulator() in both the enable and
disable code paths of bhnd_compat_cc_pmu_set_ldoparef().
Reported by: Coverity
CIDs: 1355194, 1362020, 1362022, 1373114, 1366563, 1373115,
1381569, 1381579, 1383555, 1383566, 1383571
Sponsored by: The FreeBSD Foundation
Currently Facility Unavailable is absent and once an application
tries to use or access a register from a feature disabled in the
CPU it causes a kernel panic.
A simple test-case is:
int main() { asm volatile ("tbegin.;"); }
which will use TM (Hardware Transactional Memory) feature which
is not supported by the kernel and so will trigger the following
kernel panic:
----
fatal user trap:
exception = 0xf60 (unknown)
srr0 = 0x10000890
srr1 = 0x800000000000f032
lr = 0x100004e4
curthread = 0x5f93000
pid = 1021, comm = htm
panic: unknown trap
cpuid = 40
KDB: stack backtrace:
Uptime: 3m18s
Dumping 10 MB (3 chunks)
chunk 0: 11MB (2648 pages) ... ok
chunk 1: 1MB (24 pages) ... ok
chunk 2: 1MB (2 pages)panic: IOMMU mapping error: -4
cpuid = 40
Uptime: 3m18s
----
Since Hardware Transactional Memory is not yet supported by FreeBSD, treat
this as an illegal instruction.
PR: 224350
Submitted by: Gustavo Romero <gromero_AT_ibm_DOT_com>
MFC after: 2 weeks
memory:
Load the kernel eflags less magically, as in locore. The magic increased
when I removed eflags from the pcb in r305899.
Remove a jump to low memory that became garbage when the i386 version was
mostly replaced by the amd64 version in r235622.
The amd64 version is very similar. It still loads the flags magically,
but is not missing comments about using the special page table.
Reviewed by: kib
If a route is modified in a way that changes the route's source
address (i.e. the address used to access the gateway), then a
reference on the ifaddr representing the old source address will
be leaked if the address type does not have an ifa_rtrequest
method defined. Plug the leak by releasing the reference in
all cases.
Differential Revision: https://reviews.freebsd.org/D13417
Reviewed by: ae
MFC after: 3 weeks
Sponsored by: Dell
Without the identifier in the list booting FreeBSD results in printing the
following (from a PowerKVM boot):
cpu0: Unknown PowerPC CPU revision 0x1201, 2550.00 MHz
For now, add the same feature list as POWER8. As new capabilities are added to
support POWER9 specific features, they will be added to this.
PR: 224344
Submitted by: Breno Leitao <breno_DOT_leitao_AT_gmail_DOT_com>
dvp->v_mount after dvp is unlocked.
The vnode might be reclaimed after unlock, so v_mount becomes NULL.
Cache the struct mount pointer before the unlock, the struct is
type-stable.
Note that devfs_allocv() reads mp->mnt_data but does not operate on it
further when dirent is doomed. The unmount cannot proceed until all
dirents are reclaimed.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week