Fix some style while at it.
Submitted by: Johannes Lundberg <johalun0@gmail.com>
MFC after: 1 week
Sponsored by: Limelight Networks
Sponsored by: Mellanox Technologies
The forthcoming microcode update will fix a TSX bug by clobbering PMC3
when TSX instructions are executed (even speculatively). There is an
alternate mode where the CPU executes all TSX instructions by aborting
them, in which case PMC3 is still available to the OS. Any code that
correctly uses TSX must be ready to handle aborts anyway.
Since it is believed that the FreeBSD population of hwpmc(4) users is
significantly larger than the population of TSX users, switch the
microcode into TSX abort mode whenever a pmc is allocated, and back to
bug avoidance mode when the last pmc is deallocated.
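
The allocation-driven switch described above can be sketched as a simple
reference count. This is only an illustration: the enum, the counter and
set_tsx_mode() are hypothetical names, and the real change would program
the hardware rather than update a variable.

```c
#include <assert.h>

/* Hypothetical stand-in for the two microcode modes described above. */
enum tsx_mode { TSX_BUG_AVOIDANCE, TSX_FORCE_ABORT };

static enum tsx_mode current_mode = TSX_BUG_AVOIDANCE;
static unsigned int pmc_count;	/* number of currently allocated pmcs */

/* Hypothetical hook; real code would write to the hardware here. */
static void
set_tsx_mode(enum tsx_mode mode)
{
	current_mode = mode;
}

/* Called on pmc allocation: the first pmc switches to abort mode. */
static void
pmc_allocated(void)
{
	if (pmc_count++ == 0)
		set_tsx_mode(TSX_FORCE_ABORT);
}

/* Called on pmc release: the last pmc restores bug avoidance mode. */
static void
pmc_released(void)
{
	assert(pmc_count > 0);
	if (--pmc_count == 0)
		set_tsx_mode(TSX_BUG_AVOIDANCE);
}
```

Only the 0 -> 1 and 1 -> 0 transitions touch the mode, so allocating
further pmcs while one is already active costs nothing extra.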
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
extended attributes, the kernel can panic with either "ffs_truncate3"
or with "softdep_deallocate_dependencies: dangling deps".
The problem arises because the flushbuflist() function which is
called to clear out buffers is passed either the V_NORMAL flag to
indicate that it should flush the buffers associated with the contents
of the file or the V_ALT flag to indicate that it should flush the
buffers associated with the extended attribute data. The buffers
containing the extended attribute data are identified by having
their BX_ALTDATA flag set in the buffer's b_xflags field. The
BX_ALTDATA flag is set on the buffer when the extended attribute
block is first allocated or when its contents are read in from the
disk.
On a busy system, a buffer may be reused for another purpose, but
the contents of the block that it contained continue to be held
in the main page cache. Each physical page is identified as holding
the contents of a logical block within a specified file (identified
by a vnode). When a request is made to read a file, the kernel first
looks for the block in the existing buffers. If it is not found
there, it checks the page cache to see if it is still there. If
it is found in the page cache, then it is remapped into a new
buffer thus avoiding the need to read it in from the disk.
The bug is that when a buffer request made for an extended attribute
is fulfilled by reconstituting a buffer from the page cache rather
than reading it in from disk, the BX_ALTDATA flag was not being
set. Thus the flushbuflist() function would never clear it out and
the "ffs_truncate3" panic would occur because the vnode being cleared
still had buffers on its clean-buffer list. When an extended attribute
is updated, it is first read, then updated, and finally
written. If the read was fulfilled by reconstituting the buffer
from the page cache, the BX_ALTDATA flag was not set and thus the
dirty buffer would never be flushed by flushbuflist(). Eventually
the buffer would be recycled. Since it was never written it would
have an unfinished dependency which would trigger the
"softdep_deallocate_dependencies: dangling deps" panic.
The fix is to ensure that the BX_ALTDATA flag is set when a buffer
has been reconstituted from the page cache.
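
As a rough sketch of the idea, not the committed code: the struct below
is a simplified stand-in for the kernel's struct buf, the flag value is
illustrative, and both helper functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BX_ALTDATA 0x04	/* illustrative value for this sketch */

/* Simplified stand-in for the kernel's struct buf. */
struct buf {
	unsigned int b_xflags;
	bool b_from_page_cache;	/* bookkeeping for the sketch only */
};

/*
 * Hypothetical helper: finish setting up a buffer for an extended
 * attribute block, whether it was read from disk or remapped from
 * the page cache.
 */
static void
ext_attr_buf_ready(struct buf *bp, bool from_page_cache)
{
	assert(bp != NULL);
	bp->b_from_page_cache = from_page_cache;
	/* The fix: tag the buffer on both paths, not only on disk reads. */
	bp->b_xflags |= BX_ALTDATA;
}

/* Mirrors the V_ALT selection described above: only tagged buffers match. */
static bool
flush_selects_alt(const struct buf *bp)
{
	return ((bp->b_xflags & BX_ALTDATA) != 0);
}
```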
PR: 230962
Reported by: 2t8mr7kx9f@protonmail.com
Reviewed by: kib
Tested by: Peter Holm
MFC after: 1 week
Sponsored by: Netflix
isci(4) uses deferred loading. Typically on amd64 and i386 non-PAE
the tag does not create any restrictions, but on i386 PAE-tables but
non-PAE configs, callbacks might be used.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
writes are running. Some of the cases that are not handled properly in the driver are:
1. With R1 fastpath supported, a single write from the CAM layer can consume 2 MPT frames
at the driver/firmware level for fastpath qualification (if fw_outstanding < controller Queue Depth).
Because of this, the driver has to throttle IOs coming from the CAM layer as well as the second fastpath
write (of an R1 write) against the Adapter Queue Depth.
If "fw_outstanding" reaches the adapter queue depth, the driver should return IOs from the CAM layer with
device busy status. While allocating the second MPT frame (corresponding to the R1 FP write), the driver
should also ensure that fw_outstanding does not exceed the adapter QD.
2. On R1 fastpath write completion, the driver decrements the "fw_outstanding" counter without
actually returning the MPT frame to the free pool. Under heavy IO that consumes the whole
adapter Queue Depth, IOs may then consume the MPT frames reserved for DCMDs (management commands), and
DCMDs (internal and sent by applications) that cannot get an MPT frame will start failing.
Below is one test case to hit the issue described above:
1. Run heavy IOs (outstanding IOs should hit the adapter Queue Depth).
2. Run a management tool (Broadcom's storcli tool) querying the adapter in a loop (run the command "storcli64 /c0 show" in a loop).
3. The management tool's requests will start failing due to non-availability of free MPT frames, as all frames will be consumed by IOs.
Fix: Increment/decrement of the "fw_outstanding" counter should be in sync with MPT frame get/return.
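
A minimal sketch of that invariant, assuming a simplified pool: the
struct, the function names and QUEUE_DEPTH are hypothetical and do not
come from the driver. The point is that the counter and the free pool
change only together, and that an R1 fastpath write takes both frames or
neither.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_DEPTH 4	/* hypothetical adapter queue depth */

struct frame_pool {
	int free_frames;	/* frames available in the free pool */
	int fw_outstanding;	/* frames currently held by IOs */
};

/* Take one frame; returns 0 on success, -1 if the caller must report busy. */
static int
frame_get(struct frame_pool *p)
{
	if (p->fw_outstanding >= QUEUE_DEPTH || p->free_frames == 0)
		return (-1);
	p->free_frames--;
	p->fw_outstanding++;
	return (0);
}

/* Return one frame; the counter drops only when the frame really goes back. */
static void
frame_return(struct frame_pool *p)
{
	assert(p->fw_outstanding > 0);
	p->free_frames++;
	p->fw_outstanding--;
}

/* An R1 fastpath write needs two frames: take both or neither. */
static int
r1_fastpath_write_start(struct frame_pool *p)
{
	if (frame_get(p) != 0)
		return (-1);
	if (frame_get(p) != 0) {
		frame_return(p);
		return (-1);
	}
	return (0);
}
```

With a pool sized at QUEUE_DEPTH plus a few reserved frames, IOs can
never dip into the reserve, because frame_get() refuses once
fw_outstanding reaches the queue depth.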
Submitted by: Sumit Saxena <sumit.saxena@broadcom.com>
Reviewed by: Kashyap Desai <Kashyap.Desai@broadcom.com>
Approved by: Ken
MFC after: 3 days
Sponsored by: Broadcom Inc
When merging from Netflix's tree, resetting the carsize was dropped
accidentally. This change fixes that revision by properly resetting the
count of how many are in the car.
Noticed by: mav@
Research Unix, 7th Edition introduced TIMEZONE and DSTFLAG
compile-time constants in sys/param.h to communicate these values for
the machine. 4.2BSD moved from compile-time to run-time, introduced
these variables, and used them for localtime() to return the
right offset from UTC (sometimes referred to as GMT, which for this
purpose is the same). 4.4BSD migrated to using the tzdata code/database and
these variables were basically unused.
FreeBSD removed the real need for these with adjkerntz in
1995. However, some RTC clocks continued to use these variables,
though they were largely unused otherwise. Later, phk centralized
most of the uses in utc_offset, but left it using both tz_minuteswest
and adjkerntz.
POSIX (IEEE Std 1003.1-2017) states in the gettimeofday specification
"If tzp is not a null pointer, the behavior is unspecified" so there's
no standards reason to retain it anymore. In fact, gettimeofday has
been marked as obsolescent, meaning it could be removed from a future
release of the standard. It is the only interface defined in POSIX
that references these two values. All other references come from the
tzdata database via tzset().
These were used to more faithfully implement early Unix ABIs which
have been removed from FreeBSD. NetBSD completely eliminated
these variables years ago. Linux has migrated to tzdata as well,
though these variables technically still exist for compatibility
with unspecified older programs.
So, there's no real reason to have them these days. They are a
historical vestige that's no longer used in any meaningful way.
Reviewed By: jhb@, brooks@
Differential Revision: https://reviews.freebsd.org/D19550
kernel vn_printf() routine when printing out vnodes associated with
a UFS filesystem) to also include the inode's link count, effective
link count, generation number, owner, group, flags, size, and for
UFS2 filesystems, the extent size.
Sponsored by: Netflix
vnode pointer (b_vp). The value of b_vp can be used by "show vnode"
to print the vnode and "show vnodebufs" to print all the clean and
dirty buffers associated with the vnode (which should include this
buffer).
Sponsored by: Netflix
The 16GB, 32GB and 128GB versions of this product all have the same
problem. For some reason, the RC10 size is correct, while the RC16
size is larger (oddly by the capacity size / 1024 bytes). Using the
RC16 size results in illegal LBA range errors when geom tastes the
device. So, expand the quirk to cover all versions of this chip.
Ideally, we'd get both READ CAPACITY 10 and READ CAPACITY 16 sizes and
print a warning if they differ and use the smaller of the two numbers,
though that may be problematical as well. Furthermore, SBC-4
encourages users to transition to RC16 only, which suggests that in the
future RC10 may disappear from some drives. It's unclear how to cope
with these drives generically.
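
The "smaller of the two" idea could look roughly like the sketch below.
It is not what this change does (the change only widens the quirk), and
pick_last_lba() is a hypothetical helper. It assumes both commands
succeeded and treats an RC10 value of 0xffffffff as "capacity too large
for RC10", in which case only the RC16 value is usable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* RC10 reports this last-LBA value when the capacity does not fit. */
#define RC10_SATURATED UINT32_MAX

/*
 * Hypothetical helper: choose a last LBA from the two READ CAPACITY
 * results and report whether they disagreed, so that the caller can
 * print a warning.
 */
static uint64_t
pick_last_lba(uint32_t rc10_last_lba, uint64_t rc16_last_lba, bool *mismatch)
{
	assert(mismatch != NULL);
	*mismatch = false;
	if (rc10_last_lba == RC10_SATURATED)
		return (rc16_last_lba);	/* RC10 cannot express the size */
	if ((uint64_t)rc10_last_lba != rc16_last_lba) {
		*mismatch = true;
		return (rc10_last_lba < rc16_last_lba ?
		    rc10_last_lba : rc16_last_lba);
	}
	return (rc16_last_lba);
}
```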
PR: 234503
MFC After: 1 week
I tried to save some CPU time on hopeless aggregation attempts, but it seems
the condition I added is overly strict, also blocking aggregation of optional
I/Os in cases which previously were possible. Revert just to be safe.
MFC after: 1 month
Make sure the enter and leave polling routines can be called multiple times
with the same setting. Ignore setting polling or event mode twice. This fixes a
deadlock during shutdown if polling mode was already selected.
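
The "ignore setting the same mode twice" behaviour can be sketched as
follows. The context struct and set_mode() are hypothetical names, and
the switches counter exists only to make the effect visible.

```c
#include <assert.h>
#include <stddef.h>

enum cmd_mode { MODE_EVENTS, MODE_POLLING };

/* Hypothetical command context; the driver's real state is richer. */
struct cmd_ctx {
	enum cmd_mode mode;
	int switches;	/* counts real transitions, for illustration */
};

/* Switch modes, treating a repeated request as a no-op. */
static void
set_mode(struct cmd_ctx *ctx, enum cmd_mode mode)
{
	assert(ctx != NULL);
	if (ctx->mode == mode)
		return;	/* already in the requested mode */
	ctx->mode = mode;
	ctx->switches++;
}
```

Because a repeated request returns before doing any work, entering
polling mode during shutdown is harmless when polling was already
selected.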
MFC after: 1 week
Sponsored by: Mellanox Technologies
The ESGL bit was left uninitialized when executing the REPORT LUNS
ioctl. This could allow a zeroed data buffer to be treated as a
scatter/gather list. The firmware would eventually walk past the end
of the data buffer, potentially find what looked like a valid
address/length pair, and write the result to semi-random memory.
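
One common way to avoid this class of bug is to zero the whole command
before filling in individual fields. The sketch below uses a
hypothetical, much simplified command layout; only the idea of clearing
the ESGL field comes from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified firmware command; not the driver's layout. */
struct fw_cmd {
	uint8_t opcode;
	uint8_t esgl;	/* nonzero: data buffer holds a scatter/gather list */
	uint32_t data_len;
};

/* Build a REPORT LUNS command without inheriting stale field values. */
static void
build_report_luns(struct fw_cmd *cmd, uint32_t data_len)
{
	assert(cmd != NULL);
	memset(cmd, 0, sizeof(*cmd));	/* esgl is now explicitly 0 */
	cmd->opcode = 0xa0;		/* SCSI REPORT LUNS opcode */
	cmd->data_len = data_len;
}
```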
Obtained from: Dell EMC Isilon
MFC after: 1 week
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D19398