Commit Graph

124564 Commits

Author SHA1 Message Date
John Baldwin
f0aefccb70 Restore the <sys/vmem.h> header to fix build of cxgbe(4) TOM.
vmem's are not just used for TLS memory in TOM and the #include actually
predates the TLS code so should not have been removed when the TLS vmem
moved in r340466.

Pointy hat to:	jhb
Sponsored by:	Chelsio Communications
2018-11-16 01:27:24 +00:00
Mateusz Guzik
088ac3ef4b amd64: handle small memset buffers with overlapping stores
Instead of jumping to locations which store the exact number of bytes,
use displacement to move the destination.

In particular the following clears an area between 8-16 (inclusive)
branch-free:

movq    %r10,(%rdi)
movq    %r10,-8(%rdi,%rcx)

For instance for rcx of 10 the second line is rdi + 10 - 8 = rdi + 2.
Writing 8 bytes starting at that offset overlaps with 6 bytes written
previously and writes 2 new, giving 10 in total.

Provides a nice win for smaller stores. Other ones are erratic depending
on the microarchitecture.

General idea taken from NetBSD (restricted use of the trick) and bionic
string functions (use for various ranges like in this patch).

Reviewed by:	kib (previous version)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17660
2018-11-16 00:44:22 +00:00
John Baldwin
47c64f9e3e Remove bogus roundup2() of the key programming work request header.
The key context is always placed immediately after the work request
header.  The total work request length has to be rounded up by 16
however.

MFC after:	1 month
Sponsored by:	Chelsio Communications
2018-11-15 23:31:04 +00:00
John Baldwin
2939ecd3ce Change the quantum for TLS key addresses to 32 bytes.
The addresses passed when reading and writing keys are always shifted
right by 5 as the memory locations are addressed in 32-byte chunks, so
the quantum needs to be 32, not 8.

MFC after:	1 month
Sponsored by:	Chelsio Communications
2018-11-15 23:10:46 +00:00
Mark Johnston
aeb7a84ee1 Remove mostly-useless proc provider probes.
For some reason the proc UMA zone's ctor, dtor and init functions are
instrumented, but these functions are always available through FBT.
Moreover, the probes are not part of the original Solaris proc
provider, aren't documented, have no uses (e.g., in dwatch(8)) and
have no clear use to begin with.  Therefore, remove them.

Reviewed by:	rpaulo
Differential Revision:	https://reviews.freebsd.org/D2169
2018-11-15 23:02:59 +00:00
John Baldwin
bc13c69bef Move the TLS key map into the adapter softc so non-TOE code can use it.
Sponsored by:	Chelsio Communications
2018-11-15 23:00:30 +00:00
John Baldwin
c15600b71a Use sbsndptr_adv() instead of sbsndptr() for TOE TLS.
For TOE TLS, we just want to advance the send pointer to skip over the
record just sent to the TOE.  The recently added sbsndptr_adv() is
sufficient for that and is cheaper.

MFC after:	1 month
Sponsored by:	Chelsio Communications
2018-11-15 22:47:47 +00:00
John Baldwin
b6b42932db Convert the number of MSI IRQs on x86 from a constant to a tunable.
The number of MSI IRQs still defaults to 512, but it can now be
changed at boot time via the machdep.num_msi_irqs tunable.

Reviewed by:	kib, royger (older version)
Reviewed by:	markj
MFC after:	1 month
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D17977
2018-11-15 18:37:41 +00:00
Luiz Otavio O Souza
aaf1f854e1 Set the SPI clock speed and polarity on each transfer to catch up with
recent changes in spibus and allow the use of different SPI modes on
the same bus.

Reported by:	ian
Sponsored by:	Rubicon Communications, LLC (Netgate)
2018-11-15 17:05:02 +00:00
Luiz Otavio O Souza
77613ca0cb Comment MD_ROOT and remove 'device re' which is not part of the system and
can be loaded as module.
2018-11-15 16:29:27 +00:00
Warner Losh
e5436ab5af Add cam_iosched_set_latfcn to set a latency callback for high latency.
It's often useful to have a callback when an I/O takes more than a
threshold amount of time. This adds the infrastructure for periph
devices to register one.

One use-case is as a debugging aide when you need a semi-realtime
indication of an I/O outlier so you can trigger bus capture gear for
vendor analysis.

Sponsored by: Netflix, Inc
2018-11-15 16:02:45 +00:00
Warner Losh
204a1a4d4c Introduce scsi_ata_setfeatures() as a convenient way to make
a passthru ATA SETFEATURES command.

Sponsored by: Netflix, Inc
2018-11-15 16:02:34 +00:00
Warner Losh
36173f6976 Do proper conversion to/from sbt.
Doh! sbttoX and Xtosbt were backwards. While they ran, they produced
bogus results.

Pointy hat to: imp@
2018-11-15 16:02:24 +00:00
Warner Losh
023b87bffa When converting ns,us,ms to sbt, return the ceil() of the result
rather than the floor(). Returning the floor means that
sbttoX(Xtosbt(y)) != y for almost all values of y.  In practice, this
results in a difference of at most 1 in the lsb of the sbintime_t.
This difference is meaningless for all current users of these
functions, but is important for the newly introduced sysctl conversion
routines which implicitly rely on the transformation being idempotent.

Sponsored by: Netflix, Inc
2018-11-15 16:02:13 +00:00
Warner Losh
ee7eba240b Remove trailing white space in advance of other changes. 2018-11-14 23:15:50 +00:00
Stephen Hurd
0efb1a464f Clear RX completion queue state veriables in iflib_stop()
iflib_stop() was not resetting the rxq completion queue state variables.
This meant that for any driver that has receive completion queues, after a
reinit, iflib would start asking what's available on the rx side starting at
whatever the completion queue index was prior to the stop, instead of at 0.

Submitted by:	pkelsey
Reported by:	pkelsey
MFC after:	3 days
Sponsored by:	Limelight Networks
2018-11-14 20:36:18 +00:00
Gleb Smirnoff
905837ebe7 Initialize compatibility epoch tracker for thread0. Fixes
panics for drivers that call if_maddr_lock() during startup.

Reported by:	cy
2018-11-14 19:10:35 +00:00
John Baldwin
c6aba52e4f Revert r332735 and fix MSI-X to properly fail allocations when full.
The off-by-one errors in 332735 weren't actual errors and were
preventing the last MSI interrupt source from being used.  Instead,
the issue is that when all MSI interrupt sources were allocated, the
loop in msix_alloc() would terminate with 'msi' still set to non-null.
The only check for 'i' overflowing was in the 'msi' == NULL case, so
msix_alloc() would try to reuse the last MSI interrupt source instead
of failing.

Fix by moving the check for all sources being in use to just after the
loop.

Reviewed by:	kib, markj
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D17976
2018-11-14 18:45:33 +00:00
Vincenzo Maffione
2e42b74a6f vtnet: fix netmap support
netmap(4) support for vtnet(4) was incomplete and had multiple bugs.
This commit fixes those bugs to bring netmap on vtnet in a functional state.

Changelist:
  - handle errors returned by virtqueue_enqueue() properly (they were
    previously ignored)
  - make sure netmap XOR rest of the kernel access each virtqueue.
  - compute the number of netmap slots for TX and RX separately, according to
    whether indirect descriptors are used or not for a given virtqueue.
  - make sure sglist are freed according to their type (mbufs or netmap
    buffers)
  - add support for mulitiqueue and netmap host (aka sw) rings.
  - intercept VQ interrupts directly instead of intercepting them in txq_eof
    and rxq_eof. This simplifies the code and makes it easier to make sure
    taskqueues are not running for a VQ while it is in netmap mode.
  - implement vntet_netmap_config() to cope with changes in the number of queues.

Reviewed by:	bryanv
Approved by:	gnn (mentor)
MFC after:	3 days
Sponsored by:	Sunny Valley Networks
Differential Revision:	https://reviews.freebsd.org/D17916
2018-11-14 15:39:48 +00:00
Stephen Hurd
8d4ceb9cc4 Prevent POLA violation with TSO/CSUM offload
Ensure that any time CSUM_IP_TSO or CSUM_IP6_TSO is set that the corresponding
CSUM_IP6?_TCP / CSUM_IP flags are also set.

Rather than requireing drivers to bake-in an understanding that TSO implies
checksum offloads, make it explicit.

This change requires us to move the IFLIB_NEED_ZERO_CSUM implementation to
ensure it's zeroed for TSO.

Reported by:	Jacob Keller <jacob.e.keller@intel.com>
MFC after:	1 week
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D17801
2018-11-14 15:23:39 +00:00
Stephen Hurd
4d261ce278 Fix leaks caused by ifc_nhwtxqs never being initialized
r333502 removed initialization of ifc_nhwtxqs, and it's not clear
there's a need to copy it into the struct iflib_ctx at all. Use
ctx->ifc_sctx->isc_ntxqs instead.

Further, iflib_stop() did not clear the last ring in the case where
isc_nfl != isc_nrxqs (such as when IFLIB_HAS_RXCQ is set). Use
ctx->ifc_sctx->isc_nrxqs here instead of isc_nfl.

Reported by:	pkelsey
Reviewed by:	pkelsey
MFC after:	3 days
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D17979
2018-11-14 15:16:45 +00:00
Luiz Otavio O Souza
2e82757c64 Add the driver for the SPI controller on ARMADA38X.
Tested on Clearfog (Pro) and SG-3100.

Sponsored by:	Rubicon Communications, LLC (Netgate)
2018-11-14 14:26:32 +00:00
Konstantin Belousov
1c4ca77890 Add d_off support for multiple filesystems.
The d_off field has been added to the dirent structure recently.
Currently filesystems don't support this feature.  Support has been
added and tested for zfs, ufs, ext2fs, fdescfs, msdosfs and unionfs.
A stub implementation is available for cd9660, nandfs, udf and
pseudofs but hasn't been tested.

Motivation for this feature: our usecase is for a userspace nfs server
(nfs-ganesha) with zfs.  At the moment we cache direntry offsets by
calling lseek once per entry, with this patch we can get the offset
directly from getdirentries(2) calls which provides a significant
speedup.

Submitted by:	Jack Halford <jack@gandi.net>
Reviewed by:	mckusick, pfg, rmacklem (previous versions)
Sponsored by:	Gandi.net
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D17917
2018-11-14 14:18:35 +00:00
Conrad Meyer
fbd5d78209 amdtemp(4): Fix temperature reporting on AMD 2990WX
Update the AMD family 17h temperature reporting based on AMD Tech Doc 56255
OSRR, section 4.2.1.

For CPUS w/CUR_TEMP_RANGE_SEL set, scale the reported temperature into the
range -49..206; i.e., subtract 49°C.

Submitted by:	gallatin@
Reported by:	bcran@
Reviewed by:	me (long ago)
MFC after:	22.57 seconds
Relnotes:	yea
Differential Revision:	https://reviews.freebsd.org/D16855
2018-11-14 04:50:29 +00:00
Conrad Meyer
9d49c4229a amdsmn(4)/amdtemp(4): Attach to Ryzen 2 hostbridges
As reported, tested, and patch supplied by Johannes.

There may be future work to do to support multiple sensors, but for now, any
sensor at all is a strict improvement for Ryzen 2 systems.

PR:		228480
Submitted by:	Johannes Lundberg <johalun0 AT gmail.com> (earlier version)
Reported by:	deischen@, Johannes, and numerous others
MFC after:	3.72 days
2018-11-14 03:42:39 +00:00
Brooks Davis
5b1df30051 Use the main capabilities.conf for freebsd32.
Allow the location of capabilities.conf to be configured.

Also allow a per-abi syscall prefix to be configured with the
abi_func_prefix syscalls.conf variable and check syscalls against
entries in capabilities.conf with and without the prefix amended.

Take advantage of these two features to allow use shared capabilities.conf
between the default syscall vector and the freebsd32 compatability
layer.  We've been inconsistent about keeping the two in sync as
evidenced by the bugs fixed in r340294.  This eliminates that problem
going forward.

Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17932
2018-11-14 00:46:02 +00:00
Gleb Smirnoff
6febf18036 Fix build on some architectures after r340413. On amd64 epoch.h
appeared to be included implicitly.
2018-11-14 00:33:03 +00:00
Matt Macy
91cf497515 epoch(9) revert r340097 - no longer a need for multiple sections per cpu
I spoke with Samy Bahra and recent changes to CK to make ck_epoch_call and
ck_epoch_poll not modify the record have eliminated the need for this.
2018-11-14 00:12:04 +00:00
Gleb Smirnoff
635c18840a style(9), mostly adjusting overly long lines. 2018-11-13 23:57:34 +00:00
Warner Losh
eea18fdb02 Remove support for versions prior to FreeBSD 7.0 from twa(4)
This was for pre-7.0 CAM support. Neither the CAM nor the busdma
changes over the years have not been ifdef'd. The code cannot build
on 6.x anymore. Support for 6.4 ended in 2010, so remove them.
2018-11-13 23:53:24 +00:00
Gleb Smirnoff
a760c50c9e With epoch not inlined, there is no point in using _lite KPI. While here,
remove some unnecessary casts.
2018-11-13 23:45:38 +00:00
Gleb Smirnoff
9f360eecf9 The dualism between epoch_tracker and epoch_thread is fragile and
unnecessary. So, expose CK types to kernel and use a single normal
structure for epoch_tracker.

Reviewed by:	jtl, gallatin
2018-11-13 23:20:55 +00:00
Matt Macy
f1bac7bb74 Add ZFS to amd64 NOTES to catch future breakage of static linking 2018-11-13 23:08:46 +00:00
Gleb Smirnoff
b79aa45e0e For compatibility KPI functions like if_addr_rlock() that used to have
mutexes but now are converted to epoch(9) use thread-private epoch_tracker.
Embedding tracker into ifnet(9) or ifnet derived structures creates a non
reentrable function, that will fail miserably if called simultaneously from
two different contexts.
A thread private tracker will provide a single tracker that would allow to
call these functions safely. It doesn't allow nested call, but this is not
expected from compatibility KPIs.

Reviewed by:	markj
2018-11-13 22:58:38 +00:00
Warner Losh
85939654f3 Use atomic_load_acq_int() here too to poll done, ala r328521 2018-11-13 22:41:20 +00:00
Kirk McKusick
9fc5d538fc In preparation for adding inode check-hashes, clean up and
document the libufs interface for fetching and storing inodes.
The undocumented getino / putino interface has been replaced
with a new getinode / putinode interface.

Convert the utilities that had been using the undocumented
interface to use the new documented interface.

No functional change (as for now the libufs library does not
do inode check-hashes).

Reviewed by:  kib
Tested by:    Peter Holm
Sponsored by: Netflix
2018-11-13 21:40:56 +00:00
Mateusz Guzik
f183fb162c locks: plug warnings about unitialized variables
They only showed up after I redefined LOCKSTAT_ENABLED to 0.

doing_lockprof in mutex.c is a real (but harmless) bug. Should the
value be non-zero it will do checks for lock profiling which would
otherwise be skipped.

state in rwlock.c is a wart from the compiler, the value can't be
used if lock profiling is not enabled.

Sponsored by:	The FreeBSD Foundation
2018-11-13 21:29:56 +00:00
Eric van Gyzen
d54474e63b Make no assertions about lock state when the scheduler is stopped.
Change the assert paths in rm, rw, and sx locks to match the lock
and unlock paths.  I did this for mutexes in r306346.

Reported by:	Travis Lane <tlane@isilon.com>
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2018-11-13 20:48:05 +00:00
Mark Johnston
991666adc7 Ensure that libnv can be used when kern.trap_enotcap=1.
libnv used fcntl(fd, F_GETFL) to test whether fd is a valid file
descriptor.  Aside from being racy, this check requires CAP_FCNTL
rights on fd.  Instead, use fcntl(fd, F_GETFD), which does not require
any capability rights.

Also remove some redundant fd_is_valid() checks to avoid extra system
calls; in many cases we were performing this check immediately before
dup()ing the descriptor.

Reviewed by:	cem, oshogbo (previous version)
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17963
2018-11-13 20:07:55 +00:00
Mark Johnston
0f9b7bf37a Add accounting to per-domain UMA full bucket caches.
In particular, track the current size of the cache and maintain an
estimate of its working set size.  This will be used to decide how
much to shrink various caches when the kernel attempts to reclaim
pages.  As a secondary effect, it makes statistics aggregation (done
by, e.g., vmstat -z) cheaper since sysctl_vm_zone_stats() no longer
needs to iterate over lists of cached buckets.

Discussed with:	alc, glebius, jeff
Tested by:	pho (previous version)
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D16666
2018-11-13 19:44:40 +00:00
Gleb Smirnoff
a82296c2df Uninline epoch(9) entrance and exit. There is no proof that modern
processors would benefit from avoiding a function call, but bloating
code. In fact, clang created an uninlined real function for many
object files in the network stack.

- Move epoch_private.h into subr_epoch.c. Code copied exactly, avoiding
  any changes, including style(9).
- Remove private copies of critical_enter/exit.

Reviewed by:	kib, jtl
Differential Revision:	https://reviews.freebsd.org/D17879
2018-11-13 19:02:11 +00:00
Mark Johnston
bb4a27f927 Allow allocations across meta boundaries.
Remove restrictions that prevent allocation requests to cross the
boundary between two meta nodes.

Replace the bmu_avail field in meta nodes with a bitmap that identifies
which subtrees have some free memory, and iterate over the nonempty
subtrees only in blst_meta_alloc.  If free memory is scarce, this should
make searching for it faster.

Put the code for handling the next-leaf allocation in a separate
function.  When taking blocks from the next leaf empties the leaf, be
sure to clear the appropriate bit in its parent, and so on, up to the
least-common ancestor of this leaf and the next.

Eliminate special terminator nodes, and rely instead on the fact that
there is a 0-bit at the end of the bitmask at the root of the tree that
will stop a meta_alloc search, or a next-leaf search, before the search
falls off the end of the tree. Make sure that the tree is big enough to
have space for that 0-bit.

Eliminate special all-free indicators.  Lazy initialization of subtrees
stands in the way of having an allocation span a meta-node boundary, so
a subtree of all free blocks is not treated specially.  Subtrees of
all-allocated blocks are still recognized by looking at the bitmask at
the root and finding 0.

Don't print all-allocated subtrees.  Do print the bitmasks for meta
nodes, when tree-printing.

Submitted by:	Doug Moore <dougm@rice.edu>
Reviewed by:	alc
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D12635
2018-11-13 18:40:01 +00:00
Mark Johnston
6f8ba91638 RISC-V: Implement get_cyclecount(9).
Add the missing implementation for get_cyclecount(9) on RISC-V by
reading the cycle CSR.

Submitted by:	Mitchell Horne <mhorne063@gmail.com>
Reviewed by:	jhb
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D17953
2018-11-13 18:20:27 +00:00
Mark Johnston
1e2ceeb16a RISC-V: Add macros for reading performance counter CSRs.
The RISC-V spec defines several performance counter CSRs such as: cycle,
time, instret, hpmcounter(3...31).  They are defined to be 64-bits wide
on all RISC-V architectures.  On RV64 and RV128 they can be read from a
single CSR.  On RV32, additional CSRs (given the suffix "h") are present
which contain the upper 32 bits of these counters, and must be read as
well.  (See section 2.8 in the User ISA Spec for full details.)

This change adds macros for reading these values safely on any RISC-V
ISA length.  Obviously we aren't supporting anything other than RV64
at the moment, but this ensures we won't need to change how we read
these values if we ever do.

Submitted by:	Mitchell Horne <mhorne063@gmail.com>
Reviewed by:	jhb
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D17952
2018-11-13 18:12:06 +00:00
Kevin Bowling
0d909f4ccf powerpc64: reduce GENERIC64 diff versus amd64 GENERIC
Reviewed by:	jhibbits
Approved by:	timur (mentor)
Differential Revision:	https://reviews.freebsd.org/D17515
2018-11-13 09:19:07 +00:00
Kyle Evans
75beb4d46a Add dynamic_kenv assertion to init_static_kenv
Both to formally document the requirement that this not be called after the
dynamic kenv is setup, and to perhaps help static analyzers figure out
what's going on. While calling init_static_kenv this late isn't fatal, there
are some caveats that the caller should be aware of:

- Late calls are effectively a no-op, as far as default FreeBSD is
concerned, as everything will switch to searching the dynamic kenv once it's
available.

- Each of the kern_getenv calls will leak memory, as it's assumed that
these are searching static environment and allocations will not be made.

As such, this usage is not sensible and should be detected.
2018-11-13 04:34:30 +00:00
Kyle Evans
851f1a1121 Fix test-dts{,o} targets
There were two main problems here:

1.) sys/dts/Makefile.inc is not included from various */overlays directories
    by default, only ../Makefile.inc
2.) When shelling out for DTS/DTSO, cwd != .CURDIR, so enumeration always
    failed

These changes allow make test-dts and make test-dtso to function in their
respective directories.

Reviewed by:	manu
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D17961
2018-11-12 22:18:11 +00:00
Niclas Zeising
af14df7703 Add evdev support to amd64 and i386 kernels
Include evdev support and drivers in the amd64 and i386 GENERIC and MINIMAL
kernels.  Evdev is used by X and wayland to handle input devices, and this
change, together with upcomming changes in ports will make us handle input
devices better in graphical UIs.

Reviewed by:	wulf, bapt, imp
Approved by:	imp
Differential Revision:	https://reviews.freebsd.org/D17912
2018-11-12 21:01:28 +00:00
Konstantin Belousov
83813c6696 Apply fix to un-cripple max cpu id on BSP earlier.
We need to know actual value for the standard extended features before
ifuncs are resolved.

Reported and tested by:	madpilot
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-11-12 19:17:26 +00:00
Julien Charbon
23d903a783 cxgbe/netmap: Fix cxgbe netmap when interface is DOWN
A kernel panic can occur if the cxgbe interface is DOWN
when activating netmap. This patch prevents the driver
from freeing up cxgbe netmap resources when they have not
been allocated.

Submitted by:	Nicolas Witkowski <nwitkowski@verisign.com>
Reviewed by:	np
MFC after:	1 week
Sponsored by:	Verisign, Inc.
Differential Revision:	https://reviews.freebsd.org/D17802
2018-11-12 17:57:12 +00:00