Commit Graph

254356 Commits

Author SHA1 Message Date
Mateusz Guzik
35bb59edc5 threads: reimplement tid allocation on top of a bitmap
There are workloads with very bursty tid allocation and since unr tries very
hard to have small-sized bitmaps it keeps reallocating memory. Just doing
buildkernel gives almost 150k calls to free coming from unr.

This also gets rid of the hack which tried to postpone TID reuse.

Reviewed by:	kib, markj
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D27101
2020-11-09 23:05:28 +00:00
Mateusz Guzik
1bd3cf5de5 threads: introduce a limit for total number
The intent is to replace the current id allocation method and a known upper
bound will be useful.

Reviewed by:	kib (previous version), markj (previous version)
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D27100
2020-11-09 23:04:30 +00:00
Mateusz Guzik
f6dd1aefb7 vfs: group mount per-cpu vars into one struct
While here move frequently read stuff into the same cacheline.

This shrinks struct mount by 64 bytes.

Tested by:	pho
2020-11-09 23:02:13 +00:00
Mateusz Guzik
6fcc846b59 vmstat: drop the HighUse field from malloc dump
It is hardwired to "-" since its introduction in 2005.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D27141
2020-11-09 23:00:29 +00:00
Mateusz Guzik
f0c90a0931 malloc: provide 384 byte zone
Total page count after buildworld on ZFS for 384 (if present) and 512 zones:
before: 29713
after: 25946

per-zone page use:
vm.uma.malloc_384.keg.domain.1.pages: 11621
vm.uma.malloc_384.keg.domain.0.pages: 11597
vm.uma.malloc_512.keg.domain.1.pages: 1280
vm.uma.malloc_512.keg.domain.0.pages: 1448

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D27145
2020-11-09 22:59:41 +00:00
Mateusz Guzik
8e6526e966 malloc: retire mt_stats_zone in favor of pcpu_zone_64
Reviewed by:	markj, imp
Differential Revision:	https://reviews.freebsd.org/D27142
2020-11-09 22:58:29 +00:00
Michael Tuexen
283c76c7c3 RFC 7323 specifies that:
* TCP segments without timestamps should be dropped when support for
  the timestamp option has been negotiated.
* TCP segments with timestamps should be processed normally if support
  for the timestamp option has not been negotiated.
This patch enforces the above.

PR:			250499
Reviewed by:		gnn, rrs
MFC after:		1 week
Sponsored by:		Netflix, Inc
Differential Revision:	https://reviews.freebsd.org/D27148
2020-11-09 21:49:40 +00:00
Emmanuel Vadot
db6a0c8f47 Bump __FreeBSD_version after linuxkpi changes 2020-11-09 13:20:44 +00:00
Emmanuel Vadot
dab39c11af LinuxKPI: Implement ACPI bits required by drm-kmod in base system
It includes:

ACPI_HANDLE() implementation.
AC and VIDEO ACPI events notification support.
Replacement of hand-rolled GPLed _DSM method evaluation helpers
with in-base ones.

Submitted by:	wulf
Differential Revision:	https://reviews.freebsd.org/D26603
2020-11-09 13:20:14 +00:00
Michael Tuexen
e597bae4ee Fix a potential use-after-free bug introduced in
https://svnweb.freebsd.org/changeset/base/363046

Thanks to Taylor Brandstetter for finding this issue using fuzz testing
and reporting it in https://github.com/sctplab/usrsctp/issues/547
2020-11-09 13:12:07 +00:00
Edward Tomasz Napierala
e3b1c847a4 Make it possible to mount a fuse filesystem, such as squashfuse,
from a Linux binary.  Should come handy for AppImages.

Reviewed by:	asomers
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26959
2020-11-09 08:53:15 +00:00
Warner Losh
8b8af16875 Remove newline from bxe description, it's not done elsewhere. 2020-11-09 03:02:34 +00:00
Mateusz Guzik
3a440a421d Add more per-cpu zones.
This covers powers of 2 up to 64.

Example pending user is ZFS.
2020-11-09 00:34:23 +00:00
Navdeep Parhar
de0a3472d8 cxgbe(4): Allow the PF driver to set a VF's MAC address.
The MAC address can be set with the optional mac-addr property in the VF
section of the iovctl.conf(5) used to instantiate the VFs.

MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2020-11-09 00:08:35 +00:00
Mateusz Guzik
b54ed68408 vmstat: remove spurious newlines when reporting zones 2020-11-09 00:05:45 +00:00
Mateusz Guzik
523d66730c procdesc: convert the zone to a malloc type
The object is 128 bytes in size.
2020-11-09 00:05:21 +00:00
Mateusz Guzik
62d77e4e0c bufcache: convert bo_numoutput from long to int
int is wide enough and it plugs a hole in struct vnode, taking it down
from 496 to 488 bytes.
2020-11-09 00:04:58 +00:00
Mateusz Guzik
e90afaa015 kqueue: save space by using only one func pointer for assertions 2020-11-09 00:04:35 +00:00
Navdeep Parhar
dc0800a9ad cxgbev(4): Use the MAC address set by the the PF if there is one.
Query the firmware for the MAC address set by the PF for the VF and use
it instead of the firmware generated MAC if it's available.

MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2020-11-09 00:01:13 +00:00
Brandon Bergren
8801df34f0 [PowerPC] Fix powerpc64le boot after HPT superpages addition
The HPT is always stored in big-endian, as it is accessed directly by the
hardware as well as the kernel. As such, it is necessary to convert values
to and from native endian when running on LE.

Some unconverted accesses snuck in accidentally with r367417.

Apply the appropriate conversions to fix boot hanging on powerpc64le.

Sponsored by:	Tag1 Consulting, Inc.
2020-11-08 23:34:06 +00:00
Navdeep Parhar
76b976ad98 cxgbe(4): Add the firmware binaries missing in r367428.
Obtained from:	Chelsio Communications
MFC after:	5 days
Sponsored by:	Chelsio Communications
2020-11-08 22:30:13 +00:00
Mitchell Horne
4a3fc6e22e Fix definition of rn_addmask()
Add the missing static keyword present in the declaration.

Reviewed by:	melifaro
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D27024
2020-11-08 19:02:22 +00:00
Mitchell Horne
b02c4e5c78 igmp: convert igmpstat to use PCPU counters
Currently there is no locking done to protect this structure. It is
likely okay due to the low-volume nature of IGMP, but allows for
the possibility of underflow. This appears to be one of the only
holdouts of the conversion to counter(9) which was done for most
protocol stat structures around 2013.

This also updates the visibility of this stats structure so that it can
be consumed from elsewhere in the kernel, consistent with the vast
majority of VNET_PCPUSTAT structures.

Reviewed by:	kp
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D27023
2020-11-08 18:49:23 +00:00
Richard Scheffenegger
4d0770f172 Prevent premature SACK block transmission during loss recovery
Under specific conditions, a window update can be sent with
outdated SACK information. Some clients react to this by
subsequently delaying loss recovery, making TCP perform very
poorly.

Reported by:	chengc_netapp.com
Reviewed by:	rrs, jtl
MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D24237
2020-11-08 18:47:05 +00:00
Alexander V. Chernikov
2d39824195 Switch net.add_addr_allfibs default to 0.
The goal of the fib support is to provide multiple independent
 routing tables, isolated from each other.
net.add_addr_allfibs default tries to shift gears in the opposite
 direction, unconditionally inserting all addresses to all of the fibs.

There are use cases when this is necessary, however this is not a
 default expected behaviour, especially compared to other implementations.

Provide WARNING message for the setups with multiple fibs to notify
 potential users of the feature.

Differential Revision:	https://reviews.freebsd.org/D26076
2020-11-08 18:27:49 +00:00
Alexander V. Chernikov
76e6b37f6b Temporarily revert setting net.add_addr_allfibs to 0.
It accidentally sweeped in r367486.
Revert to allow for proper commit message & warning.
2020-11-08 18:11:12 +00:00
Edward Tomasz Napierala
a1bd83fede Move syscall_thread_{enter,exit}() into the slow path. This is only
needed for syscalls from unloadable modules.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D26988
2020-11-08 15:54:59 +00:00
Mariusz Zaborski
36d6566e59 Check if the ZVOL has been written before calling zil_async_to_sync.
The ZIL will be opened on the first write, not earlier.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mariusz Zaborski <oshogbo@vexillium.org>
OpenZFS Pull Request: https://github.com/openzfs/zfs/pull/11152
PR:		250934
2020-11-08 14:08:00 +00:00
Alexander V. Chernikov
770495f4c0 Fix build broken by r367484: add route_ifaddrs.c.
Pointy hat to: melifaro
Reported by:	jenkins
2020-11-08 13:30:44 +00:00
Dimitry Andric
1ee2434eb5 Merge commit 354d3106c from llvm git (by Kai Luo):
[PowerPC] Skip combining (uint_to_fp x) if x is not simple type

  Current powerpc64le backend hits
  ```
  Combining: t7: f64 = uint_to_fp t6
  llc: llvm-project/llvm/include/llvm/CodeGen/ValueTypes.h:291:
  llvm::MVT llvm::EVT::getSimpleVT() const: Assertion `isSimple() &&
  "Expected a SimpleValueType!"' failed.
  ```
  This patch fixes it by skipping combination if `t6` is not simple
  type.
  Fixed https://bugs.llvm.org/show_bug.cgi?id=47660.

  Reviewed By: #powerpc, steven.zhang

  Differential Revision: https://reviews.llvm.org/D88388

This should fix the llvm assertion mentioned above when building the
following ports for powerpc64le:

* audio/traverso
* databases/percona57-pam-for-mysql
* databases/percona57-server
* emulators/citra
* emulators/citra-qt5
* games/7kaa
* graphics/dia
* graphics/mandelbulber
* graphics/pcl-pointclouds
* net-p2p/libtorrent-rasterbar
* textproc/htmldoc

Requested by:	pkubaj
MFC after:	3 days
2020-11-08 12:47:35 +00:00
Alexander V. Chernikov
bad6b23606 Move all ifaddr route creation business logic to net/route/route_ifaddr.c
Differential Revision:	https://reviews.freebsd.org/D26318
2020-11-08 11:12:00 +00:00
Alexander Leidinger
8ec6c4a38b - add more linux socket options (sorted by value)
- map those IPv4 / IPv6 socket options which exist in FreeBSD
   + most of them visually verified to have the same type/layout of arguments
   + not tested with linux programs to behave as intended
 - be more human readable for known options which are not handled
 - be more verbose for unhandled socket message flags we know about
 - print the jail ID in linux_msg if run in a jail
 - add possibility to print debug message about known missing parts only once
 - add multiple levels of sysctl linux.debug:
   1: print debug messages, tell about unimplemented stuff (only once)
   2: like 1, but also print messages about implemented but not tested
      stuff (only once)
   3+: like 2, but no rate limiting of messages
 - increase default linux debug level from 1 to 3

We are a lot more verbose in as we need to be (e.g. some of the IP socket
options which are the same, and share the same memory layout, and are
believed to work). The reason is that we have no good testsuite to test those
linux-bits. The LTP or other test suites like the python one, are not fully
up to the task we need. As such the excessive messages about emulated but not
tested socket options.

IMO any MFC (possible, but most probably not by me) should set the default
debug level to 1.

Discussed with:	trasz
2020-11-08 09:50:58 +00:00
Toomas Soome
83a252c6a1 loader: cstyle cleanup of bootstrap.h did miss a bit
correct small issues - misplaced comment and typos.
2020-11-08 09:49:51 +00:00
Toomas Soome
90b307a897 loader: cstyle cleanup of bootstrap.h
No functional changes intended.
2020-11-08 09:35:41 +00:00
Olivier Cochard
c4fd0cc9ee Return the same value for smbios.chassis.maker as smbios.system.maker (and prevents returning a space character).
Reviewed by:	grehan
Approved by:	grehan
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D27123
2020-11-08 07:49:39 +00:00
Kyle Evans
8c28aa5e45 imgact_binmisc: limit the extent of match on incoming entries
imgact_binmisc matches magic/mask from imgp->image_header, which is only a
single page in size mapped from the first page of an image. One can specify
an interpreter that matches on, e.g., --offset 4096 --size 256 to read up to
256 bytes past the mapped first page.

The limitation is that we cannot specify a magic string that exceeds a
single page, and we can't allow offset + size to exceed a single page
either.  A static assert has been added in case someone finds it useful to
try and expand the size, but it does seem a little unlikely.

While this looks kind of exploitable at a sideways squinty-glance, there are
a couple of mitigating factors:

1.) imgact_binmisc is not enabled by default,
2.) entries may only be added by the superuser,
3.) trying to exploit this information to read what's mapped past the end
  would be worse than a root canal or some other relatably painful
  experience, and
4.) there's no way one could pull this off without it being completely
  obvious.

The first page is mapped out of an sf_buf, the implementation of which (or
lack thereof) depends on your platform.

MFC after:	1 week
2020-11-08 04:24:29 +00:00
Thomas Munro
cc7edd258c Add collation version support to querylocale(3).
Provide a way to ask for an opaque version string for a locale_t, so
that potential changes in sort order can be detected.  Similar to
ICU's ucol_getVersion() and Windows' GetNLSVersionEx(), this API is
intended to allow databases to detect when text order-based indexes
might need to be rebuilt.

The CLDR version is extracted from CLDR source data by the Makefile
under tools/tools/locale, written into the machine-generated Makefile
under shared/colldef, passed to localedef -V, and then written into
LC_COLLATE file headers.  The initial version is 34.0.
tools/tools/locale was recently updated to pull down 35.0, but the
output hasn't been committed under share/colldef yet, so that will
provide the first observable change when it happens.  Other versioning
schemes are possible in future, because the format is unspecified.

Reviewed by:	bapt, 0mp, kib, yuripv (albeit a long time ago)
Differential Revision:	https://reviews.freebsd.org/D17166
2020-11-08 02:50:34 +00:00
Warner Losh
d2799054f0 Also mention PORTS_MODULES
PORTS_MODULES is also an effective way to update the tree. Also
a minor rejustify on this an an adjacent paragraph.

Suggested by: David Wolfskill
2020-11-08 02:46:04 +00:00
Warner Losh
cc408e29d0 Be explicit about recompiling all the modules...
Add a note about always recompiling all modules on every new kernel
change / update. In addition, suggest using /usr/local/sys/modules
so this happens automatically.
2020-11-08 02:20:21 +00:00
Simon J. Gerraty
956e45f6fb Update to bmake-20201101
Lots of new unit-tests increase code coverage.

Lots of refactoring, cleanup and simlpification to reduce
code size.

Fixes for Bug 223564 and 245807

Updates to dirdeps.mk and meta2deps.py
2020-11-07 21:46:27 +00:00
Michael Tuexen
f908d8247e The ioctl() calls using FIONREAD, FIONWRITE, FIONSPACE, and SIOCATMARK
access the socket send or receive buffer. This is not possible for
listening sockets since r319722.
Because send()/recv() calls fail on listening sockets, fail also ioctl()
indicating EINVAL.

PR:			250366
Reported by:		Yong-Hao Zou
Reviewed by:		glebius, rscheff
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D26897
2020-11-07 21:17:49 +00:00
Simon J. Gerraty
302da1a3d3 Import bmake-20201101
Lots of new unit-tests increase code coverage.

Lots of refactoring, cleanup and simlpification to reduce
code size.

Fixes for Bug 223564 and 245807

Updates to dirdeps.mk and meta2deps.py
2020-11-07 19:39:21 +00:00
Cy Schubert
ce558a3f33 Fix build post-r367455.
MFC after:	2 weeks
X-MFC with:	r367455
2020-11-07 19:17:37 +00:00
Kyle Evans
1024ef27fe imgact_binmisc: move some calculations out of the exec path
The offset we need to account for in the interpreter string comes in two
variants:

1. Fixed - macros other than #a that will not vary from invocation to
   invocation
2. Variable - #a, which is substitued with the argv0 that we're replacing

Note that we don't have a mechanism to modify an existing entry.  By
recording both of these offset requirements when the interpreter is added,
we can avoid some unnecessary calculations in the exec path.

Most importantly, we can know up-front whether we need to grab
calculate/grab the the filename for this interpreter. We also get to avoid
walking the string a first time looking for macros. For most invocations,
it's a swift exit as they won't have any, but there's no point entering a
loop and searching for the macro indicator if we already know there will not
be one.

While we're here, go ahead and only calculate the argv0 name length once per
invocation. While it's unlikely that we'll have more than one #a, there's no
reason to recalculate it every time we encounter an #a when it will not
change.

I have not bothered trying to benchmark this at all, because it's arguably a
minor and straightforward/obvious improvement.

MFC after:	1 week
2020-11-07 18:07:55 +00:00
Bryan Drewery
9470af395f syslogd: Stop trying to send remote messages through special sockets
Specifically this was causing the /dev/klog fd and the signal pipe
handling fd to get a sendmsg(2) called on them and always returned
[ENOTSOCK].

r310350 combined these sockets into the main socket list and properly
skipped AF_UNSPEC at the sendmsg(2) call but later in r344739 it was
broken such that these special sockets were no longer excluded since
the AF_UNSPEC check specifically excluded these special sockets. Only
these special sockets have sl_sa = NULL. The sl_family checks should
be redundant now but are left in case of future changes so the intent
is clearer.

MFC after:	2 weeks
2020-11-07 17:18:44 +00:00
Mateusz Guzik
ff19fd6242 zfs: remove 2 assertions that teardown lock is not held
They are not very useful and hard to implement with rms.

This has a side effect of simplying the code.
2020-11-07 16:58:38 +00:00
Mateusz Guzik
42e7abd5db rms: several cleanups + debug read lockers handling
This adds a dedicated counter updated with atomics when INVARIANTS
is used. As a side effect one can reliably determine the lock is held
for reading by at least one thread, but it's still not possible to
find out whether curthread has the lock in said mode.

This should be good enough in practice.

Problem spotted by avg.
2020-11-07 16:57:53 +00:00
Kyle Evans
ecb4fdf943 imgact_binmisc: reorder members of struct imgact_binmisc_entry (NFC)
This doesn't change anything at the moment since the out-of-order elements
were a pair of uint32_t, but future additions may have caused unnecessary
padding by following the existing precedent.

MFC after:	1 week
2020-11-07 16:41:59 +00:00
Kyle Evans
e0f14ecf60 vt: resolve conflict between VT_ALT_TO_ESC_HACK and DBG
When using the ALT+CTRL+ESC sequence to break into kdb, the keyboard is
completely borked when you return. watch(8) shows that it's working, but
it's inserting escape sequences.

Further investigation revealed that VT_ALT_TO_ESC_HACK is the default and
directly conflicts with this sequence, so upon return from the debugger
ALKED is set.

If they triggered the break to debugger, it's safe to assume they didn't
mean to use VT_ALT_TO_ESC_HACK, so just unset it to reduce the surprise when
the keyboard seems non-functional upon return.

Reviewed by:	tsoome
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D27109
2020-11-07 15:38:01 +00:00
Michal Meloun
eb20867f52 Add a method to determine whether given interrupt is per CPU or not.
MFC after:	2 weeks
2020-11-07 14:58:01 +00:00