Commit Graph

247307 Commits

Author SHA1 Message Date
Mateusz Guzik
cc3593fbd9 vfs: rework vnode list management
The current notion of an active vnode is eliminated.

Vnodes transition between 0<->1 hold counts all the time and the
associated traversal between different lists induces significant
scalability problems in certain workloads.

Introduce a global list containing all allocated vnodes. They get
unlinked only when UMA reclaims memory and are only requeued when
hold count reaches 0.

Sample result from an incremental make -s -j 104 bzImage on tmpfs:
stock:   118.55s user 3649.73s system 7479% cpu 50.382 total
patched: 122.38s user 1780.45s system 6242% cpu 30.480 total

Reviewed by:	jeff
Tested by:	pho (in a larger patch, previous version)
Differential Revision:	https://reviews.freebsd.org/D22997
2020-01-13 02:37:25 +00:00
Mateusz Guzik
80663cadb8 ufs: use lazy list instead of active list for syncer
Quota code is temporarily regressed to do a full vnode scan.

Reviewed by:	jeff
Tested by:	pho (in a larger patch, previous version)
Differential Revision:	https://reviews.freebsd.org/D22996
2020-01-13 02:35:15 +00:00
Mateusz Guzik
57083d2576 vfs: add per-mount vnode lazy list and use it for deferred inactive + msync
This obviates the need to scan the entire active list looking for vnodes
of interest.

msync is handled by adding all vnodes with write count to the lazy list.

deferred inactive directly adds vnodes as it sets the VI_DEFINACT flag.

Vnodes get dequeued from the list when their hold count reaches 0.

Newly added MNT_VNODE_FOREACH_LAZY* macros support filtering so that
spurious locking is avoided in the common case.

Reviewed by:	jeff
Tested by:	pho (in a larger patch, previous version)
Differential Revision:	https://reviews.freebsd.org/D22995
2020-01-13 02:34:02 +00:00
Mateusz Guzik
ac4ec14188 ufs: add a setter for inode i_flag field
This will be used later to add vnodes to the lazy list.

Reviewed by:	kib (previous version), jeff
Tested by:	pho (in a larger patch)
Differential Revision:	https://reviews.freebsd.org/D22994
2020-01-13 02:31:51 +00:00
Conrad Meyer
365cd52245 Fix a typo in r356667 comment
No functional change.

Reported by:	bdragon
Approved by:	csprng(markm), earlier version
X-MFC-With:	r356667
2020-01-12 23:52:16 +00:00
Conrad Meyer
86def3dcd6 getrandom(2): Add Linux GRND_INSECURE API flag
Treat it as a synonym for GRND_NONBLOCK.  The reasoning is this:

We have two choices for handling Linux's GRND_INSECURE API flag.

1. We could ignore it completely (like GRND_RANDOM).  However, this might
produce the surprising result of GRND_INSECURE requests blocking, when the
Linux API does not block.

2. Alternatively, we could treat GRND_INSECURE requests as requests for
GRND_NONBLOCK.  Here, the surprising result for Linux programs is that
invocations with unseeded random(4) will produce EAGAIN, rather than
garbage.

Honoring the flag in the way Linux does seems fraught.  If we actually use
the output of a random(4) implementation prior to seeding, we leak some
entropy (in an information theory and also practical sense) from what will
be the initial seed to attackers (or allow attackers to arbitrarily DoS
initial seeding, if we don't leak).  This seems unacceptable -- it defeats
the purpose of blocking on initial seeding.

Secondary to that concern, before seeding we may have arbitrarily little
entropy collected; producing output from zero or a handful of entropy bits
does not seem particularly useful to userspace.

If userspace can accept garbage, insecure, non-random bytes, they can create
their own insecure garbage with srandom(time(NULL)) or similar.  Any program
which would be satisfied with a 3-bit key CTR stream has no need for CSPRNG
bytes.  So asking the kernel to produce such an output from the secure
getrandom(2) API seems inane.

For now, we've elected to emulate GRND_INSECURE as an alternative spelling
of GRND_NONBLOCK (2).  Consider this API not-quite stable for now.  We
guarantee it will never block.  But we will attempt to monitor actual port
uptake of this bizarre API and may revise our plans for the unseeded
behavior (prior to stable/13 branching).

Approved by:	csprng(markm), manpages(bcr)
See also:	https://lwn.net/ml/linux-kernel/cover.1577088521.git.luto@kernel.org/
See also:	https://lwn.net/ml/linux-kernel/20200107204400.GH3619@mit.edu/
Differential Revision:	https://reviews.freebsd.org/D23130
2020-01-12 20:47:38 +00:00
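
A minimal sketch of option 2 as described in the message above, not the actual kernel code. The flag values, the SK_ prefix, and the function name sketch_getrandom_flags() are hypothetical and exist only for illustration; the rejection of unknown flags with EINVAL is likewise an assumption of this sketch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag values for illustration; real ones live in <sys/random.h>. */
#define SK_GRND_NONBLOCK	0x1u
#define SK_GRND_RANDOM		0x2u
#define SK_GRND_INSECURE	0x4u

/*
 * Decide how a request is treated given its flags and the seeding state.
 * Returns 0 if the request may proceed, EAGAIN if it would have to wait
 * for seeding but must not block, EINVAL for unknown flags (assumed).
 * *would_block is set when the caller would sleep until seeding completes.
 */
int
sketch_getrandom_flags(unsigned int flags, bool seeded, bool *would_block)
{
	*would_block = false;
	if (flags & ~(SK_GRND_NONBLOCK | SK_GRND_RANDOM | SK_GRND_INSECURE))
		return (EINVAL);
	/* GRND_INSECURE is treated as an alternative spelling of GRND_NONBLOCK. */
	if (flags & SK_GRND_INSECURE)
		flags |= SK_GRND_NONBLOCK;
	/* GRND_RANDOM is accepted and otherwise ignored. */
	if (!seeded) {
		if (flags & SK_GRND_NONBLOCK)
			return (EAGAIN);
		*would_block = true;
	}
	return (0);
}
```

Under this sketch an unseeded GRND_INSECURE request yields EAGAIN rather than garbage, which is the trade-off the message describes.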
Garance A Drosehn
526473251e Fix the way 'factor' behaves when using OpenSSL to match the description
of how it works when not compiled with OpenSSL.

Also, allow users to specify a hexadecimal number by using a prefix of
'0x'.  Before this, users could only specify a hexadecimal value if that
value included a hex digit ('a'-'f') in the value.

PR:		243136
Submitted by:	Steve Kargl
Reviewed by:	gad
MFC after:	3 weeks
2020-01-12 20:25:11 +00:00
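
A rough sketch of the input rule described above, assuming a strtoul(3)-based parser; it is not the code from factor(6). The helper name sketch_parse_number() is hypothetical, and accepting an uppercase "0X" prefix or uppercase hex digits is an assumption of this sketch.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Parse s as hexadecimal when it carries a "0x" prefix or contains a
 * hex-only digit (a-f); otherwise parse it as decimal.
 * Returns 0 on success and stores the value in *out, -1 on bad input.
 */
int
sketch_parse_number(const char *s, unsigned long *out)
{
	const char *p;
	char *end;
	int base = 10;

	if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) {
		base = 16;
	} else {
		for (p = s; *p != '\0'; p++) {
			if (isxdigit((unsigned char)*p) &&
			    !isdigit((unsigned char)*p)) {
				base = 16;
				break;
			}
		}
	}
	errno = 0;
	/* With base 16, strtoul() itself skips an optional "0x" prefix. */
	*out = strtoul(s, &end, base);
	if (errno != 0 || end == s || *end != '\0')
		return (-1);
	return (0);
}
```

With the prefix rule, "0x10" is read as sixteen, whereas a rule based only on the presence of a-f digits would read it as invalid or as decimal.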
Michael Tuexen
fe1274ee39 Fix race when accepting TCP connections.
When expanding a SYN-cache entry to a socket/inp a two step approach was
taken:
1) The local address was filled in, then the inp was added to the hash
   table.
2) The remote address was filled in and the inp was relocated in the
   hash table.
Before the epoch changes, a write lock was held when this happens and
the code looking up entries was holding a corresponding read lock.
Since the read lock has gone away after the introduction of the
epochs, the half populated inp was found during lookup.
This resulted in processing TCP segments in the context of the wrong
TCP connection.
This patch changes the above procedure so that the inp is fully
populated before being inserted into the hash table.

Thanks to Paul <devgs@ukr.net> for reporting the issue on the net@
mailing list and for testing the patch!

Reviewed by:		rrs@
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D22971
2020-01-12 17:52:32 +00:00
Bjoern A. Zeeb
c6feea3b89 nd6_rtr: consistently use __func__ for nd6log()
Over time one or two hard coded function names did not match the
actual function anymore.  Consistently use __func__ for nd6log() calls
and re-wrap/re-format some messages for consistency.

MFC after:	2 weeks
2020-01-12 17:41:09 +00:00
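
A small illustration of why __func__ helps here, as a sketch only: SKETCH_LOG() and sketch_prefix_lookup() are hypothetical stand-ins, and the macro formats into a caller-supplied buffer instead of calling the real nd6log().

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for a logging macro.  The function name is taken
 * from __func__, so the prefix stays correct if the function is renamed.
 */
#define SKETCH_LOG(buf, len, fmt, ...) \
	snprintf((buf), (len), "%s: " fmt, __func__, __VA_ARGS__)

int
sketch_prefix_lookup(char *buf, size_t len, int plen)
{
	/* No hard-coded "sketch_prefix_lookup:" string to go stale. */
	return (SKETCH_LOG(buf, len, "invalid prefix length %d", plen));
}
```

A hard-coded name in the format string would silently become wrong after a rename; the __func__ form cannot.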
Bjoern A. Zeeb
25ebfe3350 nd6_rtr: make nd6_prefix_onlink() static
nd6_prefix_onlink() is not used anywhere outside nd6_rtr.c.  Stop
exporting it and make it file local static.
2020-01-12 16:58:21 +00:00
Michael Tuexen
fc0eb7637c Fix division by zero issue.
Thanks to Stas Denisov for reporting the issue for the userland stack
and providing a fix.

MFC after:		3 days
2020-01-12 15:45:27 +00:00
Edward Tomasz Napierala
ca603bb1ee Add kern_getpriority(), make Linuxulator use it.
Reviewed by:	kib, emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22842
2020-01-12 14:25:44 +00:00
Edward Tomasz Napierala
7a0ef283e6 Add kern_setpriority(), use it in Linuxulator.
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22841
2020-01-12 13:38:51 +00:00
Xin LI
d14a599d69 Tighten FAT checks and fix off-by-one error in corner case.
sbin/fsck_msdosfs/fat.c:
 - readfat:
    * Only truncate out-of-range cluster pointers (1, or greater than
      NumClusters but smaller than CLUST_RSRVD), as the current cluster
      may contain some data. We can't fix reserved cluster pointers at
      this pass, because we do not know the potential cluster preceding
      it.
    * Accept valid cluster for head bitmap. This is a no-op, and mainly
      to improve code readability, because the 1 is already handled in
      the previous else if block.
 - truncate_at: absorbed into checkchain.
 - checkchain: save the previous node we have traversed in case that we
   have a chain that ends with a special (>= CLUST_RSRVD) cluster, or is
   free. In these cases, we need to truncate at the cluster preceding the
   current cluster, as the current cluster contains a marker instead of
   a next pointer and can not be changed to CLUST_EOF (the else case can
   happen if the user answered "no" at some point in readfat()).
 - clearchain: correct the iterator for next cluster so that we don't
   stop after clearing the first cluster.
 - checklost: If checkchain() thinks the chain has no cluster, it
   doesn't make sense to reconnect it, so don't bother asking.

Reviewed by:	kevlo
MFC after:	24 days
X-MFC-With:	r356313
Differential Revision:	https://reviews.freebsd.org/D23065
2020-01-12 06:13:52 +00:00
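
The clearchain item above concerns walking a cluster chain while freeing it. One way to do that without stopping after the first cluster is sketched below, assuming a simplified in-memory FAT held in a plain array; this is not the fsck_msdosfs code, and sketch_clearchain(), SK_CLUST_FREE and SK_CLUST_FIRST are hypothetical names.

```c
#define SK_CLUST_FREE	0u	/* hypothetical "free" marker */
#define SK_CLUST_FIRST	2u	/* first valid cluster number */

/*
 * Free every cluster in the chain starting at head.  The next pointer is
 * read before the current entry is overwritten; reading it afterwards
 * would yield the free marker and end the walk after one cluster.
 */
void
sketch_clearchain(unsigned int *fat, unsigned int nclusters, unsigned int head)
{
	unsigned int curr, next;

	for (curr = head; curr >= SK_CLUST_FIRST && curr < nclusters;
	    curr = next) {
		next = fat[curr];
		fat[curr] = SK_CLUST_FREE;
	}
}
```

The walk ends at any value outside the valid cluster range, which covers end-of-chain and reserved markers as well as already-freed entries.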
Mateusz Guzik
d199ad3b44 Add "panicked" boolean which can be tested instead of panicstr
The test is performed all the time and reading entire panicstr to do it
wastes space.
2020-01-12 06:09:10 +00:00
Mateusz Guzik
879e0604ee Add KERNEL_PANICKED macro for use in place of direct panicstr tests
2020-01-12 06:07:54 +00:00
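
The two commits above can be pictured with the following sketch. All names carry a sketch_ or SKETCH_ prefix because they are hypothetical stand-ins, not the kernel's definitions: callers test a macro, and the macro reads a one-byte boolean instead of comparing the panic message pointer against NULL.

```c
#include <stdbool.h>
#include <stddef.h>

const char *sketch_panicstr;	/* stand-in for the panic message pointer */
bool sketch_panicked;		/* stand-in for the new boolean */

/* Callers use the macro, so the underlying test can change in one place. */
#define SKETCH_KERNEL_PANICKED()	(sketch_panicked)

void
sketch_panic(const char *msg)
{
	sketch_panicstr = msg;
	sketch_panicked = true;
}
```

Hiding the test behind a macro first is what makes the later switch from the pointer to the boolean a one-line change.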
Mateusz Guzik
76a49ebaa6 sysctl: add missing CTLFLAG_MPSAFE annotation to CONST_STRING
2020-01-12 05:25:06 +00:00
Mateusz Guzik
a314aba874 vm: add missing CTLFLAG_MPSAFE annotations
This covers all vm/* files.
2020-01-12 05:08:57 +00:00
Mateusz Guzik
638af813d9 dtrace: add missing CTLFLAG_MPSAFE annotations
2020-01-12 04:53:22 +00:00
Mateusz Guzik
20fa645666 zfs: add missing CTLFLAG_MPSAFE annotations
2020-01-12 04:53:01 +00:00
Kyle Evans
4f47920e9c Makefile.inc1: push /usr/libexec into the BPATH/TMPPATH
${WORLDTMP}/legacy/usr/libexec will only have libexec/ bits that we've
pushed as bootstrap tools, so this is generally safe to include prior to
PATH. The following are the ramifications of this change:

- BPATH addition gets us at least bootstrap flua in WMAKEENV path for
  buildenv, for those earlier systems where it's bootstrapped still

- Reworked the sysent target to just set PATH and let it get worked out in
  src.lua.mk or individual sysent makefiles -- this gives us back the
  ability to overwrite LUA_CMD and use a different/external lua for these
  targets.  sysent can also now work cleanly in buildenv.

- tools/build/Makefile will now symlink the host flua into build's host
  tools so that the above can work without needing to add the host's
  /usr/libexec explicitly into TMPPATH.

Reviewed by:	arichardson, brooks, imp (all slightly earlier version)
Differential Revision:	https://reviews.freebsd.org/D22464
2020-01-12 04:18:36 +00:00
Kyle Evans
89476f9c99 regulator: small enhancements to regulator_shutdown
Highlights:

- Exit early if we're not disabling unused regulators; there's no need to
  take the regulator topology lock and re-evaluate this every iteration, as
  it's not going to change.
- Don't emit a notice that we're shutting down a regulator if it's not
  enabled, to reduce noise.
- Mention the outcome of the shutdown, to aid debugging and easily let
  developers/users collect the list of regulators we actually shut down to
  determine the problematic one.

Reviewed by:	manu
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D22213
2020-01-12 04:07:03 +00:00
Mateusz Guzik
91de98e6d4 vfs: only recalculate watermarks when limits are changing
Previously they would get recalculated all the time, in particular in:
getnewvnode -> vcheckspace -> vspace
2020-01-11 23:00:57 +00:00
Mateusz Guzik
e6ae744e0e vfs: deduplicate vnode allocation logic
This creates a dedicated routine (vn_alloc) to allocate vnodes.

As a side effect code duplication with getnewvnode_reserve is eliminated.

Add vn_free for symmetry.
2020-01-11 22:59:44 +00:00
Mateusz Guzik
b52d50cf69 vfs: prealloc vnodes in getnewvnode_reserve
Having a reserved vnode count does not guarantee that getnewvnode won't
block later. Said blocking partially defeats the purpose of reserving in
the first place.

Preallocate instead. The only consumer was always passing "1" as count
and never nesting reservations.
2020-01-11 22:58:14 +00:00
Mateusz Guzik
6928306764 vfs: incomplete pass at converting more ints to u_long
Most notably numvnodes and freevnodes were u_long, but parameters used to
govern them remained as ints.
2020-01-11 22:56:20 +00:00
Mateusz Guzik
bf62296f35 vfs: add missing CTLFLAG_MPSAFE annotations
This covers all kern/vfs_*.c files.
2020-01-11 22:55:12 +00:00
Justin Hibbits
7d7671db00 powerpc/mpc85xx: Fix localbus child reg property decoding
r302340, as an attempt to fix the localbus child handling post-rman change,
actually broke child resource allocation, due to typos in
fdt_lbc_reg_decode().  This went unnoticed because there aren't any drivers
currently in tree that use localbus.
2020-01-11 22:29:44 +00:00
Gleb Smirnoff
629667a148 Pacify gcc.
Reported by:	rlibby
2020-01-11 20:07:30 +00:00
Bjoern A. Zeeb
e1891232fc in6_mcast: make in6_joingroup_locked() static
in6_joingroup_locked() is only used file-local. No need to export it
hence make it static.
2020-01-11 18:55:12 +00:00
Emmanuel Vadot
c9f3a1ac17 arm64: allwinner: dtso: Add spi0 spigen DTSO
This overlay can be used on A64 boards to use spigen and spi(8)
on the spi0 pins.

Tested On:  Pine64-LTS, A64-Olinuxino

Submitted by:	Gary Otten <gdotten@gmail.com>
2020-01-11 18:36:10 +00:00
Xin LI
d3dd66792b Correct off-by-two issue when determining FAT type.
In the code we used NumClusters as the upper (non-inclusive) boundary
of valid cluster number, so the actual value was 2 (CLUST_FIRST) more
than the real number of clusters. This causes FAT16 media with
65524 clusters to be treated as FAT32 and might affect FAT12 media with
4084 clusters as well.

To fix this, we increment NumClusters by CLUST_FIRST after the type
determination.

PR:		243179
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D23082
2020-01-11 17:41:20 +00:00
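
The off-by-two can be checked with a short sketch. The thresholds 4085 and 65525 are taken from the Microsoft FAT specification and are an assumption relative to the actual fsck_msdosfs code; sketch_fat_bits(), sketch_cluster_bound() and SK_CLUST_FIRST are hypothetical names.

```c
#define SK_CLUST_FIRST	2u	/* CLUST_FIRST, per the commit message */

/* Classify by the real number of data clusters. */
int
sketch_fat_bits(unsigned int data_clusters)
{
	if (data_clusters < 4085)
		return (12);
	if (data_clusters < 65525)
		return (16);
	return (32);
}

/*
 * Upper (non-inclusive) bound of valid cluster numbers.  In the fixed
 * ordering this is computed only after the type has been determined.
 */
unsigned int
sketch_cluster_bound(unsigned int data_clusters)
{
	return (data_clusters + SK_CLUST_FIRST);
}
```

Classifying the bound instead of the count reproduces the reported symptom: sketch_fat_bits(sketch_cluster_bound(65524)) lands in the FAT32 range, while sketch_fat_bits(65524) is FAT16.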
Hans Petter Selasky
ae5b45c86e Make sure the VNET is properly set when reaping mbufs in ipoib.
Else the following panic may happen:

panic()
icmp_error()
ipoib_cm_mb_reap()
linux_work_fn()
taskqueue_run_locked()
taskqueue_thread_loop()
fork_exit()
fork_trampoline()

Submitted by:	Andreas Kempe <kempe@lysator.liu.se>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-01-11 12:02:16 +00:00
Hans Petter Selasky
5bc41c932f Revert r356598 for now because it breaks some AMD based XHCI controllers.
Reported by:	jkim @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-01-11 11:38:02 +00:00
Konstantin Belousov
7e3300e505 rtld: clean up Makefile.
Move all MD statements into $MACHINE_ARCH/Makefile.inc.
Unconditionally apply version script to rtld, the interpreter has not been
functional without it for a long time.

Reviewed by:	brooks, emaste
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D23083
2020-01-11 09:18:58 +00:00
Konstantin Belousov
1021c8d705 Stop prepending prefix to the result of realpath(3).
The path is already absolute.

Noted and reviewed by:	rstone
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D23121
2020-01-11 09:08:02 +00:00
Xin LI
727d995c7d Apply typo fix from NetBSD, we have already applied all NetBSD changes so
update the NetBSD tag while I'm there.

MFC after:	2 weeks
2020-01-11 04:02:40 +00:00
Xin LI
ed0879d944 Require FAT to occupy at least one sector.
Obtained from:	Android https://r.android.com/1205830
MFC after:	3 days
2020-01-11 03:59:06 +00:00
Kirk McKusick
27a6257130 When a read error occurs while fetching a directory block to delete
or rename an entry in it, properly reset the link count of the inode
associated with the entry that was to have been changed.

Tested by: Peter Holm
MFC after: 7 days
2020-01-11 03:18:47 +00:00
Pedro F. Giffuni
7e4c9d4893 Update ELFOSABI_* constants with OpenVOS.
Reference:
	https://www.sco.com/developers/gabi/latest/ch4.eheader.html
2020-01-11 01:44:55 +00:00
Jung-uk Kim
f425b8be7e MFV: r356607
Import ACPICA 20200110.
2020-01-10 22:49:14 +00:00
Kyle Evans
6486ccfe2f camdd: initialize devs earlier
GCC9 points out that devs may be used uninitialized after the bailout label;
in fact, if num_io_opts != 2 then it is. Move the initialization up a little
bit.

Reviewed by:	ken
MFC after:	3 days
2020-01-10 22:20:23 +00:00
Ed Maste
0ce9d0af5b src.opts.mk: force KERBEROS_SUPPORT off where KERBEROS forced off
Explicitly setting WITHOUT_KERBEROS implies WITHOUT_KERBEROS_SUPPORT,
but previously other cases that forced KERBEROS off (such as
WITHOUT_CRYPT) did not also set KERBEROS_SUPPORT off.  Because the
_SUPPORT dependent options (KERBEROS/KERBEROS_SUPPORT) are processed
before other dependencies (CRYPT/KERBEROS) it's not easy to make this
happen automatically.  Instead just explicitly set KERBEROS_SUPPORT
off where we set KERBEROS off.

Reported by:	Michael Dexter's Build Option Survey run
2020-01-10 22:00:39 +00:00
Kyle Evans
53f8212826 tests: fusefs: silence remaining unsigned/signed comparison warnings
External GCC turns these into errors; cast to long to silence them.

Reviewed by:	asomers
Differential Revision:	https://reviews.freebsd.org/D23127
2020-01-10 21:51:27 +00:00
Gleb Smirnoff
ed6cbf4805 Add pfil(9) hook to vtnet(4).
The patch could be simpler, using only the second chunk to
vtnet_rxq_eof(), that passes full mbufs to pfil(9). Packet
filter would m_free() them in case of returning PFIL_DROPPED.

However, we pretend to be a hardware driver, so we first try
to pass a memory buffer via PFIL_MEMPTR feature. This is mostly
done for debugging purposes, so that one can experiment in bhyve
with packet filters utilizing same features as a true driver.
2020-01-10 21:22:03 +00:00
Loïc Bartoletti
1cd24ac42b Add myself (lbartoletti) as a ports committer
Step 5 (Update Mentor and Mentee Information) from the Committer's Guide.

I also alphabetized the mentees of tcberner.

Approved by:	tcberner (mentor)
Differential Revision:	https://reviews.freebsd.org/D23125
2020-01-10 20:53:58 +00:00
Gleb Smirnoff
9328cbc047 Always multiply vm.pgcache_zone_max by the number of CPUs, and rename it
accordingly.  The tunable controls the size of the per-cpu
vm page cache.  Previously the value was split among all CPUs in the system,
so configuring the same value on machines with different CPU counts
yielded a different cache size available to a particular CPU.

Reviewed by:	markj
Obtained from:	Netflix
2020-01-10 19:32:08 +00:00
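
The arithmetic behind this change, as a sketch under the reading given in the message: before, one configured value was divided among the CPUs; after, the configured value is per CPU and the total scales with the CPU count. Both function names are hypothetical.

```c
/* Old behavior: the tunable is a system-wide total, split per CPU. */
unsigned long
sketch_old_per_cpu(unsigned long tunable, unsigned int ncpus)
{
	return (tunable / ncpus);
}

/* New behavior: the tunable is per CPU, so the total is multiplied. */
unsigned long
sketch_new_total(unsigned long tunable, unsigned int ncpus)
{
	return (tunable * ncpus);
}
```

With the old rule, a value of 1024 gave each CPU 256 on a 4-CPU machine but only 16 on a 64-CPU machine; with the new rule each CPU gets the configured amount regardless of the CPU count.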
Emmanuel Vadot
ca4387843e arm: allwinner: axp209: Add regnode_status method
This allows consumers to check if the regulator is enabled or not.

MFC after:	1 week
2020-01-10 18:53:14 +00:00
Emmanuel Vadot
b74b94d2a1 twsi: Rework how we handle the i2c messages
We used to handle each message separately in i2c_transfer but that cannot
work with messages with NOSTOP, as it confuses the controller when we disable
the interrupts and start a new message.
Handle every message in the interrupt handler and fire a new start condition
if the previous message has NOSTOP; the controller understands this as a
repeated start.
This fixes booting on Allwinner A10/A20 platforms, where before the i2c controller
used to write 0 to the PMIC register that controls the regulators, as it thought that
this was the continuation of the write message.

Tested on:   A20 BananaPi, Cubieboard 1 (kevans)
Reported by:	kevans
MFC after:	1 month
2020-01-10 18:52:14 +00:00
Jung-uk Kim
8bf5cb5c35 Import ACPICA 20200110.
2020-01-10 18:46:46 +00:00