Commit Graph

256869 Commits

Author SHA1 Message Date
Hans Petter Selasky
9febbc4541 Fix for natd(8) sending wrong sequence number after TCP retransmission,
terminating a TCP connection.

If a TCP packet must be retransmitted and the data length has changed in the
retransmitted packet, due to the internal workings of TCP, typically when ACK
packets are lost, then there is a 30% chance that the logic in GetDeltaSeqOut()
will find the correct length, which is the last length received.

This can be explained as follows:

If a "227 Entering Passive Mode" packet must be retransmitted and the length
changes from 51 to 50 bytes, for example, then we have three cases for the
list scan in GetDeltaSeqOut(), depending on how many prior packets were
received modulo N_LINK_TCP_DATA=3:

  case 1:  index 0:   original packet        51
           index 1:   retransmitted packet   50
           index 2:   not relevant

  case 2:  index 0:   not relevant
           index 1:   original packet        51
           index 2:   retransmitted packet   50

  case 3:  index 0:   retransmitted packet   50
           index 1:   not relevant
           index 2:   original packet        51

This patch simply changes the searching order for TCP packets, always starting
at the last received packet instead of any received packet, in
GetDeltaAckIn() and GetDeltaSeqOut().

Else no functional changes.
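
The newest-first search order described above can be illustrated with a minimal sketch. It assumes a three-slot ring of per-packet records; the struct and function names are hypothetical and this is not the libalias code itself.

```c
#include <stddef.h>

#define N_SLOTS 3	/* mirrors N_LINK_TCP_DATA=3 from the message above */

/* Hypothetical record; the real libalias structure differs. */
struct pkt_record {
	int active;	/* nonzero once the slot has been written */
	int length;	/* data length recorded for that packet */
};

struct pkt_ring {
	struct pkt_record slot[N_SLOTS];
	size_t next;	/* index the next record will be written to */
};

/* Store a record, overwriting the oldest slot. */
static void
ring_add(struct pkt_ring *r, int length)
{
	r->slot[r->next].active = 1;
	r->slot[r->next].length = length;
	r->next = (r->next + 1) % N_SLOTS;
}

/*
 * Scan newest-first: start at the most recently written slot and walk
 * backwards, so a retransmission always shadows the original packet.
 * Returns -1 if nothing has been recorded.
 */
static int
ring_latest_length(const struct pkt_ring *r)
{
	for (size_t n = 1; n <= N_SLOTS; n++) {
		size_t i = (r->next + N_SLOTS - n) % N_SLOTS;

		if (r->slot[i].active)
			return (r->slot[i].length);
	}
	return (-1);
}
```

With this ordering the result no longer depends on which slot the original packet happened to land in, which is the property the three cases above lack when the scan always starts at index 0.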

Discussed with:	rscheff@
Submitted by:	Andreas Longwitz <longwitz@incore.de>
PR:		230755
MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-02-22 17:13:58 +01:00
Roger Pau Monné
808d4aad10 xen-blkback: fix leak of grant maps on ring setup failure
Multi page rings are mapped using a single hypercall that gets passed
an array of grants to map. One of the grants in the array failing to
map would lead to the failure of the whole ring setup operation, but
there was no cleanup of the rest of the grant maps in the array that
could have likely been created as a result of the hypercall.

Add proper cleanup on the failure path during ring setup to unmap any
grants that could have been created.
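
The failure-path cleanup can be sketched as follows. This is a simplified model, assuming a batched call that reports a per-entry status; `struct map_result`, `unmap_one()` and `check_batch_or_unwind()` are hypothetical names and not the Xen grant-table interface.

```c
#include <stddef.h>

/* Hypothetical per-grant result; not the Xen gnttab ABI. */
struct map_result {
	int status;	/* 0 on success, nonzero on failure */
	int handle;	/* valid only when status == 0 */
};

/* Stand-in for the unmap call; it only counts invocations here. */
static int unmapped_count;

static void
unmap_one(int handle)
{
	(void)handle;
	unmapped_count++;
}

/*
 * Inspect the results of one batched map call.  If any entry failed,
 * release every entry that did succeed before reporting the error, so
 * that no mapping outlives the failed setup.
 */
static int
check_batch_or_unwind(const struct map_result *res, size_t n)
{
	int failed = 0;

	for (size_t i = 0; i < n; i++)
		if (res[i].status != 0)
			failed = 1;
	if (!failed)
		return (0);

	for (size_t i = 0; i < n; i++)
		if (res[i].status == 0)
			unmap_one(res[i].handle);
	return (-1);
}
```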

This is part of XSA-361.

Sponsored by:	Citrix Systems R&D
2021-02-22 16:47:52 +01:00
Ed Maste
aa8ae5fe17 git hooks: add "Fixes" trailer to commit message template
A number of projects use "Fixes: <hash>" to identify a commit that is
fixed by a given change.  Adopt that convention.

Differential Revision:	https://reviews.freebsd.org/D28693
2021-02-22 10:29:56 -05:00
Mark Johnston
608c44f96e m_uiotombuf_nomap(): Stop clearing PG_ZERO in newly allocated pages
The caller should not be passing M_ZERO in the first place, so PG_ZERO
will not be preserved by the page allocator and clearing it accomplishes
nothing.

Reviewed by:	gallatin, jhb
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28808
2021-02-22 10:04:46 -05:00
Stefan Eßer
a0ba293c2f Add missing entry for zfs_racct.c
2021-02-22 15:06:48 +01:00
Martin Matuska
ba27dd8be8 zfs: merge OpenZFS master-9312e0fd1
Notable upstream changes:
  778869fa1 Fix reporting of mount progress
  e7adccf7f Disable use of hardware crypto offload drivers on FreeBSD
  03e02e5b5 Fix checksum errors not being counted on repeated repair
  64e0fe14f Restore FreeBSD resource usage accounting
  11f2e9a49 Fix panic if scrubbing after removing a slog device

MFC after:	2 weeks
2021-02-22 13:01:17 +01:00
Alexander Motin
c02a28754b Fix build after 2c7dc6bae9.
MFC after:	1 month
2021-02-21 17:21:14 -05:00
Alexander Motin
2c7dc6bae9 Refactor CTL datamove KPI.
- Make frontends call unified CTL core method ctl_datamove_done()
to report move completion.  This reduces code duplication
in different backends by accounting DMA time in common code.
- Add to ctl_datamove_done() and be_move_done() callback a samethr
argument, reporting whether the callback is called in the same
context as ctl_datamove().  This allows some cases, like iSCSI
write with immediate data or camsim frontend write, to save one context
switch, since we know that the context is sleepable.
- Remove data_move_done() methods from struct ctl_backend_driver,
unused since forever.

MFC after:	 1 month
2021-02-21 16:52:33 -05:00
Jamie Gritton
1158508a80 jail: Add pr_state to struct prison
Rather than using references (pr_ref and pr_uref) to deduce the state
of a prison, keep track of its state explicitly.  A prison is either
"invalid" (pr_ref == 0), "alive" (pr_uref > 0) or "dying"
(pr_uref == 0).

State transitions are generally tied to the reference counts, but with
some flexibility: a new prison is "invalid" even though it now starts
with a reference, and jail_remove(2) sets the state to "dying" before
the user reference count drops to zero (which was previously
accomplished via the PR_REMOVE flag).

pr_state is protected by both the prison mutex and allprison_lock, so
it has the same availability guarantees as the reference counts do.
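
The difference between deducing the state and tracking it explicitly can be sketched as below. All names are hypothetical stand-ins (the `demo_` prefix marks them as such) and the real definitions in the jail code may look different.

```c
/* Hypothetical names; the real definitions live in the jail code. */
enum demo_prison_state {
	DEMO_PRISON_INVALID,
	DEMO_PRISON_ALIVE,
	DEMO_PRISON_DYING
};

struct demo_prison {
	int ref;			/* all references (models pr_ref) */
	int uref;			/* user references (models pr_uref) */
	enum demo_prison_state state;	/* explicit state (models pr_state) */
};

/* The old approach: deduce the state from the two counters. */
static enum demo_prison_state
state_from_refs(const struct demo_prison *pr)
{
	if (pr->ref == 0)
		return (DEMO_PRISON_INVALID);
	return (pr->uref > 0 ? DEMO_PRISON_ALIVE : DEMO_PRISON_DYING);
}

/*
 * With an explicit field, removal can mark the prison as dying while
 * user references are still outstanding.
 */
static void
demo_remove(struct demo_prison *pr)
{
	pr->state = DEMO_PRISON_DYING;
}
```

After `demo_remove()` the explicit field reads "dying" even though the counters alone would still say "alive", which is the kind of flexibility the message describes.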

Differential Revision:	https://reviews.freebsd.org/D27876
2021-02-21 13:24:47 -08:00
Mateusz Guzik
2443068d48 vfs: shrink struct vnode to 448 bytes on LP64
... by moving v_hash into a 4 byte hole.

Combined with several previous size reductions this makes the size small
enough to fit 9 vnodes per page as opposed to 8.

Add a compile-time assert so that this is not unknowingly worsened.

Note the structure still remains bigger than it should be.
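
The arithmetic behind "9 vnodes per page" and the compile-time guard can be sketched like this. It assumes a 4096-byte page and ignores any allocator overhead; `struct demo_vnode` is a placeholder with the quoted size, not the real struct vnode.

```c
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096	/* assumed page size */

/* Placeholder with the size quoted in the message, not struct vnode. */
struct demo_vnode {
	unsigned char bytes[448];
};

/* Fail the build if the structure grows past the size budget. */
_Static_assert(sizeof(struct demo_vnode) <= 448,
    "struct demo_vnode grew; fewer objects will fit per page");

/* Whole objects of a given size that fit in one page. */
static size_t
objects_per_page(size_t objsize)
{
	return (DEMO_PAGE_SIZE / objsize);
}
```

At 448 bytes nine objects fit (9 * 448 = 4032), while any size above 455 bytes drops the count to eight.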
2021-02-21 21:07:14 +00:00
Mateusz Guzik
ee9b37ae5c jail: fix build after the previous commit
Noted by: Michael Butler <imb protected-networks.net>
2021-02-21 21:05:25 +00:00
Martin Matuska
0626917d07 Update vendor/openzfs to master-9312e0fd1
Notable changes:
- fix reporting of mount progress (778869fa1)
- disable use of hardware crypto offload drivers on FreeBSD (e7adccf7f)
- fix checksum errors not being counted on repeated repair (03e02e5b5)
- restore FreeBSD resource usage accounting (64e0fe14f)
- fix panic if scrubbing after removing a slog device (11f2e9a49)
2021-02-21 21:22:07 +01:00
Jamie Gritton
f7496dcab0 jail: Change the locking around pr_ref and pr_uref
Require both the prison mutex and allprison_lock when pr_ref or
pr_uref go to/from zero.  Adding a non-first or removing a non-last
reference remain lock-free.  This means that a shared hold on
allprison_lock is sufficient for prison_isalive() to be useful, which
removes a number of cases of lock/check/unlock on the prison mutex.

Expand the locking in kern_jail_set() to keep allprison_lock held
exclusive until the new prison is valid, thus making invalid prisons
invisible to any thread holding allprison_lock (except of course the
one creating or destroying the prison).  This renders prison_isvalid()
nearly redundant, now used only in asserts.
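
The split between lock-free and locked reference changes can be sketched with C11 atomics. The helpers below are hypothetical illustrations of the idea: they handle the non-first and non-last cases without a lock and return false when the caller would have to take the locks for a transition to or from zero.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Take a reference only if one already exists.  Returns false when the
 * count is zero, in which case the caller would need the locks to
 * perform the 0 -> 1 transition.
 */
static bool
demo_ref_acquire_if_not_zero(atomic_uint *count)
{
	unsigned int old = atomic_load(count);

	while (old != 0) {
		if (atomic_compare_exchange_weak(count, &old, old + 1))
			return (true);
		/* old was reloaded by the failed exchange; retry. */
	}
	return (false);
}

/*
 * Drop a reference only if it is not the last one.  Returns false when
 * the count is 1 (or already 0), leaving the 1 -> 0 transition to a
 * caller that holds the locks.
 */
static bool
demo_ref_release_if_not_last(atomic_uint *count)
{
	unsigned int old = atomic_load(count);

	while (old > 1) {
		if (atomic_compare_exchange_weak(count, &old, old - 1))
			return (true);
	}
	return (false);
}
```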

Differential Revision:	https://reviews.freebsd.org/D28419
Differential Revision:	https://reviews.freebsd.org/D28458
2021-02-21 10:55:44 -08:00
Michael Tuexen
b963ce4588 sctp: improve computation of an alternate net
Especially handle the case where the net passed in is about to
be deleted and therefore not in the list of nets anymore.

MFC after:	3 days
Reported by:	syzbot+9756917a7c8381adf5e8@syzkaller.appspotmail.com
2021-02-21 17:13:06 +01:00
Michael Tuexen
5ac839029d sctp: clear a pointer to a net which will be removed
MFC after:	3 days
2021-02-21 13:06:05 +01:00
Konstantin Belousov
8b7239681e ext2fs: clear write cluster tracking on truncation
Reviewed by:	fsu, mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D28679
2021-02-21 11:38:21 +02:00
Konstantin Belousov
2bfd8992c7 vnode: move write cluster support data to inodes.
The data is only needed by filesystems that
1. use buffer cache
2. utilize clustering write support.

Requested by:	mjg
Reviewed by:	asomers (previous version), fsu (ext2 parts), mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28679
2021-02-21 11:38:21 +02:00
Konstantin Belousov
d485c77f20 Remove #define _KERNEL hacks from libprocstat
Make sys/buf.h, sys/pipe.h, sys/fs/devfs/devfs*.h headers usable in
userspace, assuming that the consumer has an idea what it is for.
Unhide more material from sys/mount.h and sys/ufs/ufs/inode.h,
sys/ufs/ufs/ufsmount.h for consumption of userspace tools, with the
same caveat.

Remove unacceptable hack from usr.sbin/makefs which relied on sys/buf.h
being unusable in userspace, where it overrode struct buf with its own
definition.  Instead, provide struct m_buf and struct m_vnode and adapt
code to use local variants.

Reviewed by:	mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D28679
2021-02-21 11:38:21 +02:00
Konstantin Belousov
750ea20d3f Delete dead CLUSTERDEBUG config option.
Reviewed by:	mckusick
Tested by:	pho
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28679
2021-02-21 11:38:21 +02:00
Baptiste Daroussin
e6bb49f12c pci_vendors: update to 2021.02.20
2021-02-21 06:09:03 +01:00
Baptiste Daroussin
c9cb66f04d termcap: add an entry for the foot terminal
MFC after:	3 days
2021-02-21 06:06:47 +01:00
Mateusz Guzik
81174cd8e2 vfs: employ vfs_ref_from_vp in statfs and fstatfs
Avoids locking and unlocking the vnode.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D28695
2021-02-21 00:43:05 +00:00
Mateusz Guzik
a15f787adb vfs: add vfs_ref_from_vp
This generalizes what vop_stdgetwritemount used to be doing.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D28695
2021-02-21 00:43:05 +00:00
Mateusz Guzik
5fa12fe0cd amd64: implement strlen in assembly, take 2
Tested with glibc test suite.

The C variant in libkern performs excessive branching to find the zero
byte instead of using the bsfq instruction. The same code patched to use
it is still slower than the routine implemented here as the compiler
keeps neglecting to perform certain optimizations (like using leaq).

On top of that the routine can be used as a starting point for copyinstr
which operates on words instead of bytes.

The previous attempt had an instance of swapped operands to andq when
dealing with fully aligned case, which had a side effect of breaking the
code for certain corner cases. Noted by jrtc27.
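
The word-at-a-time zero-byte search can be sketched in C as follows. This is not the assembly routine from the commit; it is a minimal illustration that assumes a little-endian machine, the GCC/Clang builtin `__builtin_ctzll`, and a buffer padded to a multiple of 8 bytes. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES	0x0101010101010101ULL
#define HIGHS	0x8080808080808080ULL

/*
 * Index (0..7) of the first zero byte in a little-endian word, or 8 if
 * there is none.  The lowest set bit of the mask marks the first zero
 * byte; counting trailing zeros locates it without a per-byte loop.
 */
static unsigned int
first_zero_byte(uint64_t w)
{
	uint64_t mask = (w - ONES) & ~w & HIGHS;

	if (mask == 0)
		return (8);
	return ((unsigned int)__builtin_ctzll(mask) / 8);
}

/*
 * Word-at-a-time length.  Assumes the buffer holding s is padded to a
 * multiple of 8 bytes, so whole-word loads never run past it.
 */
static size_t
demo_strlen_words(const char *s)
{
	size_t off = 0;

	for (;;) {
		uint64_t w;
		unsigned int z;

		memcpy(&w, s + off, sizeof(w));
		z = first_zero_byte(w);
		if (z < 8)
			return (off + z);
		off += 8;
	}
}
```

Each iteration costs one branch per eight bytes, in contrast to the per-byte branching the message attributes to the C variant in libkern.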

Sample results:

$(perl -e "print 'A' x 3"):
stock:  211198039
patched:338626619
asm:    465609618

$(perl -e "print 'A' x 100"):
stock:   83151997
patched: 98285919
asm:    120719888

Reviewed by:	jhb, kib
Differential Revision:	https://reviews.freebsd.org/D28779
2021-02-21 00:43:05 +00:00
Jamie Gritton
6e1d1bfcac jail: Improve locking when removing prisons
Change the flow of prison_deref() so it doesn't let go of allprison_lock
until it's completely done using it (except for a possible drop as part
of an upgrade on its first try).

Differential Revision:	https://reviews.freebsd.org/D28458
MFC after:	3 days
2021-02-20 14:38:58 -08:00
Richard Scheffenegger
a8e431e153 PRR: use accurate rfc6675_pipe when enabled
Reviewed By: #transport, tuexen
MFC after:   2 weeks
Sponsored by:        NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D28816
2021-02-20 20:11:48 +01:00
Alexander V. Chernikov
f17f94cd1b Add arp/ndp tests in addition to rtsock ll tests.
2021-02-20 18:26:36 +00:00
Alexander V. Chernikov
e5b394f2d0 Fix setting static entries for arp/ndp.
rtsock message validation changes committed in 2fe5a79425
 did not take llinfo messages into account.

Add a special validation case for RTA_GATEWAY llinfo messages.

MFC after:	2 days
2021-02-20 18:26:35 +00:00
Ed Maste
020f411255 bsdinstall: add knob to set ASLR sysctls
Reviewed by:	mw
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28418
2021-02-20 11:55:00 -05:00
Ed Maste
fbc57e2df9 bsdinstall: replace multiple ifs with case
Reduce copy-paste and use a more typical construct.

Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28417
2021-02-20 11:54:31 -05:00
Guangyuan Yang
504e64af32 pwrite(2): add a BUGS section
Add a BUGS section about using pwrite(2) when O_APPEND is set on the fd.

MFC after:	3 days
Submitted by:	Ka Ho Ng <khng300@gmail.com>
Reviewed by:	gbe, yuripv
Differential Revision:	https://reviews.freebsd.org/D28372
2021-02-20 08:05:43 +00:00
Mark Johnston
150fc89a12 libdtrace: Trivial style fixes to force dt_lex.c to be regenerated
After commit 8ba333e02e ("libdtrace: Stop relying on lex
compatibility"), there have been several reports of incremental
buildworlds failing since make does not know that dt_lex.c needs to be
regenerated, and I want to avoid this when merging to stable/13.

MFC with:	8ba333e02e
2021-02-19 21:51:18 -05:00
Warner Losh
8cd1b2b1a7 boot: remove gptboot.efifat, it never should have been
conical hat reduction: Make sure we also remove gptboot.efifat. It was created,
briefly, and shouldn't have existed in the first place. Kill it at the same
place we kill boot1.efifat.

Pointy Hat to: imp@
2021-02-19 15:34:25 -07:00
Navdeep Parhar
038148c108 cxgbetool(8): Add support for setting the hashfilter mode (filter mask).
Tighten up the validation of filter modes while here.  Unrecognized
keywords will now be flagged as errors instead of being ignored.
2021-02-19 14:23:58 -08:00
Navdeep Parhar
0460a45062 cxgbe(4): Use the correct filter width for T5+.
T5 and above have extra bits for the optional filter fields.  This is a
correctness issue and not just a waste because a filter mode valid on a
T4 (36b) may not be valid on a T5+ (40b).

MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2021-02-19 14:23:58 -08:00
Navdeep Parhar
c91dda5ad9 cxgbe(4): Add a driver ioctl to set the filter mask.
Allow the filter mask (aka the hashfilter mode when hashfilters are
in use) to be set any time it is safe to do so.  The requested mask
must be a subset of the filter mode already in effect.  The driver will not change
the mode or ingress config just to support a new mask.
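
The subset requirement amounts to a simple bit test, sketched below. The field bits are hypothetical and chosen only for illustration; they are not the driver's actual definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical field bits, for illustration only. */
#define DEMO_F_PORT	0x1u
#define DEMO_F_VLAN	0x2u
#define DEMO_F_PROTO	0x4u

/* A mask is acceptable only if it sets no bit outside the mode. */
static bool
mask_is_subset(uint32_t mode, uint32_t mask)
{
	return ((mask & ~mode) == 0);
}
```

With a mode of `DEMO_F_PORT | DEMO_F_VLAN`, a mask of `DEMO_F_PORT` passes, while a mask containing `DEMO_F_PROTO` is rejected because the mode would have to change to support it.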

MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2021-02-19 14:23:58 -08:00
Navdeep Parhar
7ac8040a99 cxgbe(4): Use firmware commands to get/set filter configuration.
1. Query the firmware for filter mode, mask, and related ingress config
   instead of trying to figure them out from hardware registers.  Read
   configuration from the registers only when the firmware does not
   support this query.

2. Use the firmware to set the filter mode.  This is the correct way to
   do it and is more flexible as well.  The filter mode (and associated
   ingress config) can now be changed any time it is safe to do so.

   The user can specify a subset of a valid mode and the driver will
   enable enough bits to make sure that the mode is maxed out -- that
   is, it is not possible to set another bit without exceeding the
   total width for optional filter fields.  This is a hardware
   requirement that was not enforced by the driver previously.

MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2021-02-19 14:23:58 -08:00
Jamie Gritton
d4380c0cdd jail: Change both root and working directories in jail_attach(2)
jail_attach(2) performs an internal chroot operation, leaving it up to
the calling process to assure the working directory is inside the jail.

Add a matching internal chdir operation to the jail's root.  Also
ignore kern.chroot_allow_open_directories, and always disallow the
operation if there are any directory descriptors open.

Reported by:    mjg
Approved by:    markj, kib
MFC after:      3 days
2021-02-19 14:13:35 -08:00
Mark Johnston
0f9544d03e iflib: Fix detach of pseudo interfaces
In commit 38bfc6dee3 we added an IFDI_DETACH() call to
iflib_pseudo_deregister() since it looked like it was missing.  One is
present in the error-handling path of iflib_pseudo_register().  However,
the detach actually comes from the DEVICE_DETACH() method for the
above-mentioned device_t, so now we're calling IFDI_DETACH() twice when
destroying a pseudo interface.

Fix the problem by not calling IFDI_DETACH() from the device detach
routine.  This way we can ensure that iflib de-initialization always
happens in a consistent order.  It also ensures that you can't do silly
things like "devctl detach <pseudo ifnet>", which would previously
detach the driver without tearing down the corresponding ifnet.

PR:		253541
Reviewed by:	erj
MFC after:	1 week
Fixes:		38bfc6dee3
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28774
2021-02-19 17:10:41 -05:00
Dimitry Andric
d2b3fadf2d Revert 3c4fd2463b since upstream libcxxrt fixed it in another way
In 0ee0dbfb0d I imported a more recent
libcxxrt snapshot, which includes an upstream fix for the padding of
struct _Unwind_Exception:

e458560b7e

However, we also had a similar fix in our tree as:
https://cgit.freebsd.org/src/commit/?id=3c4fd2463bb29f65ef1404011fcb31e508cdf2e2

Since having both fixes makes the struct too large again, it leads to
SIGBUSes when throwing exceptions on amd64 (or other LP64 arches). This
is most easily tested by running kyua without any arguments.

It looks like our fix is no longer needed now, so revert it to reduce
diffs against upstream.

PR:		253226
Reviewed by:	arichardson, kp
MFC after:	3 days
Differential Revision: https://reviews.freebsd.org/D28799
2021-02-19 22:18:02 +01:00
Alexander V. Chernikov
f9e1cd6c99 Fix arp/ndp deletion broken by 2fe5a79425.
Changes in 2fe5a79425 moved dst sockaddr masking from the
 routing control plane to the rtsock code.

It broke arp/ndp deletion.
It turns out, arp/ndp perform RTM_GET request first to get an
 interface index necessary for the deletion.
Then they simply stamp the reply with RTF_LLDATA and set the
 command to RTM_DELETE.
As a result, the kernel receives a request with non-empty RTA_NETMASK
 and clears RTA_DST host bits before passing the message to the
 lla code.

De facto, the only needed bits are RTA_DST, RTA_GATEWAY and the
 subset of rtm_flags.

With that in mind, fix the interface by clearing RTA_NETMASK
 for every message with RTF_LLDATA.

While here, cleanup arp/ndp code a bit.

MFC after:	1 day
Reviewed by:	gnn
Differential Revision:	https://reviews.freebsd.org/D28804
2021-02-19 21:17:17 +00:00
Alfredo Dal'Ava Junior
a78bb831a1 fbio: Use appropriate types for the physical and virtual framebuffer address
Use appropriate types for the physical and virtual framebuffer address.
Fixes framebuffers mapped above 4G physical on 32-bit systems that
support physical address extensions like i386 and Book-E powerpc.

Patch developed by bdragon

Reviewed by:	bdragon, luporl
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D28604
2021-02-19 20:55:54 -03:00
John Baldwin
2ccf971ace iflib: Cast the result of iflib_netmap_txq_init() to void.
This fixes a warning from GCC for kernels without netmap since the
return value is never used.

Reviewed by:	vmaffione, erj
Differential Revision:	https://reviews.freebsd.org/D28598
2021-02-19 12:52:53 -08:00
Alexander Motin
05d882b780 Microoptimize CTL I/O queues.
Switch OOA queue from TAILQ to LIST and change its direction, so that
we traverse it forward, not backward.  There is only one place where
we really need other direction, and it is not critical.

Use STAILQ_REMOVE_HEAD() instead of STAILQ_REMOVE() in backends.

Replace a few impossible conditions with assertions.

MFC after:	1 month
2021-02-19 15:49:36 -05:00
Robert Watson
c3feaeaa32 Reimplement the arm64 dtrace_gethrtime(), which provides the
high-resolution nanosecond timestamp used for the DTrace 'timestamp'
built-in variable.  The new implementation uses the EL0 cycle
counter and frequency registers in ARMv8-A.  This replaces a
previous implementation that relied on an instrumentation-safe
implementation of getnanotime(), which provided only timer
resolution.
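
Turning a counter reading and a counter frequency into nanoseconds can be sketched as below. This is an assumed formulation of the arithmetic, not the code from the commit; `ticks_to_ns()` is a hypothetical helper and reading the hardware registers is left out.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Convert a counter value to nanoseconds given the counter frequency
 * in Hz.  Splitting into whole seconds and a remainder avoids the
 * overflow that count * NSEC_PER_SEC would hit for large counts.
 * Assumes freq is nonzero and below about 18 GHz, so that
 * rem * NSEC_PER_SEC still fits in 64 bits.
 */
static uint64_t
ticks_to_ns(uint64_t count, uint64_t freq)
{
	uint64_t secs = count / freq;
	uint64_t rem = count % freq;

	return (secs * NSEC_PER_SEC + rem * NSEC_PER_SEC / freq);
}
```

The resolution of the result is bounded by the counter frequency: at 24 MHz, for example, one tick corresponds to roughly 41 ns.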

MFC after:	3 days
Reviewed by:	andrew, bsdimp (older version)
Useful comments appreciated:	jrtc27, emaste
2021-02-19 09:00:39 +00:00
Alfredo Dal'Ava Junior
50b7c1f530 ofwfb: fix incorrect colors on powerpc* and add new tunable parameters
- Implements little-endian support (powerpc64le)
- Adds 'hw.ofwfb.physaddr' kernel parameter so user can manually
  provide correct address if it's not detected correctly
- Adds 'hw.ofwfb.argb32_pixel' so user can set it manually if
  colors are inverted due to incorrect pixel format (default = 1)
- Automatically selects RGBA32 pixel format if NVidia graphic adapter
  is detected (sets hw.ofwfb.argb32_pixel=0)

Machines equipped with NVidia graphic adapters tend to use RGBA32
pixel format. By default ARGB32 pixel format is used, which has proved to work
on machines equipped with ATI graphic adapter and the onboard adapter
used on Talos II and Blackbird machines from Raptor Computing Systems.

Original patch developed by bdragon

Reviewed by:	bdragon, luporl
MFC after:	3 days
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D28604
2021-02-19 19:50:36 -03:00
Andrew Turner
d765b21138 Remove __XSCALE__ checks from the arm code
XScale support was removed over 2 years ago, remove the last __XSCALE__
checks from the arm MD code.

Sponsored by:	Innovate UK
2021-02-19 15:31:26 +00:00
Richard Scheffenegger
853fd7a2e3 Ensure cwnd doesn't shrink to zero with PRR
Under some circumstances, PRR may end up with a fully
collapsed cwnd when finalizing the loss recovery.
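
One plausible guard against a fully collapsed window is a lower bound of one segment, sketched below. This is an assumption about the general shape of such a fix, not the change made in the commit; `clamp_cwnd()` is a hypothetical helper.

```c
#include <stdint.h>

/*
 * Clamp a computed congestion window so it never falls below one
 * segment.  Both parameters are in bytes.
 */
static uint32_t
clamp_cwnd(uint32_t cwnd, uint32_t maxseg)
{
	return (cwnd < maxseg ? maxseg : cwnd);
}
```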

Reviewed By:	#transport, kbowling
Reported by:	Liang Tian
MFC after:	1 week
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D28780
2021-02-19 13:55:32 +01:00
Daniel Ebdrup Jensen
248a47a4c2 ports(7): Update instructions for package target
Packages default to ending up in a different location compared to the
documentation, so catch up to the implementation by referring to the
location where packages can usually be found if no environment variables
have been set.

While here, also update the mention of the file extension to match the
txz format that packages use.

PR:		253179, 224370
Reported by:	rwatson, jeromer at fastmail dotnet
2021-02-19 13:42:16 +01:00
Kyle Evans
4c0bef07be kern: net: remove TCP_LINGERTIME
TCP_LINGERTIME can be traced back to BSD 4.4 Lite and perhaps beyond, in
exactly the same form that it appears here modulo slightly different
context.  It used to be the case that there was a single pr_usrreq
method with requests dispatched to it; these exact two lines appeared in
tcp_usrreq's PRU_ATTACH handling.

The only purpose of this that I can find is to cause surprising behavior
on accepted connections. Newly-created sockets will never hit these
paths as one cannot set SO_LINGER prior to socket(2). If SO_LINGER is
set on a listening socket and inherited, one would expect the timeout to
be inherited rather than changed arbitrarily like this -- noting that
SO_LINGER is nonsense on a listening socket beyond inheritance, since
they cannot be 'connected' by definition.

Neither Illumos nor Linux resets the timer like this, based on testing and
inspection of Illumos, and testing of Linux.

Reviewed by:	rscheff, tuexen
Differential Revision:	https://reviews.freebsd.org/D28265
2021-02-18 22:36:01 -06:00