It was possible that termination of ktrace session occured during some
record write, in which case write occured after the close of the vnode.
Use ktr_io_params refcounting to avoid this situation, by taking the
reference on the structure instead of vnode.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30400
Instead of trying to partially ifdef out ktrace handling, define the
missing identifier to 0. Without this fix lack of ktrace in the kernel
also means there is no SIGXFSZ signal delivery.
This allows tracking all wait times with much smaller runtime impact.
For example when doing -j 104 buildkernel on tmpfs:
no profiling: 2921.70s user 282.72s system 6598% cpu 48.562 total
all acquires: 2926.87s user 350.53s system 6656% cpu 49.237 total
contested only: 2919.64s user 290.31s system 6583% cpu 48.756 total
These are not needed when including ktls.h to get sockopt definitions.
Reviewed by: gallatin, jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30392
sys_ktrace() calls namei(), which may call ktrnamei(). But sys_ktrace()
also calls ktrace_enter() first, so if the caller is itself being
traced, the assertion in ktrace_enter() is triggered. And, ktrnamei()
does not check for recursion like most other ktrace ops do.
Fix the bug by simply deferring the ktrace_enter() call.
Also make the parameter to ktrnamei() const and convert to ANSI.
Reported by: syzbot+d0a4de45e58d3c08af4b@syzkaller.appspotmail.com
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30340
Handle the case where during socket option processing, the user
switches a stack such that processing the stack specific socket
option does not make sense anymore. Return an error in this case.
MFC after: 1 week
Reviewed by: markj
Reported by: syzbot+a6e1d91f240ad5d72cd1@syzkaller.appspotmail.com
Sponsored by: Netflix, Inc.
Differential revision: https://reviews.freebsd.org/D30395
When enabled, writes to ktrace.out that exceed the max file size limit
cause SIGXFSZ as it should be, but note that the limit is taken from
the process that initiated ktrace. When disabled, write is blocked,
but signal is not send.
Note that in either case ktrace for the affected process is stopped.
Requested and reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30257
Other processes might still be able to write, make the decision to stop
based on the per-process situation.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30257
and use the mark to stop applying file size limits on the write of
the accounting record. This allows to remove hack to clear process
limits in acct_process(), and avoids the bug with the clearing being
ineffective because limits are also cached in the thread structure.
Reported and reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30257
Wrap too long lines.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30257
Otherwise pages are cleaned some time later when the lower fs decides
that it is time to do it. This mostly manifests itself as delayed
mtime update, e.g. breaking make-like programs.
Reported by: mav
Tested by: mav, pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
There is no need to own vnode interlock, since v_object is type stable
and can only change to/from NULL, and no other checks in the function
access fields protected by the interlock. Remove the need variable, the
result of the test is directly usable as return value.
Tested by: mav, pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
This makes it possible to use core_write(), core_output(),
and sbuf_drain_core_output(), in Linux coredump code. Moving
them out of imgact_elf.c is necessary because of the weird way
it's being built.
Reviewed By: kib
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D30369
While partially reverting D24237 with D29690, due to introducing some
unintended effects for in-kernel TCP consumers, the preexisting lock
on the socket send buffer was not considered properly.
Found by: markj
MFC after: 2 weeks
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D30390
PRUS_NOTREADY indicates that the caller has not yet populated the chain
with data, and so it is not ready for transmission. This is used by
sendfile (for async I/O) and KTLS (for encryption). In particular, if
pru_send returns an error, the caller is responsible for freeing the
chain since other implicit references to the data buffers exist.
For async sendfile, it happens that an error will only be returned if
the connection was dropped, in which case tcp_usr_ready() will handle
freeing the chain. But since KTLS can be used in conjunction with the
regular socket I/O system calls, many more error cases - which do not
result in the connection being dropped - are reachable. In these cases,
KTLS was effectively assuming success.
So:
- Change sosend_generic() to free the mbuf chain if
pru_send(PRUS_NOTREADY) fails. Nothing else owns a reference to the
chain at that point.
- Similarly, in vn_sendfile() change the !async I/O && KTLS case to free
the chain.
- If async I/O is still outstanding when pru_send fails in
vn_sendfile(), set an error in the sfio structure so that the
connection is aborted and the mbuf chain is freed.
Reviewed by: gallatin, tuexen
Discussed with: jhb
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30349
- Free the input mbuf in a single place instead of in every error path.
- Handle PRUS_NOTREADY consistently.
- Flush the socket's send buffer if an implicit connect fails. At that
point the mbuf has already been enqueued but we don't want to keep it
in the send buffer.
Reviewed by: gallatin, tuexen
Discussed with: jhb
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30349
If a regulator hasn't been enable by a driver but is enabled in hardware
(most likely enabled by U-Boot), regulator_status will returns that it
is enabled and so any call to regulator_disable will panic as it wasn't
enabled by one of our drivers.
Sponsored by: Diablotin Systems
Differential Revision: https://reviews.freebsd.org/D30293
This allow us to powerup/down the card and enabling/disabling the
regulators if any.
Sponsored by: Diablotin Systems
Differential Revision: https://reviews.freebsd.org/D30292
This helper can be used to enable/disable the regulator and starting
the power sequence of sd/sdio/eMMC cards.
Sponsored by: Diablotin Systems
Differential Revision: https://reviews.freebsd.org/D30291
This method is used to know if a regulator is enabled or not.
Sponsored by: Diablotin Systems
Differential Revision: https://reviews.freebsd.org/D30290
If a sd/emmc node have a pwrseq property parse it and get the corresponding
driver.
This can later be used to powerup/powerdown the SDIO card or eMMC.
Sponsored by: Diablotin Systems
Differential Revision: https://reviews.freebsd.org/D30289
This driver is used to power up sdio card or eMMC.
It handle the reset-gpio, clocks and needed delays for powerup/powerdown.
Sponsored by: Diablotin Systems
Differential Revision: https://reviews.freebsd.org/D30288
By default name the gpio P<bank><bankpin>
This make it easier to find the gpio when reading schematics or DTS.
Sponsored by: Diablotin Systems
Differential Revision: https://reviews.freebsd.org/D30287
For the discovery phase of SD/eMMC we need to do some transaction in a async
way.
The classic CAM XPT_{GET,SET}_TRAN_SETTING cannot be used in a async way.
This also allow us to split the discovery phase into a more complete state
machine and we don't mtx_sleep with a random number to wait for completion
of the tasks.
For mmc_sim we now do the SET_TRAN_SETTING in a taskqueue so we can call
the needed function for regulators/clocks without the cam lock(s). This part is
still needed to be done for sdhci.
We also now save the host OCR in the discovery phase as it wasn't done before and
only worked because the same ccb was reused.
Reviewed by: imp, kibab, bz
Differential Revision: https://reviews.freebsd.org/D30038
the thread destructor is invoked. Catch that window by waiting for all
task_struct allocations to be returned before freeing the UMA zone in the
LinuxKPI. Else UMA may fail to release the zone due to concurrent access
and panic:
panic() - Bad link element prev->next != elm
zone_release()
bucket_drain()
bucket_free()
zone_dtor()
zone_free_item()
uma_zdestroy()
linux_current_uninit()
This failure can be triggered by loading and unloading the LinuxKPI module
in a loop:
while true
do
kldload linuxkpi
kldunload linuxkpi
done
Discussed with: kib@
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
r367492 would unlock the socket buffer before eventually calling the upcall.
This leads to problematic interaction with NFS kernel server/client components
(MP threads) accessing the socket buffer with potentially not correctly updated
state.
Reported by: rmacklem
Reviewed By: tuexen, #transport
Tested by: rmacklem, otis
MFC after: 2 weeks
Sponsored By: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29690
API should work as following:
- periodicaly report Lower-or-EQual bandwidth (LEQ) connections
over kernel socket, if user application registered for such
per-flow notifications
- report Grater-or-EQual (GEQ) bandwidth as soon as it reaches
specified value in configured time window
Custom implementation of callouts was removed. There is no
point of doing calout-wheel here as generic callouts are
doing exactly the same. The performance is not critical
for such reporting, so the biggest concern should be
to have a code which can be easily maintained.
This is ia preparation for locking rework which is highly inefficient.
Approved by: mw
Sponsored by: Stormshield
Obtained from: Semihalf
Differential Revision: https://reviews.freebsd.org/D30210
Commit b3d4c70dc6 added support for CLAIM_DELEG_CUR_FH to Open.
While doing this, I noticed that CLAIM_DELEG_PREV_FH support
could be added the same way. Although I am not aware of any extant
NFSv4.1/4.2 client that uses this claim type, it seems prudent to add
support for this variant of Open to the NFSv4.1/4.2 server.
This patch does not affect mounts from extant NFSv4.1/4.2 clients,
as far as I know.
MFC after: 2 weeks
This fixes a few bugs in iSCSI backends where the backends were using
the limits they advertised initially during the login phase as the
final values instead of the values negotiated with the other end.
Reported by: Jithesh Arakkan @ Chelsio
Reviewed by: mav
Differential Revision: https://reviews.freebsd.org/D30271
cxgbei stores state about a target transfer in the ctl_private[] array
of a ctl_io that is freed when a target transfer (represented by the
cdw) is freed. As such, freeing a ctl_io before a cdw that references
it can result in a use after free in cxgbei. Two of the four places
freed the cdw first, and the other two freed the ctl_io first. Fix
the latter two places to free the cdw first.
Reported by: Jithesh Arakkan @ Chelsio
Reviewed by: mav
Differential Revision: https://reviews.freebsd.org/D30270
At this point the directory's vnode lock is held, so blocking while
waiting for free pages makes the system more susceptible to deadlock in
low memory conditions. This is particularly problematic on NUMA systems
as UMA currently implements a strict first-touch policy.
ufsdirhash_build() already uses M_NOWAIT for other allocations and
already handled failures for the block array allocation, so just convert
to M_NOWAIT.
PR: 253992
Reviewed by: markj, mckusick, vangyzen
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D29045
Floating states get assigned to interface 'all' (V_pfi_all), so when we
try to flush all states for an interface states originally created
through this interface are not flushed. Only if-bound states can be
flushed in this way.
Given that we track the original interface we can check if the state's
interface is 'all', and if so compare to the orig_if instead.
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30246
Track (and display) the interface that created a state, even if it's a
floating state (and thus uses virtual interface 'all').
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30245
It's a class0 driver that implements some pcib methods and creates
a pci bus as its children.
The "ofw_pci" name will be used by a new driver that will be a subclass
of the pci bus.
No functional changes intended.
Submitted by: Kornel Duleba <mindal@semihalf.com>
Reviewed by: andrew
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30226
There is no need to preform any voltage reconfiguration
in case the vccq regulator is not physically attached to the
slot.
Submitted by: Lukasz Hajec <lha@semihalf.com>
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30355
While here, fix all links to older en_US.ISO8859-1 documentation
in the src/ tree.
PR: 255026
Reported by: Michael Büker <freebsd@michael-bueker.de>
Reviewed by: dbaio
Approved by: blackend (mentor), re (gjb)
MFC after: 10 days
Differential Revision: https://reviews.freebsd.org/D30265
API should work as following:
- periodicaly report Lower-or-EQual bandwidth (LEQ) connections
over kernel socket, if user application registered for such
per-flow notifications
- report Grater-or-EQual (GEQ) bandwidth as soon as it reaches
specified value in configured time window
Custom implementation of callouts was removed. There is no
point of doing calout-wheel here as generic callouts are
doing exactly the same. The performance is not critical
for such reporting, so the biggest concern should be
to have a code which can be easily maintained.
This is ia preparation for locking rework which is highly inefficient.
Approved by: mw
Sponsored by: Stormshield
Obtained from: Semihalf
Differential Revision: https://reviews.freebsd.org/D30210
These were seemingly copied over from icl_soft.
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D30268
I botched a few of the changes when rebasing the changes in
4b6ed0758d across the changes in
43bbae1948.
- Move the counter allocations into alloc_ofld_rxq().
- Free the counters freeing an ofld rxq.
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D30267
The new Mikrotik 10/25G NIC is mostly compatible with AR8151 hardware,
with few exceptions:
* card supports only 32bit DMA operations
* card does not support write-one-to-clear semantics for interrupt status
register
* MDIO operations can take longer to complete
This patch adds support for Mikrotik 10/25G NIC to the alc driver
while maintaining support for all earlier HW.
The patch was tested with FreeBSD main branch as of commit
f4b38c360e
This was tested on Intel i7-4790K system with Mikrotik 10/25G NIC.
This was tested on Intel i7-4790K system with RB44Ge (AR8151 based 4-port NIC)
to verify backwards compatibility.
PR: 256000
Submitted by: Gatis Peisenieks <gatis@mikrotik.com>
MFC after: 1 week
The most difficult NFSv4 client recovery case happens when the
lease has expired on the server. For NFSv4.0, the client will
receive a NFSERR_EXPIRED reply from the server to indicate this
has happened.
For NFSv4.1/4.2, most RPCs have a Sequence operation and, as such,
the client will receive a NFSERR_BADSESSION reply when the lease
has expired for these RPCs. The client will then call nfscl_recover()
to handle the NFSERR_BADSESSION reply. However, for the expired lease
case, the first reclaim Open will fail with NFSERR_NOGRACE.
This patch recognizes this case and calls nfscl_expireclient()
to handle the recovery from an expired lease.
This patch only affects NFSv4.1/4.2 mounts when the lease
expires on the server, due to a network partitioning that
exceeds the lease duration or similar.
MFC after: 2 weeks
Each packet is counted as 128 bytes by the code, not 125. Not sure
what I was thinking about here 14 years ago. May be just a typo.
Reported by: Dmitry Luhtionov <dmitryluhtionov@gmail.com>
MFC after: 2 weeks
While the xin32k clk was implemented in rk3399_cru as a fixed rate
clock, migrate it to rk805 as we will also need the 2nd clock
'rtc_clko_wifi' for SDIO and BT.
Both clocks remain fixed rate, and while the 1st one is always on
(though that is not expressed in the clk framework), the 2nd one
we can toggle on/off.
Reviewed-by: manu
Tested-by: manu
MFC-after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D26870
Recent discussion on the nfsv4@ietf.org mailing list confirmed
that an NFSv4 server should reply to an RPC in less than 1second.
If an NFSv4 RPC requires a delegation be recalled,
the server will attempt a CB_RECALL callback.
If the client is not responsive, the RPC reply will be delayed
until the callback times out.
Without this patch, the timeout is set to 4 seconds (set in
ticks, but used as seconds), resulting in the RPC reply taking over 4sec.
This patch redefines the constant as being in milliseconds and it
implements that for a value of 800msec, to ensure the RPC
reply is sent in less than 1second.
This patch only affects mounts from clients when delegations
are enabled on the server and the client is unresponsive to callbacks.
MFC after: 2 weeks
The Linux NFSv4.1/4.2 client now uses the CLAIM_DELEG_CUR_FH
variant of the Open operation when delegations are recalled and
the client has a local open of the file. This patch adds
support for this variant of Open to the NFSv4.1/4.2 server.
This patch only affects mounts from Linux clients when delegations
are enabled on the server.
MFC after: 2 weeks
The current implement of ip_input() reject packets destined for
169.254.0.0/16, but not those original from 169.254.0.0/16 link-local
addresses.
Fix to fully respect RFC 3927 section 2.7.
PR: 255388
Reviewed by: donner, rgrimes, karels
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D29968
Once the final component of the IP address has been parsed, the offset
on the input must not be advanced, as this would remove an unparsed
character from the input.
Submitted by: Markus Stoff
Reviewed by: donner
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D26489
We dereference so->so_cred to update the per-uid socket buffer
accounting, so the crfree() call must be deferred until after that
point.
PR: 255869
MFC after: 1 week
Since busy state is checked by all blocked writes, stopping a process
which waits in ttydisc_write() causes cascade. Utilize sigdeferstop()
to avoid the issue.
Submitted by: Jakub Piecuch <j.piecuch96@gmail.com>
PR: 255816
MFC after: 1 week
behind USB HUBs are detected and the USB reset counter logic will kick in
preventing enumeration of continuously failing ports.
Submitted by: phk@
Tested by: bz@
PR: 237666
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
says it should be max 10 milliseconds.
This may fix some USB enumeration issues:
> usbd_req_re_enumerate: addr=3, set address failed! (USB_ERR_IOERROR, ignored)
> usbd_setup_device_desc: getting device descriptor at addr 3 failed,
Found by: Zhichao1.Li@dell.com
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
0 milliseconds and 2 seconds inclusivly. Some style fixes while at it.
The USB specification has minimum values and maximum values,
and not only minimum values.
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
The AP startup extern variable declarations are not longer needed,
since PVHv2 uses the native AP startup path using the lapic. Remove
the declaration and make the variables static to mp_machdep.c
Sponsored by: Citrix Systems R&D
When a module of type "hostuuid" is provided by the loader,
prison0_init strips any trailing whitespace and ASCII control
characters by (a) adjusting the buffer length, and (b) zeroing out
the characters in question, before storing it as the system's
hostuuid.
The buffer length adjustment was correct, but the zeroing overwrote
one byte higher in memory than intended -- in the typical case,
zeroing one byte past the end of the hostuuid buffer. Due to the
layout of buffers passed by the boot loader to the kernel, this will
be the first byte of a subsequent buffer.
This was *probably* harmless; prison0_init runs after preloaded kernel
modules have been linked and after the preloaded /boot/entropy cache
has been processed, so in both cases having the first byte overwritten
will not cause problems. We cannot however rule out the possibility
that other objects which are preloaded by the loader could suffer from
having the first byte overwritten.
Since the zeroing does not in fact serve any purpose, remove it and
trim trailing whitespace and ASCII control characters by adjusting
the buffer length alone.
Fixes: c3188289 Preload hostuuid for early-boot use
Reviewed by: kevans, markj
MFC after: 3 days
If the preloaded hostuuid value is invalid and verbose booting is
enabled, a warning is printed. This printf had two bugs:
1. It was missing a trailing \n character.
2. The malformed UUID is printed with %s even though it is not known
to be NUL-terminated.
This commit adds the missing \n and uses %.*s with the (already known)
length of the preloaded UUID to ensure that we don't read past the end
of the buffer.
Reported by: kevans
Fixes: c3188289 Preload hostuuid for early-boot use
MFC after: 3 days
We never set 'busy' and never dequeue from the pending mq. Remove this
code.
Reviewed by: ae
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30313
Userspace relies on this pointer to work out if the kif is a group or
not. It can't use it for anything else, because it's a pointer to a
kernel address. Substitute 0xfeedc0de for 'true', so that we don't leak
kernel memory addresses to userspace.
PR: 255852
Reviewed by: donner
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D30284
qoriq_gpio_pin_setflags() locks the device mutex, as does
qoriq_gpio_map_gpios(), causing a recursion on non-recursive lock. This
was missed during testing for 16e549ebe.
Summary:
To make it easier to build a kernel with PowerISA 2.06 atomics (sub-word
atomics), add a kernel config option. User space still needs to specify
it as a CFLAG but that seems easier to do than for the kernel config.
Reviewed By: luporl
Differential Revision: https://reviews.freebsd.org/D29809
Summary:
There's no need to use a while loop in the IPI handler, the message list
is cached once and processed. Instead, since the existing code calls
ffs(), sort the handlers, and use a simple 'if' sequence.
Reviewed By: nwhitehorn
Differential Revision: https://reviews.freebsd.org/D30018
PVHv1 was officially removed from Xen in 4.9, so just axe the related
code from FreeBSD.
Note FreeBSD supports PVHv2, which is the replacement for PVHv1.
Sponsored by: Citrix Systems R&D
Reviewed by: kib, Elliott Mitchell
Differential Revision: https://reviews.freebsd.org/D30228
The change to futex_andl_smap() should have ordered stac before the
load from a user address, otherwise it does not fix anything.
Fixes: fb58045145 ("linux: Fix SMAP-enabled futex routines")
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
The original filesystem release (4.2BSD) had no embedded sysmlinks.
Historically symbolic links were just a different type of file, so
the content of the symbolic link was contained in a single disk block
fragment. We observed that most symbolic links were short enough that
they could fit in the area of the inode that normally holds the block
pointers. So we created embedded symlinks where the content of the
link was held in the inode's pointer area thus avoiding the need to
seek and read a data fragment and reducing the pressure on the block
cache. At the time we had only UFS1 with 32-bit block pointers,
so the test for a fastlink was:
di_size < (NDADDR + NIADDR) * sizeof(daddr_t)
(where daddr_t would be ufs1_daddr_t today).
When embedded symlinks were added, a spare field in the superblock
with a known zero value became fs_maxsymlinklen. New filesystems
set this field to (NDADDR + NIADDR) * sizeof(daddr_t). Embedded
symlinks were assumed when di_size < fs->fs_maxsymlinklen. Thus
filesystems that preceeded this change always read from blocks
(since fs->fs_maxsymlinklen == 0) and newer ones used embedded
symlinks if they fit. Similarly symlinks created on pre-embedded
symlink filesystems always spill into blocks while newer ones will
embed if they fit.
At the same time that the embedded symbolic links were added, the
on-disk directory structure was changed splitting the former
u_int16_t d_namlen into u_int8_t d_type and u_int8_t d_namlen.
Thus fs_maxsymlinklen <= 0 (as used by the OFSFMT() macro) can
be used to distinguish old directory formats. In retrospect that
should have just been an added flag, but we did not realize we
needed to know about that change until it was already in production.
Code was split into ufs/ffs so that the log structured filesystem could
use ufs functionality while doing its own disk layout. This meant
that no ffs superblock fields could be used in the ufs code. Thus
ffs superblock fields that were needed in ufs code had to be copied
to fields in the mount structure. Since ufs_readlink needed to know
if a link was embedded, fs_maxlinklen gets copied to mnt_maxsymlinklen.
The kernel panic that arose to making this fix was triggered when a
disk error created an inode of type symlink with no allocated data
blocks but a large size. When readlink was called the uiomove was
attempted which segment faulted.
static int
ufs_readlink(ap)
struct vop_readlink_args /* {
struct vnode *a_vp;
struct uio *a_uio;
struct ucred *a_cred;
} */ *ap;
{
struct vnode *vp = ap->a_vp;
struct inode *ip = VTOI(vp);
doff_t isize;
isize = ip->i_size;
if ((isize < vp->v_mount->mnt_maxsymlinklen) ||
DIP(ip, i_blocks) == 0) { /* XXX - for old fastlink support */
return (uiomove(SHORTLINK(ip), isize, ap->a_uio));
}
return (VOP_READ(vp, ap->a_uio, 0, ap->a_cred));
}
The second part of the "if" statement that adds
DIP(ip, i_blocks) == 0) { /* XXX - for old fastlink support */
is problematic. It never appeared in BSD released by Berkeley because
as noted above mnt_maxsymlinklen is 0 for old format filesystems, so
will always fall through to the VOP_READ as it should. I had to dig
back through `git blame' to find that Rodney Grimes added it as
part of ``The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.''
He must have brought it across from an earlier FreeBSD. Unfortunately
the source-control logs for FreeBSD up to the merger with the
AT&T-blessed 4.4BSD-Lite conversion were destroyed as part of the
agreement to let FreeBSD remain unencumbered, so I cannot pin-point
where that line got added on the FreeBSD side.
The one change needed here is that mnt_maxsymlinklen is declared as
an `int' and should be changed to be `u_int64_t'.
This discovery led us to check out the code that deletes symbolic
links. Specifically
if (vp->v_type == VLNK &&
(ip->i_size < vp->v_mount->mnt_maxsymlinklen ||
datablocks == 0)) {
if (length != 0)
panic("ffs_truncate: partial truncate of symlink");
bzero(SHORTLINK(ip), (u_int)ip->i_size);
ip->i_size = 0;
DIP_SET(ip, i_size, 0);
UFS_INODE_SET_FLAG(ip, IN_SIZEMOD | IN_CHANGE | IN_UPDATE);
if (needextclean)
goto extclean;
return (ffs_update(vp, waitforupdate));
}
Here too our broken symlink inode with no data blocks allocated
and a large size will segment fault as we are incorrectly using the
test that we have no data blocks to decide that it is an embdedded
symbolic link and attempting to bzero past the end of the inode.
The test for datablocks == 0 is unnecessary as the test for
ip->i_size < vp->v_mount->mnt_maxsymlinklen will do the right
thing in all cases.
The test for datablocks == 0 was added by David Greenman in this commit:
Author: David Greenman <dg@FreeBSD.org>
Date: Tue Aug 2 13:51:05 1994 +0000
Completed (hopefully) the kernel support for old style "fastlinks".
Notes:
svn path=/head/; revision=1821
I am guessing that he likely earlier added the incorrect test in the
ufs_readlink code.
I asked David if he had any recollection of why he made this change.
Amazingly, he still had a recollection of why he had made a one-line
change more than twenty years ago. And unsurpisingly it was because
he had been stuck between a rock and a hard place.
FreeBSD was up to 1.1.5 before the switch to the 4.4BSD-Lite code
base. Prior to that, there were three years of development in all
areas of the kernel, including the filesystem code, from the combined
set of people including Bill Jolitz, Patchkit contributors, and
FreeBSD Project members. The compatibility issue at hand was caused
by the FASTLINKS patches from Curt Mayer. In merging in the 4.4BSD-Lite
changes David had to find a way to provide compatibility with both
the changes that had been made in FreeBSD 1.1.5 and with 4.4BSD-Lite.
He felt that these changes would provide compatibility with both systems.
In his words:
``My recollection is that the 'FASTLINKS' symlinks support in
FreeBSD-1.x, as implemented by Curt Mayer, worked differently than
4.4BSD. He used a spare field in the inode to duplicately store the
length. When the 4.4BSD-Lite merge was done, the optimized symlinks
support for existing filesystems (those that were initialized in
FreeBSD-1.x) were broken due to the FFS on-disk structure of
4.4BSD-Lite differing from FreeBSD-1.x. My commit was needed to
restore the backward compatibility with FreeBSD-1.x filesystems.
I think it was the best that could be done in the somewhat urgent
circumstances of the post Berkeley-USL settlement. Also, regarding
Rod's massive commit with little explanation, some context: John
Dyson and I did the initial re-port of the 4.4BSD-Lite kernel to
the 386 platform in just 10 days. It was by far the most intense
hacking effort of my life. In addition to the porting of tons of
FreeBSD-1 code, I think we wrote more than 30,000 lines of new code
in that time to deal with the missing pieces and architectural
changes of 4.4BSD-Lite. We didn't make many notes along the way.
There was a lot of pressure to get something out to the rest of the
developer community as fast as possible, so detailed discrete commits
didn't happen - it all came as a giant wad, which is why Rod's
commit message was worded the way it was.''
Reported by: Chuck Silvers
Tested by: Chuck Silvers
History by: David Greenman Lawrence
MFC after: 1 week
Sponsored by: Netflix
Commit 7a606f280a allowed the server to do retries of CB_RECALL
callbacks every couple of seconds. This was needed to allow the
Linux client to re-establish the back channel.
However this patch broke the delegation timeout check, such that
it would just keep retrying CB_RECALLS.
If the client has crashed or been network patitioned from the
server, this continues until the client TCP reconnects to
the server and re-establishes the back channel.
This patch modifies the code such that it still times out the
delegation recall after some minutes, so that the server will
allow the conflicting client request once the delegation times out.
This patch only affects the NFSv4 server when delegations are
enabled and a NFSv4 client that holds a delegation has crashed
or been network partitioned from the server for at least several
minutes when a delegation needs to be recalled.
MFC after: 2 weeks
It looks like I've missed a couple of places where we don't clear
stack-allocated CCBs. Don't panic when that happens, just print
a warning.
This is a temporary measure until I get those cases fixed.
Reviewed By: markj
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D30296
Some of them were dereferencing the user pointer before disabling SMAP.
PR: 255591
Reviewed by: kib
Tested by: pitwuu@gmail.com
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D30276
m_pullup(9) frees the mbuf(9) chain in the case of an allocation error.
The mbuf chain must not be freed again in this case.
PR: 255874
Submitted by: <lylgood@foxmail.com>
Approved by: markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D30273
Previously, daregister() could have been called before dainit()
initialized the UMA zone. This would trip a KASSERT.
Reported By: pho
Tested By: pho
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
At some places the ASSERT was inserted before variable declarations are
finished. This is fixed now.
Reported by: kib
Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D30282
This patch makes it possible for CAM to use small CCBs allocated
from an periph-specific UMA zone instead of the usual, huge ones.
The end result is that CCBs issued via da(4) take 544B (size of
ccb_scsiio) instead of the usual 2kB (size of 'union ccb', ~1.5kB,
rounded up by malloc(9)). For ATA it's 272B. We waste less
memory, we avoid zeroing the unused 1kB, and it should be easier
to allocate those CCBs in low memory conditions. It should also
be possible to use uma_zone_reserve(9) to improve behaviour
in low memory conditions even further.
Note that this does not change the size, or the layout, of CCBs
as such. CCBs get allocated in various different ways, in particular
on the stack, and I don't want to redo all that. Instead, this
provides an opt-in mechanism for the periph to declare "my start()
callback is fine with receiving a CCB allocated from this UMA zone".
In other words, most of the code works exactly as it used to; the
change only happens to IOs issued by xpt_run_allockq(), which
is - conveniently - pretty much all that matters for performance.
The reason for doing it this way is that it's pretty small, localized
change, and can be implemented gradually and iteratively: take a
periph, make sure its start() callback only casts the CCBs it takes
to a particular type of CCB, for example ccb_scsiio, and that it only
casts CCBs returned by cam_periph_getccb() to that type, then add UMA
zone for that size, and declare it safe to XPT.
This is disabled by default. Set 'kern.cam.ada.enable_uma_ccbs=1'
and 'kern.cam.da.enable_uma_ccbs=1' tunables to enable it. Testing
is welcome; I will flip the default to enable in two weeks from now.
Reviewed By: imp
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D28674
The field nullAddress in struct libalias is never set and never used.
It exists as a placeholder for an unused argument only.
Reviewed by: hselasky
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D30253
libalias is a convolut of various coding styles modified by a series
of different editors enforcing interesting convetions on spacing and
comments.
This patch is a baseline to start with a perfomance rework of
libalias. Upcoming patches should be focus on the code, not on the
style. That's why most annoying style errors should be fixed
beforehand.
Reviewed by: hselasky
Discussed by: emaste
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D30259
The CTL frontend might have provided a buffer that is smaller than the
FirstBurstLength and thus smaller than the amount of unsolicited data
included in the request PDU. Treat these transfers as an empty
transfer.
Reported by: Jithesh Arakkan @ Chelsio
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29940
A single union ctl_io can be reused across multiple transfers (in
particular by the ramdisk backend). On a reuse, the reservation
pointer would retain its value from the previous transfer tripping an
assertion.
Reported by: Jithesh Arakkan @ Chelsio
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29939
- Switch to allocating the cxgbei version of icl_pdu explicitly
as a separate refcounted object allocated via malloc/free
instead of storing it in the bhs mbuf prior to the bhs.
- Support the icl_conn_pdu_queue_cb() method to set a callback
on a PDU to be invoked when the PDU is freed.
- For ICL_NOCOPY buffers, use an external mbuf to manage the
storage for the buffer via m_extaddref(). Each external mbuf
holds a reference on the associated PDU, so the callback is
invoked once all of the external mbufs have been freed.
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29910
- Only allocate 16K jumbo mbufs if the region of data to be
appended is sufficiently large, and use a loop.
- Use m_getm2() to allocate a chain for data less than 16K, or
if m_getjcl() fails.
- Use ENOMEM as the return value instead of '1' if the hook fails due
to a memory allocation error.
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29909
A CAM target layer I/O CCB can use a S/G list of virtual address ranges
to describe its data buffer. This change adds zero-copy receive support
for such requests.
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29908
As a result, CPL_FW4_ACK now returns credits for these work requests.
To support this, page pod work requests are now constructed in special
mbufs similar to "raw" mbufs used for NIC TLS in plain TX queues.
These special mbufs are stored in the ulp_pduq and dispatched in order
with PDU work requests.
Sponsored by: Chelsio Communications
Discussed with: np
Differential Revision: https://reviews.freebsd.org/D29904
update_rtm_from_rc() calls update_rtm_from_info() internally.
The latter one may update provided prtm pointer with a new rtm.
Reassign rtm from prtm afeter calling update_rtm_from_info() to
avoid touching the freed rtm.
PR: 255871
Submitted by: lylgood@foxmail.com
MFC after: 3 days
There are some scenarios where a timer event may be detached when it is
on the process' kqueue timer stop queue. If kqtimer_proc_continue() is
called after that point, it will iterate over the queue and access freed
timer structures.
It is also possible, at least in a multithreaded program, for a stopped
timer event to be scheduled without removing it from the process' stop
queue. Ensure that we do not doubly enqueue the event structure in this
case.
Reported by: syzbot+cea0931bb4e34cd728bd@syzkaller.appspotmail.com
Reported by: syzbot+9e1a2f3734652015998c@syzkaller.appspotmail.com
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30251
Enabled driver initialization causes an abort
on the NXP LS1028ARDB platform (without any external
endpoints connected). Temporarily disable qoriq_dw_pci
probe, so that to allow successful booting of the OS.
Submitted by: Lukasz Hajec <lha@semihalf.com>
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30229
We shouldn't overwrite capability register. Instead, voltages supported
by the controller have to be read from dts, as the hardware doesn't
report correct values.
Submitted by: Lukasz Hajec <lha@semihalf.com>
Reviewed by: manu
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30123
Add a missing call to mmc_fdt_parse, without it some dts properties
are not parsed.
Submitted by: Lukasz Hajec <lha@semihalf.com>
Reviewed by: manu
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30122
Add data specific for SoC, including all necessary quirks.
Submitted by: Lukasz Hajec <lha@semihalf.com>
Reviewed by: manu
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30121
If sending out a packet fails during the loop over all links, the
allocated memory is leaked and not all links receive a copy. This
patch fixes those problems, clarifies a premature abort of the loop,
and fixes a minory style(9) bug.
PR: 255430
Submitted by: Dancho Penev
Tested by: Dancho Penev
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D30008
Hint the compiler, that this update is needed at most once per second.
Only in this case the memory line needs to be written. This will
reduce the amount of cache trashing during forward of most frames.
Suggested by: zec
Approved by: zec
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D28601
Other FDT platform (like powerpc64* or riscv64) don't have gpio built
by default so just compile the module for those two arches.
Fixes: 9e08f82058 ("modules: Add sdhci_fdt module")
We generally like to avoid style changes when other changes are not
planned. In this case there are some makesyscalls.lua changes in the
pipeline, and this cleans up style nits in generated files that were
highlighted by experiments with clang-format.
Reviewed by: brooks, kevans
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30235
Remove OBJT_SWAP_TMPFS. Move tmpfs-specific swap pager bits into
tmpfs_subr.c.
There is no longer any code to directly support tmpfs in sys/vm, most
tmpfs knowledge is shared by non-anon swap object type implementation.
The tmpfs-specific methods are provided by registered tmpfs pager, which
inherits from the swap pager.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30168
Pager is allowed to inherit part of its implementation from the existing
pager, which is done by copying non-NULL virtual method slots.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30168
Mostly in cases where OBJ_SWAP flag works as well, or by reversing the
condition so that object types can be listed.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30168
This avoids the need to know all existing object types in advance, by the
cost of loosing the assert that unknown object type is handled in a sane
manner.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30168
to get object type, and stop enumerating OBJT_XXX constants. This also
provides properly a pointer for the vnode, if object backs any.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30168