- thr_kill(2) and thr_exit(2) generally (no argument auditing here).
- A set of syscalls for the process descriptor family, specifically:
pdfork(2), pdgetpid(2) and pdkill(2)
For these syscalls, audit the file descriptor. In the case of pdfork(2),
a pointer to an integer (file descriptor) is passed in as an argument.
We audit the file descriptor as initialized by the call (not the random
garbage that would have been passed in). We also audit the child process
created by the fork operation (similar to what is done for the fork(2)
syscall).
For pdkill(2) we audit the signal value and fd, and for pdgetpid(2)
just the file descriptor.
- Following is a sample of the produced audit trails:
header,111,11,pdfork(2),0,Sat May 16 03:07:50 2020, + 394 msec
argument,0,0x39d,child PID
argument,2,0x2,flags
argument,1,0x8,fd
subject,root,root,0,root,0,924,0,0,0.0.0.0
return,success,925
header,79,11,pdgetpid(2),0,Sat May 16 03:07:50 2020, + 394 msec
argument,1,0x8,fd
subject,root,root,0,root,0,924,0,0,0.0.0.0
return,success,0
trailer,79
header,135,11,pdkill(2),0,Sat May 16 03:07:50 2020, + 395 msec
argument,1,0x8,fd
argument,2,0xf,signal
process_ex,root,root,0,root,0,925,0,0,0.0.0.0
subject,root,root,0,root,0,924,0,0,0.0.0.0
return,success,0
trailer,135
MFC after: 1 week
This fixes a race where concurrent calls to doenterpgrp() and
leavepgrp() while TIOCSCTTY is executing may result in tp->t_pgrp
changing value so that tty_rel_pgrp() misses clearing it to NULL. For
more details refer to the use of pgdelete() in the kernel.
No functional change intended.
Panic backtrace:
__mtx_lock_sleep() # page fault due to using destroyed mutex
tty_signal_pgrp()
tty_ioctl()
ptsdev_ioctl()
kern_ioctl()
sys_ioctl()
amd64_syscall()
MFC after: 1 week
Sponsored by: Mellanox Technologies
Reorder flag manipulations and use a barrier to ensure that program
order is followed by the compiler and CPU, for the unlocked reader of
so_state.
In collaboration with: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D24842
Sometimes, when doing read(2) over a unix domain socket whose peer
socket was closed, read(2) returns -1/ENOTCONN instead of EOF, AKA a
zero-size read. This is because soreceive_generic() does not lock the
socket when testing the so_state SS_ISCONNECTED|SS_ISCONNECTING flags.
It could end up that we do not observe the so->so_rcv.sb_state bit
SBS_CANTRCVMORE, and then miss the SS_ flags.
Change the test to check that the socket was never connected before
returning ENOTCONN, by adding all state bits for connected.
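The revised test can be pictured as a mask check over every state bit
that implies the socket was connected at some point. The sketch below is
a userland illustration only: the X_SS_* names, their values, the choice
of which bits count as "connected", and the helper never_connected() are
all hypothetical stand-ins, not the kernel's definitions.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the socket state bits; values are illustrative. */
#define X_SS_ISCONNECTED     0x0002
#define X_SS_ISCONNECTING    0x0004
#define X_SS_ISDISCONNECTING 0x0008
#define X_SS_ISDISCONNECTED  0x2000

/* Every bit that shows the socket is, or once was, connected (assumed set). */
#define X_SS_EVERCONNECTED \
	(X_SS_ISCONNECTED | X_SS_ISCONNECTING | \
	 X_SS_ISDISCONNECTING | X_SS_ISDISCONNECTED)

/*
 * Return true only if no connection-related bit is set, i.e. the
 * socket was never connected and ENOTCONN is the right answer.
 */
static bool
never_connected(int so_state)
{
	return ((so_state & X_SS_EVERCONNECTED) == 0);
}
```

With a mask like this, a socket whose peer has gone away still carries
a "was connected" bit, so the reader falls through to the EOF path
instead of reporting ENOTCONN.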
Reported and tested by: pho
In collaboration with: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D24819
Extattr names are allowed to be 255 bytes -- not 254 bytes plus trailing
NUL. Provide a 256-byte buffer so that copyinstr() has room for the trailing
NUL.
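A userland sketch of why the extra byte matters: a bounded string copy
that counts the terminating NUL against the buffer size fails for a
255-byte name in a 255-byte buffer but succeeds in a 256-byte one.
bounded_copy() and X_NAME_MAX are hypothetical names; the helper only
mimics the "NUL must fit" behaviour described above and is not
copyinstr().

```c
#include <errno.h>
#include <stddef.h>

#define X_NAME_MAX 255	/* assumed maximum name length, excluding the NUL */

/*
 * Copy src into dst, including the terminating NUL. Returns 0 on
 * success or ENAMETOOLONG if the string plus NUL does not fit in len.
 */
static int
bounded_copy(const char *src, char *dst, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		dst[i] = src[i];
		if (src[i] == '\0')
			return (0);
	}
	return (ENAMETOOLONG);
}
```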
Re-enable test for maximal name lengths.
PR: 208965
Reported by: asomers
Reviewed by: asomers
Differential Revision: https://reviews.freebsd.org/D24584
Unlike the other copy*() functions, it does not serve to copy from one
address space to another or protect against potential faults. It's just
an older incarnation of the now-more-common strlcpy().
Add a coccinelle script to tools/ which can be used to mechanically
convert existing instances where replacement with strlcpy is trivial.
In the two cases which matched, fuse_vfsops.c and union_vfsops.c, the
code was further refactored manually to simplify.
Replace the declaration of copystr() in systm.h with a small macro
wrapper around strlcpy.
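As a rough illustration of how a copystr-shaped interface can sit on
top of a strlcpy-style copy, here is a userland sketch. x_strlcpy() and
x_copystr() are hypothetical names, written as plain functions rather
than the macro this change adds; the local x_strlcpy() exists only so
the example builds where libc lacks strlcpy().

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Local strlcpy-style helper: truncating copy, returns strlen(src). */
static size_t
x_strlcpy(char *dst, const char *src, size_t dsize)
{
	size_t srclen = strlen(src);

	if (dsize != 0) {
		size_t n = (srclen < dsize - 1) ? srclen : dsize - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return (srclen);
}

/*
 * Hypothetical copystr-shaped wrapper: returns 0 or ENAMETOOLONG and
 * optionally reports the bytes copied, counting the terminating NUL.
 */
static int
x_copystr(const char *src, char *dst, size_t len, size_t *done)
{
	size_t r = x_strlcpy(dst, src, len);

	if (done != NULL)
		*done = (r >= len) ? len : r + 1;
	return ((r >= len) ? ENAMETOOLONG : 0);
}
```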
Remove N redundant MI implementations of copystr. For MIPS, this
entailed inlining the assembler copystr into the only consumer,
copyinstr, and making the latter a leaf function.
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D24672
If a single-threaded process receives a signal during a critical section
established by the sigfastblock(2) word, unblocking did not cause signal
delivery because sigfastblock(SIGFASTBLOCK_UNBLOCK) failed to request
AST handling of the pending signals.
Set TDF_ASTPENDING | TDF_NEEDSIGCHK on unblock or when the kernel forces
the end of a sigfastblock critical section, to cause syscall exit to
recheck and deliver any pending signal.
Reported by: corydoras@ridiculousfish.com
PR: 246385
Sponsored by: The FreeBSD Foundation
We know the value must be greater than 0 and less than MAXSECFLAVORS.
Reject values outside this range in the initial check in vfs_export and add KASSERTs
in the later consumers.
Also check that we are called with one of either MNT_DELEXPORT or MNT_EXPORTED set.
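The up-front validation might look roughly like the userland sketch
below. The X_* constants, their values, and x_check_export_args() are
hypothetical. Applying the range check unconditionally, and reading
"one of either" as "at least one", are simplifying assumptions rather
than a statement of what vfs_export does.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical values, for illustration only. */
#define X_MAXSECFLAVORS	5
#define X_MNT_EXPORTED	0x0100
#define X_MNT_DELEXPORT	0x0200

static int
x_check_export_args(uint64_t ex_flags, int numsecflavors)
{
	/* At least one of the two request flags must be present. */
	if ((ex_flags & (X_MNT_EXPORTED | X_MNT_DELEXPORT)) == 0)
		return (EINVAL);
	/* Flavor count must be greater than 0 and less than the maximum. */
	if (numsecflavors <= 0 || numsecflavors >= X_MAXSECFLAVORS)
		return (EINVAL);
	return (0);
}
```

Later consumers can then assert the same range instead of re-checking
it, since bad values never get past the entry point.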
Reviewed by: rmacklem
Approved by: mav (mentor)
MFC after: 1 week
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D24753
It can be dangerous and there is no need for it in the kernel.
Inspired by Kees Cook's change in Linux, and later OpenBSD.
Reviewed by: cem, gordon, philip
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24760
This is a general cleanup of the relocatable kernel support on powerpc,
needed to enable kernel ifuncs.
* Fix some relocatable issues in the kernel linker, and change to using
a RELOCATABLE_KERNEL #define instead of #ifdef __powerpc__ for parts that
other platforms can use in the future if they wish to have ET_DYN kernels.
* Get rid of the DB_STOFFS hack now that the kernel is relocated to the DMAP
properly across the board on powerpc64.
* Add powerpc64 and powerpc32 ifunc functionality.
* Allow AIM64 virtual mode OF kernels to run from the DMAP like other AIM64
by implementing a virtual mode restart. This fixes the runtime address on
PowerMac G5.
* Fix symbol relocation problems on post-relocation kernels by relocating
the symbol table.
* Add an undocumented method for supplying kernel symbols on powernv and
other powerpc machines using linux-style kernel/initrd loading -- If
you pass the kernel in as the initrd as well, the copy resident in initrd
will be used as a source for symbols when initializing the debugger.
This method is subject to removal once we have a better way of doing this.
Approved by: jhibbits
Relnotes: yes
Sponsored by: Tag1 Consulting, Inc.
Differential Revision: https://reviews.freebsd.org/D23156
They have more differences than similarities. For now there is a lot
of code that checks for M_EXT only and works correctly on M_EXTPG
buffers, so still carry the M_EXT bit together with M_EXTPG. However,
prepare some code for an explicit check for M_EXTPG.
Reviewed by: gallatin
Differential Revision: https://reviews.freebsd.org/D24598
o Shrink sglist(9) functions to work with multipage mbufs down from
four functions to two.
o Don't use 'struct mbuf_ext_pgs *' as argument, use struct mbuf.
o Rename to something matching _epg.
Reviewed by: gallatin
Differential Revision: https://reviews.freebsd.org/D24598
next commit brings in second flag, so let them already be in the
future namespace.
Reviewed by: gallatin
Differential Revision: https://reviews.freebsd.org/D24598
but we need a buffer of MLEN bytes. This isn't just a simplification,
but an important fixup, because the previous commit shrank
sizeof(struct mbuf) down below MSIZE, and instantiating an mbuf on the
stack no longer provides enough data.
Reviewed by: gallatin
Differential Revision: https://reviews.freebsd.org/D24598
The following series of patches addresses three things:
Now that the array of pages is embedded into the mbuf, we no longer need
a separate structure to pass around, so struct mbuf_ext_pgs is an
artifact of the first implementation. And struct mbuf_ext_pgs_data
is a crutch to accommodate the main idea of r359919 with minimal churn.
Also, M_EXT of type EXT_PGS is just a synonym of M_NOMAP.
The namespace for the new feature is somewhat inconsistent and
sometimes has lengthy prefixes. In these patches we will
gradually bring the namespace to the "m_epg" prefix for all mbuf
fields and most functions.
Step 1 of 4:
o Anonymize mbuf_ext_pgs_data, embed in m_ext
o Embed mbuf_ext_pgs
o Start documenting all this entanglement
Reviewed by: gallatin
Differential Revision: https://reviews.freebsd.org/D24598
Previously procctl(PROC_PROTMAX_STATUS, ... used the PROC_ASLR_NOFORCE
macro for the "system-wide configured policy" status, instead of
PROC_PROTMAX_NOFORCE.
They both have a value of 3, so no functional change.
Sponsored by: The FreeBSD Foundation
Open Firmware enumerates and installs the driver for all processors,
regardless of whether they will be started later (because of power
constraints, for example).
MFC after: 3 weeks
Otherwise, since the CV is not signalled until data is drained from the
socket, it is trivial to create an unkillable process using
sendfile(SF_SYNC) and a process-private PF_LOCAL socket pair. In
particular, the cv_wait() in sendfile() does not get interrupted until
data is drained from the receiving socket buffer.
Reported by: pho
Discussed with: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Without this patch, sosend_generic() will try to use top->m_pkthdr.len,
assuming that the first mbuf has a pkthdr.
When a list of ext_pgs mbufs is passed in, the first mbuf is not a
pkthdr and cannot be post-r359919. As such, the value of top->m_pkthdr.len
is bogus (0 for my testing).
This patch fixes sosend_generic() to handle this case, calculating the
total length via m_length() for this case.
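The fallback can be illustrated with a drastically simplified userland
model of an mbuf chain. struct x_mbuf and the x_* helpers are
hypothetical and share only the general shape of the kernel structures:
use the packet header length when the first buffer has one, otherwise
walk the chain and sum the per-buffer lengths.

```c
#include <stdbool.h>
#include <stddef.h>

/* Drastically simplified stand-in for an mbuf; not the kernel structure. */
struct x_mbuf {
	struct x_mbuf *m_next;	/* next buffer in the chain */
	size_t m_len;		/* bytes of data in this buffer */
	bool has_pkthdr;	/* true if pkthdr_len is meaningful */
	size_t pkthdr_len;	/* total chain length, when has_pkthdr */
};

/* Walk the chain and sum the per-buffer lengths. */
static size_t
x_chain_length(const struct x_mbuf *m)
{
	size_t total = 0;

	for (; m != NULL; m = m->m_next)
		total += m->m_len;
	return (total);
}

/* Trust the packet header when present; otherwise count the chain. */
static size_t
x_send_length(const struct x_mbuf *top)
{
	if (top->has_pkthdr)
		return (top->pkthdr_len);
	return (x_chain_length(top));
}
```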
There is currently nothing that hands a list of ext_pgs mbufs to
sosend_generic(), but the nfs-over-tls kernel RPC code in
projects/nfs-over-tls will do that and was used to test this patch.
Reviewed by: gallatin
Differential Revision: https://reviews.freebsd.org/D24568
- Add a new TCP_RXTLS_ENABLE socket option to set the encryption and
authentication algorithms and keys as well as the initial sequence
number.
- When reading from a socket using KTLS receive, applications must use
recvmsg(). Each successful call to recvmsg() will return a single
TLS record. A new TCP control message, TLS_GET_RECORD, will contain
the TLS record header of the decrypted record. The regular message
buffer passed to recvmsg() will receive the decrypted payload. This
is similar to the interface used by Linux's KTLS RX except that
Linux does not return the full TLS header in the control message.
- Add plumbing to the TOE KTLS interface to request either transmit
or receive KTLS sessions.
- When a socket is using receive KTLS, redirect reads from
soreceive_stream() into soreceive_generic().
- Note that this interface is currently only defined for TLS 1.1 and
1.2, though I believe we will be able to reuse the same interface
and structures for 1.3.
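For readers unfamiliar with the record header mentioned above, the
sketch below decodes the five-byte TLS record header as it appears on
the wire (content type, two version bytes, big-endian length). How the
TLS_GET_RECORD control message lays out these fields is not described
here, so struct x_tls_hdr and x_parse_tls_hdr() are hypothetical and
operate on raw wire bytes only.

```c
#include <stddef.h>
#include <stdint.h>

#define X_TLS_HDR_LEN 5	/* type + version (2 bytes) + length (2 bytes) */

/* Hypothetical decoded form of a TLS record header. */
struct x_tls_hdr {
	uint8_t type;		/* content type, e.g. 23 for application data */
	uint8_t vmajor;		/* protocol version, major byte */
	uint8_t vminor;		/* protocol version, minor byte */
	uint16_t length;	/* payload length from the header */
};

/* Decode a wire-format header; returns 0 on success, -1 if buf is short. */
static int
x_parse_tls_hdr(const uint8_t *buf, size_t len, struct x_tls_hdr *out)
{
	if (len < X_TLS_HDR_LEN)
		return (-1);
	out->type = buf[0];
	out->vmajor = buf[1];
	out->vminor = buf[2];
	out->length = (uint16_t)((buf[3] << 8) | buf[4]);
	return (0);
}
```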
For userland, MACHINE_ARCH reflects the current ABI via preprocessor
directives. For the kernel, the hw.machine_arch sysctl uses the ELF
header flags of the current process to select the correct MACHINE_ARCH
value.
Reviewed by: imp, kp
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D24543
This extends some of the changes in place to support reporting support
for 32-bit ABIs to permit reporting hard-float vs soft-float ABIs.
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D24542
Contrary to the kevent man page, EV_EOF on a fifo is not cleared by
EV_CLEAR. Modify the read and write filters to clear EV_EOF when the
fifo's PIPE_EOF flag is clear, and update the man page to document the
new behaviour.
Modify the write filter to return the amount of buffer space available
even if no readers are present. This matches the behaviour for sockets.
When reading from a pipe, only call pipeselwakeup() if some data was
actually read. This prevents the continuous re-triggering of an
EVFILT_READ event on EOF when in edge-triggered mode.
PR: 203366, 224615
Submitted by: Jan Kokemüller <jan.kokemueller@gmail.com>
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24528
This ensures that pipe_poll() and the pipe kqueue filters observe
PIPE_EOF and set EV_EOF accordingly. As a result an extra call to
knote() after setting PIPE_EOF is unnecessary.
Submitted by: Jan Kokemüller <jan.kokemueller@gmail.com>
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24528
Previously we allocated a separate VM object for each kernel stack.
However, fully constructed kernel stacks are cached by UMA, so there is
no harm in using a single global object for all stacks. This reduces
memory consumption and makes it easier to define a memory allocation
policy for kernel stack pages, with the aim of reducing physical memory
fragmentation.
Add a global kstack_object, and use the stack KVA address to index into
the object like we do with kernel_object.
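Indexing a single shared object by address amounts to converting the
offset from the start of the kernel address range into a page number.
A minimal sketch, assuming 4 KiB pages and an illustrative base
address; X_PAGE_SHIFT, X_KVA_BASE and x_kstack_pindex() are
hypothetical names, not the kernel's:

```c
#include <stdint.h>

#define X_PAGE_SHIFT	12				/* assumed 4 KiB pages */
#define X_KVA_BASE	UINT64_C(0xfffffe0000000000)	/* hypothetical KVA start */

/* Page index within the shared object for a given stack KVA. */
static uint64_t
x_kstack_pindex(uint64_t kva)
{
	return ((kva - X_KVA_BASE) >> X_PAGE_SHIFT);
}
```

Because every stack occupies a distinct KVA range, the derived indices
cannot collide, which is what lets one object back all stacks.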
Reviewed by: kib
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24473
Release kernels have no KDB backends enabled, so they discard an NMI
if it is not due to a hardware failure. This includes NMIs from
IPMI BMCs and hypervisors.
Furthermore, the interaction of panic_on_nmi, kdb_on_nmi, and
debugger_on_panic is confusing.
Respond to all NMIs according to panic_on_nmi and debugger_on_panic.
Remove kdb_on_nmi. Expand the meaning of panic_on_nmi by making
it a bitfield. There are currently two bits: one for NMIs due to
hardware failure, and one for all others. Leave room for more.
If panic_on_nmi and debugger_on_panic are both true, don't actually panic,
but directly enter the debugger, to allow someone to leave the debugger
and [hopefully] resume normal execution.
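The resulting policy can be summarized as a small decision function.
This is a sketch only: the bit assignments, the enum, and
x_nmi_decide() are hypothetical, and "ignore" stands in for whatever
the kernel does with an NMI that panic_on_nmi does not select.

```c
/* Hypothetical bit assignments for the panic_on_nmi bitfield. */
#define X_NMI_HWFAIL	0x1	/* NMIs caused by hardware failure */
#define X_NMI_OTHER	0x2	/* all other NMIs */

enum x_nmi_action { X_NMI_IGNORE, X_NMI_PANIC, X_NMI_DEBUGGER };

static enum x_nmi_action
x_nmi_decide(int panic_on_nmi, int debugger_on_panic, int hw_failure)
{
	int bit = hw_failure ? X_NMI_HWFAIL : X_NMI_OTHER;

	/* This class of NMI is not selected by the bitfield. */
	if ((panic_on_nmi & bit) == 0)
		return (X_NMI_IGNORE);
	/* Both knobs set: skip the panic and go straight to the debugger. */
	if (debugger_on_panic)
		return (X_NMI_DEBUGGER);
	return (X_NMI_PANIC);
}
```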
Reviewed by: kib
MFC after: 2 weeks
Relnotes: yes: machdep.kdb_on_nmi is gone; machdep.panic_on_nmi changed
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D24558
This change is built on top of the nexthop objects introduced in r359823.
Nexthops are separate data structures, containing all necessary information
to perform packet forwarding such as gateway interface and mtu. Nexthops
are shared among the routes, providing more pre-computed cache-efficient
data while requiring less memory. Splitting the LPM code and the attached
data solves multiple long-standing problems in the routing layer,
drastically reduces the coupling with other parts of the stack and allows
faster lookup algorithms to be introduced transparently.
Route caching was (re)introduced to minimise (slow) routing lookups, allowing
for notably better performance for large TCP senders. Caching works by
acquiring rtentry reference, which is protected by per-rtentry mutex.
If the routing table is changed (checked by comparing the rtable generation id)
or link goes down, cache record gets withdrawn.
Nexthops have the same reference counting interface, backed by refcount(9).
This change merely replaces rtentry with the actual forwarding nexthop as a
cached object, which is mostly mechanical. Other moving parts, like cache
cleanup on rtable change, remain the same.
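The caching scheme described above can be modelled in a few lines of
userland C. Everything here is hypothetical (struct x_nhop, struct
x_route_cache and the x_cache_* helpers), and the plain integer
reference count is a stand-in for refcount(9) with no atomicity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct x_nhop {
	int refs;		/* simplified, non-atomic reference count */
};

struct x_route_cache {
	struct x_nhop *nh;	/* cached nexthop, or NULL */
	uint32_t gen;		/* table generation when nh was cached */
};

/* Drop the cached nexthop and release its reference. */
static void
x_cache_invalidate(struct x_route_cache *rc)
{
	if (rc->nh != NULL) {
		rc->nh->refs--;
		rc->nh = NULL;
	}
}

/* Cache nh, taking a reference, and remember the table generation. */
static void
x_cache_store(struct x_route_cache *rc, struct x_nhop *nh, uint32_t table_gen)
{
	x_cache_invalidate(rc);
	nh->refs++;
	rc->nh = nh;
	rc->gen = table_gen;
}

/* Return the cached nexthop if still valid, else withdraw it. */
static struct x_nhop *
x_cache_lookup(struct x_route_cache *rc, uint32_t table_gen, bool link_up)
{
	if (rc->nh != NULL && (rc->gen != table_gen || !link_up))
		x_cache_invalidate(rc);
	return (rc->nh);
}
```

A NULL return tells the caller to fall back to the slow routing lookup
and store the fresh result.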
Differential Revision: https://reviews.freebsd.org/D24340
blockcount_wait() still unconditionally waits for the count to reach
zero before returning.
Tested by: pho (a larger patch)
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24513