userspace to control NUMA policy administratively and programmatically.
Implement domainset based iterators in the page layer.
Remove the now legacy numa_* syscalls.
Cleanup some header polution created by having seq.h in proc.h.
Reviewed by: markj, kib
Discussed with: alc
Tested by: pho
Sponsored by: Netflix, Dell/EMC Isilon
Differential Revision: https://reviews.freebsd.org/D13403
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
- Add a new KTR_STRUCT_ARRAY ktrace record type which dumps an array of
structures.
The structure name in the record payload is preceded by a size_t
containing the size of the individual structures. Use this to
replace the previous code that dumped the kevent arrays dumped for
kevent(). kdump is now able to decode the kevent structures rather
than dumping their contents via a hexdump.
One change from before is that the 'changes' and 'events' arrays are
not marked with separate 'read' and 'write' annotations in kdump
output. Instead, the first array is the 'changes' array, and the
second array (only present if kevent doesn't fail with an error) is
the 'events' array. For kevent(), empty arrays are denoted by an
entry with an array containing zero entries rather than no record.
- Move kevent decoding tables from truss to libsysdecode.
This adds three new functions to decode members of struct kevent:
sysdecode_kevent_filter, sysdecode_kevent_flags, and
sysdecode_kevent_fflags.
kdump uses these helper functions to pretty-print kevent fields.
- Move structure definitions for freebsd11 and freebsd32 kevent
structures to <sys/event.h> so that they can be shared with userland.
The 32-bit structures are only exposed if _WANT_KEVENT32 is defined.
The freebsd11 structures are only exposed if _WANT_FREEBSD11_KEVENT is
defined. The 32-bit freebsd11 structure requires both.
- Decode freebsd11 kevent structures in truss for the compat11.kevent()
system call.
- Log 32-bit kevent structures via ktrace for 32-bit compat kevent()
system calls.
- While here, constify the 'void *data' argument to ktrstruct().
Reviewed by: kib (earlier version)
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D12470
and convert any messages of types SCM_BINTIME, SCM_TIMESTAMP,
SCM_REALTIME and SCM_MONOTONIC from 64-bit to its 32-bit
representation. Otherwise we either run out of user-supplied
buffer to copy those out resulting in the MSG_CTRUNC or simply
return values that the userland 32-bit code is not going
to parse correctly. This fixes at least two regression tests
failing to function properly in 32-bit compat mode:
tools/regression/sockets/udp_pingpong
tools/regression/sockets/unix_cmsg
PR: kern/222039
MFC after: 30 days
All 32bit MIPS ABIs align uint64_t on 8-byte. Since struct kevent32
is defined using 32bit types to avoid extra alignment on amd64/i386,
layout of the structure needs paddings on PowerPC and apparently MIPS.
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D11434
Only amd64 (because of i386) needs 32-bit time_t compat now, everything else is
64-bit time_t. Rather than checking on all 64-bit time_t archs, only check the
oddball amd64/i386.
Reviewed By: emaste, kib, andrew
Differential Revision: https://reviews.freebsd.org/D11364
AKA Make time_t 64 bits on powerpc(32).
PowerPC currently (until now) was one of two architectures with a 32-bit time_t
on 32-bit archs (the other being i386). This is an ABI breakage, so all ports,
and all local binaries, *must* be recompiled.
Tested by: andreast, others
MFC after: Never
Relnotes: Yes
This change implements NOTE_ABSTIME flag for EVFILT_TIMER, which
specifies that the data field contains absolute time to fire the
event.
To make this useful, data member of the struct kevent must be extended
to 64bit. Using the opportunity, I also added ext members. This
changes struct kevent almost to Apple struct kevent64, except I did
not changed type of ident and udata, the later would cause serious API
incompatibilities.
The type of ident was kept uintptr_t since EVFILT_AIO returns a
pointer in this field, and e.g. CHERI is sensitive to the type
(discussed with brooks, jhb).
Unlike Apple kevent64, symbol versioning allows us to claim ABI
compatibility and still name the new syscall kevent(2). Compat shims
are provided for both host native and compat32.
Requested by: bapt
Reviewed by: bapt, brooks, ngie (previous version)
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D11025
Provide a new mode "2" which returns a special overflow indicator in
the non-representable field instead of the silent truncation (mode
"0") or EOVERFLOW (mode "1").
In particular, the typical use of st_ino to detect hard links with
mode "2" reports false positives, which might be more suitable for
some uses.
Discussed with: bde
Sponsored by: The FreeBSD Foundation
inode number or link count for the ABI compat binaries.
Right now, and by default after the change, too large 64bit values are
silently truncated to 32 bits. Enabling the knob causes the system to
return EOVERFLOW for stat(2) family of compat syscalls when some
values cannot be completely represented by the old structures. For
getdirentries(2), knob skips the dirents which would cause non-trivial
truncation of d_ino.
EOVERFLOW error is specified by the X/Open 1996 LFS document
('Adding Support for Arbitrary File Sizes to the Single UNIX
Specification').
Based on the discussion with: bde
Sponsored by: The FreeBSD Foundation
Extend the ino_t, dev_t, nlink_t types to 64-bit ints. Modify
struct dirent layout to add d_off, increase the size of d_fileno
to 64-bits, increase the size of d_namlen to 16-bits, and change
the required alignment. Increase struct statfs f_mntfromname[] and
f_mntonname[] array length MNAMELEN to 1024.
ABI breakage is mitigated by providing compatibility using versioned
symbols, ingenious use of the existing padding in structures, and
by employing other tricks. Unfortunately, not everything can be
fixed, especially outside the base system. For instance, third-party
APIs which pass struct stat around are broken in backward and
forward incompatible ways.
Kinfo sysctl MIBs ABI is changed in backward-compatible way, but
there is no general mechanism to handle other sysctl MIBS which
return structures where the layout has changed. It was considered
that the breakage is either in the management interfaces, where we
usually allow ABI slip, or is not important.
Struct xvnode changed layout, no compat shims are provided.
For struct xtty, dev_t tty device member was reduced to uint32_t.
It was decided that keeping ABI compat in this case is more useful
than reporting 64-bit dev_t, for the sake of pstat.
Update note: strictly follow the instructions in UPDATING. Build
and install the new kernel with COMPAT_FREEBSD11 option enabled,
then reboot, and only then install new world.
Credits: The 64-bit inode project, also known as ino64, started life
many years ago as a project by Gleb Kurtsou (gleb). Kirk McKusick
(mckusick) then picked up and updated the patch, and acted as a
flag-waver. Feedback, suggestions, and discussions were carried
by Ed Maste (emaste), John Baldwin (jhb), Jilles Tjoelker (jilles),
and Rick Macklem (rmacklem). Kris Moore (kris) performed an initial
ports investigation followed by an exp-run by Antoine Brodin (antoine).
Essential and all-embracing testing was done by Peter Holm (pho).
The heavy lifting of coordinating all these efforts and bringing the
project to completion were done by Konstantin Belousov (kib).
Sponsored by: The FreeBSD Foundation (emaste, kib)
Differential revision: https://reviews.freebsd.org/D10439
The previous misuse of sys_sigqueue() was sending random register or
stack garbage to 64-bit targets. The freebsd32 implementation preserves
the sival_int member of value when signaling a 64-bit process.
Document the mixed ABI implementation of union sigval and the
incompability of sival_ptr with pointer integrity schemes.
Reviewed by: kib, wblock
MFC after: 1 week
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D10605
Add a clock_nanosleep() syscall, as specified by POSIX.
Make nanosleep() a wrapper around it.
Attach the clock_nanosleep test from NetBSD. Adjust it for the
FreeBSD behavior of updating rmtp only when interrupted by a signal.
I believe this to be POSIX-compliant, since POSIX mentions the rmtp
parameter only in the paragraph about EINTR. This is also what
Linux does. (NetBSD updates rmtp unconditionally.)
Copy the whole nanosleep.2 man page from NetBSD because it is complete
and closely resembles the POSIX description. Edit, polish, and reword it
a bit, being sure to keep any relevant text from the FreeBSD page.
Reviewed by: kib, ngie, jilles
MFC after: 3 weeks
Relnotes: yes
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D10020
nanosleep() updates rmtp on EINVAL. In that case, kern_nanosleep()
has not updated rmt, so sys_nanosleep() updates the user-space rmtp
by copying garbage from its stack frame. This is not only a kernel
memory disclosure, it's also not POSIX-compliant. Fix it to update
rmtp only on EINTR.
Reviewed by: jilles (via D10020), dchagin
MFC after: 3 days
Security: possibly
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D10044
Rename kern_vm_* functions to kern_*. Move the prototypes to
syscallsubr.h. Also change Mach VM types to uintptr_t/size_t as
needed, to avoid headers pollution.
Requested by: alc, jhb
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D9535
kern_vm_munmap(), and kern_vm_madvise(), and use them in various compats
instead of their sys_*() counterparts.
Reviewed by: ed, dchagin, kib
MFC after: 2 weeks
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D9378
and use it in compats instead of their sys_*() counterparts.
Reviewed by: kib, jhb, dchagin
MFC after: 2 weeks
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D9383
of their sys_*() counterparts. The svr4 is left unchanged.
Reviewed by: kib@
MFC after: 2 weeks
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D9379
Right now size of the structure is 472 bytes on amd64, which is
already large and stack allocations are indesirable. With the ino64
work, MNAMELEN is increased to 1024, which will make it impossible to have
struct statfs on the stack.
Extracted from: ino64 work by gleb
Discussed with: mckusick
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
This argument is not a bitmask of flags, but only accepts a single value.
Fail with EINVAL if an invalid value is passed to 'flag'. Rename the
'flags' argument to getmntinfo(3) to 'mode' as well to match.
This is a followup to r308088.
Reviewed by: kib
MFC after: 1 month
Both can be used to cause processes in capability mode to receive
SIGTRAP when ENOTCAPABLE or ECAPMODE errors are returned from
syscalls.
Idea by: emaste
Reviewed by: oshogbo (previous version), emaste
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D7965
Mark the pipe() system call as COMPAT10.
As of r302092 libc uses pipe2() with a zero flags value instead of pipe().
Approved by: re (gjb)
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D6816
data (headers). Historically the size of the headers was not checked
against the socket buffer space. Application could easily overcommit the
socket buffer space.
With the new sendfile (r293439) the problem remained, but a KASSERT was
inserted that checked that amount of data written to the socket matches
its space. In case when size of headers is bigger that socket space,
KASSERT fires. Without INVARIANTS the new sendfile won't panic, but
would report incorrect amount of bytes sent.
o With this change, the headers copyin is moved down into the cycle, after
the sbspace() check. The uio size is trimmed by socket space there,
which fixes the overcommit problem and its consequences.
o The compatibility handling for FreeBSD 4 sendfile headers API is pushed
up the stack to syscall wrappers. This required a copy and paste of the
code, but in turn this allowed to remove extra stack carried parameter
from fo_sendfile_t, and embrace entire compat code into #ifdef. If in
future we got more fo_sendfile_t function, the copy and paste level would
even reduce.
Reviewed by: emax, gallatin, Maxim Dounin <mdounin mdounin.ru>
Tested by: Vitalij Satanivskij <satan ukr.net>
Sponsored by: Netflix
- Set td_errno so that ktrace and dtrace can obtain the syscall error
number in the usual way.
- Pass negative error numbers directly to the syscall layer, as they're
not intended to be returned to userland.
Reviewed by: kib
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D5425
allocated from exec_map. If many threads try to perform execve(2) in
parallel, the exec map is exhausted and some threads sleep
uninterruptible waiting for the map space. Then, the thread which won
the race for the space allocation, cannot single-thread the process,
causing deadlock.
Reported and tested by: pho (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
rework. The number of entries was supposed to be returned to the user,
not used as a scratch variable.
This broke RELENG_4 jails starting up on current systems.
The core kernel part is patch file utimes.2008.4.diff from
pluknet@FreeBSD.org. I updated the code for API changes, added the manual
page and added compatibility code for old kernels. There is also audit and
Capsicum support.
A new UTIME_* constant might allow setting birthtimes in future.
Differential Revision: https://reviews.freebsd.org/D1426
Submitted by: pluknet (partially)
Reviewed by: delphij, pluknet, rwatson
Relnotes: yes
attachment to the process. Note that the command is not intended to
be a security measure, rather it is an obfuscation feature,
implemented for parity with other operating systems.
Discussed with: jilles, rwatson
Man page fixes by: rwatson
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
the orphaned descendants. Base of the API is modelled after the same
feature from the DragonFlyBSD.
Requested by: bapt
Reviewed by: jilles (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
have both kern_open() and kern_openat(); change the callers to use
kern_openat().
This removes one (sometimes two) levels of indirection and
consolidates arguments checks.
Reviewed by: mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
ever used. It didn't go into stable/10, neither was documented.
It might be useful, but we collectively decided to remove it, rather
leave it abandoned and unmaintained. It is removed in one single
commit, so restoring it should be easy, if anyone wants to reopen
this idea.
Sponsored by: Netflix
The kernel tracks syscall users so that modules can safely unregister them.
But if the module is not unloadable or was compiled into the kernel, there is
no need to do this.
Achieve this by adding SY_THR_STATIC_KLD macro which expands to SY_THR_STATIC
during kernel build and 0 otherwise.
Reviewed by: kib (previous version)
MFC after: 2 weeks
struct flock are done in the sys_fcntl(), which mean that compat32 used
direct access to userland pointers.
Move code from sys_fcntl() to new wrapper, kern_fcntl_freebsd(), which
performs neccessary userland memory accesses, and use it from both
native and compat32 fcntl syscalls.
Reported by: jhibbits
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
This includes:
o All directories named *ia64*
o All files named *ia64*
o All ia64-specific code guarded by __ia64__
o All ia64-specific makefile logic
o Mention of ia64 in comments and documentation
This excludes:
o Everything under contrib/
o Everything under crypto/
o sys/xen/interface
o sys/sys/elf_common.h
Discussed at: BSDcan