outside the range of valid file descriptors
PR: kern/164970
Submitted by: Peter Jeremy <peterjeremy@acm.org>
Reviewed by: jilles
Approved by: cperciva
MFC after: 1 week
usermode context switches (long jumps and ucontext operations). If these
are used across threads, multiple threads can end up with the same TLS base.
Madness will then result.
This makes behavior on PPC match that on x86 systems and on Linux.
MFC after: 10 days
privilege attempts to toggle SF_SETTABLE flags.
- Use the '^' operator in the SF_SNAPSHOT anti-toggling check.
Flags are now stored to ip->i_flags in one place after all checks.
Submitted by: bde
is already open in this process.
If the named semaphore is already open, sem_open() only increments a
reference count and did not take the flags into account (which otherwise
happens by passing them to open()). Add an extra check for O_CREAT|O_EXCL.
PR: kern/166706
Reviewed by: davidxu
MFC after: 10 days
quotation. Also make sure we have the same amount of columns in each row as
the number of columns we specify in the head arguments.
Reviewed by: brueffer
application destroys semaphore after sem_wait returns. Just enter
kernel to wake up sleeping threads, only update _has_waiters if
it is safe. While here, check if the value exceed SEM_VALUE_MAX and
return EOVERFLOW if this is true.
Because the utmpx interface is generally not required to be thread-safe,
but it is nice to have, if easy to do so. Therefore don't make a mess
out of the code and only use it if __NO_TLS is not defined.
no waiters, we still increase and decrease count in user mode without
entering kernel, once there is a waiter, sem_post will enter kernel to
increase count and wake thread up, this is atomicy and allow us to
gracefully destroy semaphore after sem_wait returned.
pathnames.
With the current API (no *at functions), FTS_NOCHDIR requires that the
fts_accpath start with the original path passed to fts_open(); therefore,
the depth that can be reached is limited by the {PATH_MAX} constraint on
this pathname.
MFC after: 1 week
Just like kill(2), it is impossible for killpg(0, ...) to fail with
ESRCH, as a process always has a process group.
Discussed on: arch@
MFC after: 1 week
On FreeBSD, all processes have a process group, so it is impossible for
kill(2) to fail this way. POSIX also doesn't mention this error
condition.
Discussed on: arch@
MFC after: 3 weeks
- Fix TLS allocation for Variant I: both rtld and libc allocators
assume that tls_static_space includes space for TLS structure.
So increment calculated static size by the size of it.
syscall. Before r5958, seekdir() was called for its side effect of
freeing memory allocated by opendir() for rewinddir(), but that revision
added _reclaim_telldir() that frees all memory allocated by telldir()
calls, making this call redundant.
This introduces a slight change. If an application duplicated the descriptor
obtained through dirfd(), it can no longer rely on file position to be
reset to the start of file after a call to closedir(). It's believed to
be safe because neither POSIX, nor any other OS I've tested (NetBSD, Linux,
OS X) rewind the file offset pointer on closedir().
Reported by: Igor Sysoev
They were made excessive in r205424 by opening with O_DIRECTORY.
Also eliminated the fcntl() call used to set FD_CLOEXEC by opening
with O_CLOEXEC.
(fdopendir() still checks that the passed descriptor is a directory,
and sets FD_CLOEXEC on it.)
Reviewed by: ed
according to POSIX document, the clock ID may be dynamically allocated,
it unlikely will be in 64K forever. To make it future compatible, we
pack all timeout information into a new structure called _umtx_time, and
use fourth argument as a size indication, a zero means it is old code
using timespec as timeout value, but the new structure also includes flags
and a clock ID, so the size argument is different than before, and it is
non-zero. With this change, it is possible that a thread can sleep
on any supported clock, though current kernel code does not have such a
POSIX clock driver system.
initialize the cache of the system information as it was done for the
dynamic libc. This removes several sysctls from the static binary
startup.
Use the aux vector to fill the single struct dl_phdr_info describing
the static binary itself, to implement dl_iterate_phdr(3) for the
static binaries. [1]
Based on the submission by: John Marino <draco marino st> [1]
Tested by: flo (sparc64)
MFC after: 2 weeks
- Address performance regressions encountered by das@ by caching per-thread
data in TLS where available.
- Add a __NO_TLS flag to cdefs.h to indicate where not available.
- Reorganise the xlocale.h definitions into xlocale/*.h so that they can be
included from multiple places.
- Export the POSIX2008 subset of xlocale when POSIX2008 says it should be
exported, independently of whether xlocale.h is included.
- Fix the bug where programs using ctype functions always assumed ASCII unless
recompiled.
- Fix some style(9) violations.
Reviewed by: brooks (mentor)
Approved by: dim (mentor)
The reasoning behind this, is that if we are consistent in our
documentation about the uint*_t stuff, people will be less tempted to
write new code that uses the non-standard types.
I am not going to bump the man page dates, as these changes can be
considered style nits. The meaning of the man pages is unaffected.
MFC after: 1 month
At first, I added a utility called utxrm(8) to remove stale entries from
the user accounting database. It seems there are cases in which we need
to perform different operations on the database as well. Simply rename
utxrm(8) to utx(8) and place the old code under the "rm" command.
In addition to "rm", this tool supports "boot" and "shutdown", which are
going to be used by an rc-script which I am going to commit separately.
If the utmpx database gets updated while an application is reading it,
there is a chance the reading application processes partially
overwritten entries. To solve this, make sure we always read a multiple
of sizeof(struct futx) at a time.
MFC after: 2 weeks
conditional code parts not used by or applicable to FreeBSD.
The new implementation is supposed to be able to cope with changes to
the 'l' versions of the msghdr structs now used as well as to if_data
allowing future changes without breaking things.
This restores carp(4) config support in HEAD after r231504.
Reviewed by: glebius, brooks
MFC After: 3 months
on extended and extensible structs if_msghdrl and ifa_msghdrl. This
will allow us to extend both the msghdrl structs and eventually if_data
in the future without breaking the ABI.
Bump __FreeBSD_version to allow ports to more easily detect the new API.
Reviewed by: glebius, brooks
MFC after: 3 days
fit into existing mcontext_t.
On i386 and amd64 do return the extended FPU states using
getcontextx(3). For other architectures, getcontextx(3) returns the
same information as getcontext(2).
Tested by: pho
MFC after: 1 month
profiling and kernel profiling. To enable kernel profiling one has to build
kgmon(8). I will enable the build once I managed to build and test powerpc
(32-bit) kernels with profiling support.
- add a powerpc64 PROF_PROLOGUE for _mcount.
- add macros to avoid adding the PROF_PROLOGUE in certain assembly entries.
- apply these macros where needed.
- add size information to the MCOUNT function.
MFC after: 3 weeks, together with r230291
NetBSD's rev 1.6 of this file, on !defined(SOFTFLOAT_FOR_GCC). These
functions are provided by libgcc, so we don't need them. This should
unbreak mips.
the function bodies require only 2 to 10 instructions. However, it
leads to application binaries that refer to a private ABI, namely, the
softfloat innards in libc. This could complicate future changes in
the implementation of the floating-point emulation layer, so it seems
best to have programs refer to the official fe* entry points in libm.
original vendor, but we're using their heavily modified version.)
This brings in functions for long double emulation (both extended and
quad formats), which may be useful for testing, and also for replacing
libc/sparc64/fpu/.
dynamic rounding modes, but FPUless chips that use softfloat can support it
because everything is emulated anyway. (We presently have incomplete
support for hardware FPUs.)
Submitted by: Ian Lepore
The wtmpcvt(1) utility converts wtmp files to the new format used by
utmpx(3). Now that HEAD has been branched to stable/9 and 9.0 is
released, there is no need for it in HEAD.
MFC after: never
The C11 folks reinvented the wheel by introducing an aligned version of
malloc(3) called aligned_alloc(3), instead of posix_memalign(3). Instead
of returning the allocation by reference, it returns the address, just
like malloc(3).
Reviewed by: jasone@
This allows people to still write statically linked applications that
call strchr() or strrchr() and have a local variable or function called
index.
Discussed with: bde@
The index() and rindex() functions were marked LEGACY in the 2001
revision of POSIX and were subsequently removed from the 2008 revision.
The strchr() and strrchr() functions are part of the C standard.
This makes the source code a lot more consistent, as most of these C
files also call into other str*() routines. In fact, about a dozen
already perform strchr() calls.
As I looked through the C library, I noticed the FreeBSD MIPS port has a
hand-written version of index(). This is nice, if it weren't for the
fact that most applications call strchr() instead.
Also, on the other architectures index() and strchr() are identical,
meaning we have two identical pieces of code in the C library and
statically linked applications.
Solve this by naming the actual file strchr.[cS] and let it use
__strong_reference()/STRONG_ALIAS() to provide the index() routine. Do
the same for rindex()/strrchr().
This seems to make the C libraries and static binaries slightly smaller,
but this reduction in size seems negligible.
lib/libc/gen/strtofflags.c became const, but gcc did not warn about
assigning its members to non-const pointers. Clang warned about this
with:
lib/libc/gen/strtofflags.c:98:12: error: assigning to 'char *' from 'const char *' discards qualifiers [-Werror,-Wincompatible-pointer-types]
for (sp = mapping[i].invert ? mapping[i].name :
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Reviewed by: jilles
Add an API for alerting internal libc routines to the presence of
"unsafe" paths post-chroot, and use it in ftpd. [11:07]
Fix a buffer overflow in telnetd. [11:08]
Make pam_ssh ignore unpassphrased keys unless the "nullok" option is
specified. [11:09]
Add sanity checking of service names in pam_start. [11:10]
Approved by: so (cperciva)
Approved by: re (bz)
Security: FreeBSD-SA-11:06.bind
Security: FreeBSD-SA-11:07.chroot
Security: FreeBSD-SA-11:08.telnetd
Security: FreeBSD-SA-11:09.pam_ssh
Security: FreeBSD-SA-11:10.pam
from scratch, copying needed functionality from the old implemenation
on demand, with a thorough review of all code. The main change is that
interface layer has been removed from the CARP. Now redundant addresses
are configured exactly on the interfaces, they run on.
The CARP configuration itself is, as before, configured and read via
SIOCSVH/SIOCGVH ioctls. A new prefix created with SIOCAIFADDR or
SIOCAIFADDR_IN6 may now be configured to a particular virtual host id,
which makes the prefix redundant.
ifconfig(8) semantics has been changed too: now one doesn't need
to clone carpXX interface, he/she should directly configure a vhid
on a Ethernet interface.
To supply vhid data from the kernel to an application the getifaddrs(8)
function had been changed to pass ifam_data with each address. [1]
The new implementation definitely closes all PRs related to carp(4)
being an interface, and may close several others. It also allows
to run a single redundant IP per interface.
Big thanks to Bjoern Zeeb for his help with inet6 part of patch, for
idea on using ifam_data and for several rounds of reviewing!
PR: kern/117000, kern/126945, kern/126714, kern/120130, kern/117448
Reviewed by: bz
Submitted by: bz [1]
yet (see LLVM PR 9788), and warns about it, rub it out for now. When
clang grows support for this attribute, I will revert this again.
MFC after: 1 week
conversion between enum desdir/desmode from include/rpc/des.h, and enum
desdir/desmode from include/rpcsvc/crypt.x. These are actually
different enums, with different value names, but by accident the integer
representation of the enum values happened to be the same.
MFC after: 1 week
warn about it. I guess this was originally done to silence a bogus
warning by an older version of gcc, but I could not reproduce it with
any version of gcc that I have access to.
MFC after: 1 week
__noreturn macro and modify the other exiting functions to use it.
The __noreturn macro, unlike __dead2, must be used BEFORE the function.
This is in line with the C and C++ specifications that place _Noreturn (c1x)
and [[noreturn]] (C++11) in front of the functions. As with __dead2, this
macro falls back to using the GCC attribute.
Unfortunately, clang currently sets the same value for the C version macro
in C99 and C1x modes, so these functions are hidden by default. At some
point before 10.0, I need to go through the headers and clean up the C1x /
C++11 visibility.
Reviewed by: brooks (mentor)
vs. the comment documented "If we are working with a privileged socket,
then take only one attempt". Make the code match.
Furthermore, critical privileged applications that [over] log a vast amount
can look like a DoS to this code. Given it's unlikely the single reattempted
send() will succeeded, avoid usurping the scheduler in a library API for a
single non-critical facility in critical applications.
Obtained from: Juniper Networks
Discussed with: glebius
than silently failing and returning success.
Without this, code calls pthread_once(), receives a return value of
success, and thinks that the passed function has been called.
Approved by: dim (mentor)
system calls to provide feed-forward clock management capabilities to
userspace processes. ffclock_getcounter() returns the current value of the
kernel's feed-forward clock counter. ffclock_getestimate() returns the current
feed-forward clock parameter estimates and ffclock_setestimate() updates the
feed-forward clock parameter estimates.
- Document the syscalls in the ffclock.2 man page.
- Regenerate the script-derived syscall related files.
Committed on behalf of Julien Ridoux and Darryl Veitch from the University of
Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward
Clock Synchronization Algorithms" project.
For more information, see http://www.synclab.org/radclock/
Submitted by: Julien Ridoux (jridoux at unimelb edu au)
load of _l suffixed versions of various standard library functions that use
the global locale, making them take an explicit locale parameter. Also
adds support for per-thread locales. This work was funded by the FreeBSD
Foundation.
Please test any code you have that uses the C standard locale functions!
Reviewed by: das (gdtoa changes)
Approved by: dim (mentor)
change here is to ensure that when a process forks after arc4random
is seeded, the parent and child don't observe the same random sequence.
OpenBSD's fix introduces some additional overhead in the form of a
getpid() call. This could be improved upon, e.g., by setting a flag
in fork(), if it proves to be a problem.
This was discussed with secteam (simon, csjp, rwatson) in 2008, shortly
prior to my going out of town and forgetting all about it. The conclusion
was that the problem with forks is worrisome, but it doesn't appear to
have introduced an actual vulnerability for any known programs.
The only significant remaining difference between our arc4random and
OpenBSD's is in how we seed the generator in arc4_stir().
OpenBSD's version (r1.22). While some of our style changes were
indeed small improvements, being able to easily track functionality
changes in OpenBSD seems more useful.
Also fix style bugs in the FreeBSD-specific parts of this file.
No functional changes, as verified with md5.
The size passed to strlcat() must depend on the input length, not the
output length. Because the input and output buffers are equal in size,
the resulting binary does not change at all.
madvise(2) except that it operates on a file descriptor instead of a
memory region. It is currently only supported on regular files.
Just as with madvise(2), the advice given to posix_fadvise(2) can be
divided into two types. The first type provide hints about data access
patterns and are used in the file read and write routines to modify the
I/O flags passed down to VOP_READ() and VOP_WRITE(). These modes are
thus filesystem independent. Note that to ease implementation (and
since this API is only advisory anyway), only a single non-normal
range is allowed per file descriptor.
The second type of hints are used to hint to the OS that data will or
will not be used. These hints are implemented via a new VOP_ADVISE().
A default implementation is provided which does nothing for the WILLNEED
request and attempts to move any clean pages to the cache page queue for
the DONTNEED request. This latter case required two other changes.
First, a new V_CLEANONLY flag was added to vinvalbuf(). This requests
vinvalbuf() to only flush clean buffers for the vnode from the buffer
cache and to not remove any backing pages from the vnode. This is
used to ensure clean pages are not wired into the buffer cache before
attempting to move them to the cache page queue. The second change adds
a new vm_object_page_cache() method. This method is somewhat similar to
vm_object_page_remove() except that instead of freeing each page in the
specified range, it attempts to move clean pages to the cache queue if
possible.
To preserve the ABI of struct file, the f_cdevpriv pointer is now reused
in a union to point to the currently active advice region if one is
present for regular files.
Reviewed by: jilles, kib, arch@
Approved by: re (kib)
MFC after: 1 month
the word alignment, some versions of gcc do require 16-byte alignment.
Make sure the stack is 16-byte aligned before calling a subroutine.
Inspired by: PR amd64/162214
MFC after: 1 week
as it is required by amd64 ABI. Add a comment for the places were
the stack is accidentally properly aligned already.
PR: amd64/162214
Submitted by: yamayan <yamayan kbh biglobe ne jp>
MFC after: 1 week
When booting the system, truncate the utx.active file, but do write the
BOOT_TIME record into it afterwards. This allows one to obtain the boot
time of the system as follows:
struct utmpx u1 = { .ut_type = BOOT_TIME }, *u2;
setutxent();
u2 = getutxid(&u1);
Now, the boot time is stored in u2->ut_tv, just like on Linux and other
systems.
We don't open the utx.active file with O_EXLOCK. It's rather unlikely
that other applications use this database at the same time and I want to
prevent the possibility of deadlocks in init(8).
Discussed with: pluknet
working MI one. The MI one only needs to be overridden on machines
with non-IEEE754 arithmetic. (The last supported one was the VAX.)
It can also be overridden if someone comes up with a faster one that
actually passes the regression tests -- but this is harder than it sounds.
Even though POSIX allows us to return simply /dev/tty as a pathname
identifying the controlling terminal of the running process, it is nicer
if this function were actually useful, by returning the actual pathname
of the controlling terminal.
Implement ctermid() by using the kern.devname sysctl to resolve the
actual name of /dev/tty. Don't use devname(3), since it may return bogus
strings like #C:0x123.
As of FreeBSD 6, devices can only be opened through devfs. These device
nodes don't have major and minor numbers anymore. The st_rdev field in
struct stat is simply based a copy of st_ino.
Simply display device numbers as hexadecimal, using "%#jx". This is
allowed by POSIX, since it explicitly states things like the following
(example taken from ls(1)):
"If the file is a character special or block special file, the
size of the file may be replaced with implementation-defined
information associated with the device in question."
This makes the output of these commands more compact. For example, ls(1)
now uses approximately four columns less. While there, simplify the
column length calculation from ls(1) by calling snprintf() with a NULL
buffer.
Don't be afraid; if needed one can still obtain individual major/minor
numbers using stat(1).
conversion, conversion must fail and errno must be set to E2BIG.
PR: standards/160673
Submitted by: Henning Petersen <henning.petersen@t-online.de>
Reviewed by: pluknet
Approved by: re (kib), delphij (mentor)
A "process descriptor" file descriptor is used to manage processes
without using the PID namespace. This is required for Capsicum's
Capability Mode, where the PID namespace is unavailable.
New system calls pdfork(2) and pdkill(2) offer the functional equivalents
of fork(2) and kill(2). pdgetpid(2) allows querying the PID of the remote
process for debugging purposes. The currently-unimplemented pdwait(2) will,
in the future, allow querying rusage/exit status. In the interim, poll(2)
may be used to check (and wait for) process termination.
When a process is referenced by a process descriptor, it does not issue
SIGCHLD to the parent, making it suitable for use in libraries---a common
scenario when using library compartmentalisation from within large
applications (such as web browsers). Some observers may note a similarity
to Mach task ports; process descriptors provide a subset of this behaviour,
but in a UNIX style.
This feature is enabled by "options PROCDESC", but as with several other
Capsicum kernel features, is not enabled by default in GENERIC 9.0.
Reviewed by: jhb, kib
Approved by: re (kib), mentor (rwatson)
Sponsored by: Google Inc
* Decouple the path supervision using a separate HB timer per path.
* Add support for potentially failed state.
* Bring back RTO.min to 1 second.
* Accept packets on IP-addresses already announced via an ASCONF
* While there: do some cleanups.
Approved by: re@
MFC after: 2 months.
These system calls have already been implemented in the kernel; now we
hook up libc symbols so userspace can drive them.
Approved by: re (kib), mentor (rwatson)
Sponsored by: Google Inc
-g, by reverting r219139. The LLVM PR referenced in that revision was
fixed in the mean time, and we imported a clang snapshot soon
afterwards, so the temporary workaround of disabling clang's integrated
assembler is no longer needed.
In this particular case, using e.g. DEBUG_FLAGS=-g causes clang to
output certain directives into assembly that our version of GNU as
chokes on.
Reported by: dougb
Approved by: re (kib)
Formerly, in this case an error was returned but the pid was also returned
to the application, requiring the application to use unspecified behaviour
(the returned pid in error situations) to avoid zombies.
Now, reap the zombie and do not return the pid.
MFC after: 2 weeks
As noted in Austin Group issue #370 (an interpretation has been issued),
failing posix_spawn() because an fd specified with
posix_spawn_file_actions_addclose() is not open is unnecessarily harsh, and
there are existing implementations that do not fail posix_spawn() for this
reason.
Reviewed by: ed
MFC after: 10 days
Some files keep the SUN4V tags as a code reference, for the future,
if any rewamped sun4v support wants to be added again.
Reviewed by: marius
Tested by: sbruno
Approved by: re
* Cleanup usage of iov's.
* Add support for SCTP_TIMEOUTS socketoption.
* Fix a bug in sctp_recvmsg(): return the msg_flags in case of an error.
* Fix a bug in the error handling of sctp_peeloff(): return the -1.
- While here, remove a few C comments that don't seem to contribute
anything additional to the man page.
PR: 146047
Submitted by: arundel
MFC after: 3 days
vop_stdallocate() is filesystem agnostic and will run as slow as a
read/write loop in userspace; however, it serves to correctly
implement the functionality for filesystems that do not implement a
VOP_ALLOCATE.
Note that __FreeBSD_version was already bumped today to 900036 for any
ports which would like to use this function.
Also reserve space in the syscall table for posix_fadvise(2).
Reviewed by: -arch (previous version)
negative return value from write to update its position in
a buffer. The patch, courtesy of Andrey Simonenko, also simplifies
a conditional by removing the "i != cnt" clause, since it is
always true at this point in the code. The bug caused problems
for mountd, when it generated a large reply to an exports RPC
request.
Submitted by: simon at comsys.ntu-kpi.kiev.ua
MFC after: 2 weeks
Of course, strerror_r() may still fail with ERANGE.
Although the POSIX specification said this could fail with EINVAL and
doing this likely indicates invalid use of errno, most other
implementations permitted it, various POSIX testsuites require it to
work (matching the older sys_errlist array) and apparently some
applications depend on it.
PR: standards/151316
MFC after: 1 week
syscall assembly files. This results in conflicting dependencies and can
cause unexpected results for parallel builds. This is because the .c file
and the .S file both generate the same .o file.
Submitted by: Simon Gerraty <sjg@juniper.net>
Sponsored by: Juniper Networks