Commit Graph

131 Commits

Author SHA1 Message Date
Warner Losh
5ab191c42b Forward compatibility for ino64.
Add forward compatibility so that new binaries can run on old
kernels. If the new system call from ino64 isn't available on your
system, then the old one will be used and the results translated.  The
stat and statfs families of functions are fully emulated. While not
required by policy, in this case it is helpful to our users to provide
this compatibility. In this case, it allows rollback of the kernel
after installing a new userland should a problem be discovered. It
also prevents foot-shooting if a user does an install before rebooting
with the new kernel. Finally, it allows the use case where one needs
to run new binaries on an old kernel as part of an upgrade process.

The getdirentries family uses tricks that may not work on remote
filesystems. Specifically, it uses a buffer 1/4 the size requested to
get the data from he old syscall.

The code carefully uses direct syscalls for old system calls to avoid
referencing freebsd11_* symbols, which contaminate ld-elf.so.1's
export table due to its use of stat functions, which causes errno to
be incorrect in client programs due to the wrong *stat* function being
resolved in some cases.

This code should removed sometime after 12 is branched.

Tested on: 12-current binaries on a 10.3-beta kernel run and return
       consistent results. 12-current kernel and userland with
       packages from before ino64 was committed also work.

Differential Revision: https://reviews.freebsd.org/D11185
Reviewed by: kib@, emaste@
2017-06-23 18:06:20 +00:00
Konstantin Belousov
2b34e84335 Add abstime kqueue(2) timers and expand struct kevent members.
This change implements NOTE_ABSTIME flag for EVFILT_TIMER, which
specifies that the data field contains absolute time to fire the
event.

To make this useful, data member of the struct kevent must be extended
to 64bit.  Using the opportunity, I also added ext members.  This
changes struct kevent almost to Apple struct kevent64, except I did
not changed type of ident and udata, the later would cause serious API
incompatibilities.

The type of ident was kept uintptr_t since EVFILT_AIO returns a
pointer in this field, and e.g. CHERI is sensitive to the type
(discussed with brooks, jhb).

Unlike Apple kevent64, symbol versioning allows us to claim ABI
compatibility and still name the new syscall kevent(2).  Compat shims
are provided for both host native and compat32.

Requested by:	bapt
Reviewed by:	bapt, brooks, ngie (previous version)
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D11025
2017-06-17 00:57:26 +00:00
Eric van Gyzen
5a6d7b723f libthr: fix warnings from GCC when WARNS=6
Fix warnings about:
- redundant declarations
- a local variable shadowing a global function (dlinfo)
- an old-style function definition (with an empty parameter list)
- a variable that is possibly used uninitialized

"make tinderbox" passes this time, except for a few unrelated
kernel failures.

Reviewed by:	kib
MFC after:	3 days
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D10870
2017-05-23 16:12:50 +00:00
Konstantin Belousov
6992112349 Commit the 64-bit inode project.
Extend the ino_t, dev_t, nlink_t types to 64-bit ints.  Modify
struct dirent layout to add d_off, increase the size of d_fileno
to 64-bits, increase the size of d_namlen to 16-bits, and change
the required alignment.  Increase struct statfs f_mntfromname[] and
f_mntonname[] array length MNAMELEN to 1024.

ABI breakage is mitigated by providing compatibility using versioned
symbols, ingenious use of the existing padding in structures, and
by employing other tricks.  Unfortunately, not everything can be
fixed, especially outside the base system.  For instance, third-party
APIs which pass struct stat around are broken in backward and
forward incompatible ways.

Kinfo sysctl MIBs ABI is changed in backward-compatible way, but
there is no general mechanism to handle other sysctl MIBS which
return structures where the layout has changed. It was considered
that the breakage is either in the management interfaces, where we
usually allow ABI slip, or is not important.

Struct xvnode changed layout, no compat shims are provided.

For struct xtty, dev_t tty device member was reduced to uint32_t.
It was decided that keeping ABI compat in this case is more useful
than reporting 64-bit dev_t, for the sake of pstat.

Update note: strictly follow the instructions in UPDATING.  Build
and install the new kernel with COMPAT_FREEBSD11 option enabled,
then reboot, and only then install new world.

Credits: The 64-bit inode project, also known as ino64, started life
many years ago as a project by Gleb Kurtsou (gleb).  Kirk McKusick
(mckusick) then picked up and updated the patch, and acted as a
flag-waver.  Feedback, suggestions, and discussions were carried
by Ed Maste (emaste), John Baldwin (jhb), Jilles Tjoelker (jilles),
and Rick Macklem (rmacklem).  Kris Moore (kris) performed an initial
ports investigation followed by an exp-run by Antoine Brodin (antoine).
Essential and all-embracing testing was done by Peter Holm (pho).
The heavy lifting of coordinating all these efforts and bringing the
project to completion were done by Konstantin Belousov (kib).

Sponsored by:	The FreeBSD Foundation (emaste, kib)
Differential revision:	https://reviews.freebsd.org/D10439
2017-05-23 09:29:05 +00:00
Eric van Gyzen
07f29d9f76 Remove old spinlock_debug code from libc
This no longer seems useful.  Remove it.

This was prompted by a "cast discards volatile qualifier" warning
in libthr when WARNS=6.

Reviewed by:	kib
MFC after:	3 days
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D10832
2017-05-20 17:32:01 +00:00
Konstantin Belousov
9851b3400a Implement the memset_s(3) function as specified by the C11 ISO/IEC
9899:2011 Appendix K 3.7.4.1.

Other needed supporting types, defines and constraint_handler
infrastructure is added as specified in the C11 spec.

Submitted by:	Tom Rix <trix@juniper.net>
Sponsored by:	Juniper Networks
Discussed with:	ed
MFC after:	3 weeks
Differential revision:	https://reviews.freebsd.org/D9903
Differential revision:	https://reviews.freebsd.org/D10161
2017-03-30 04:57:26 +00:00
Eric van Gyzen
3f8455b090 Add clock_nanosleep()
Add a clock_nanosleep() syscall, as specified by POSIX.
Make nanosleep() a wrapper around it.

Attach the clock_nanosleep test from NetBSD. Adjust it for the
FreeBSD behavior of updating rmtp only when interrupted by a signal.
I believe this to be POSIX-compliant, since POSIX mentions the rmtp
parameter only in the paragraph about EINTR. This is also what
Linux does. (NetBSD updates rmtp unconditionally.)

Copy the whole nanosleep.2 man page from NetBSD because it is complete
and closely resembles the POSIX description. Edit, polish, and reword it
a bit, being sure to keep any relevant text from the FreeBSD page.

Reviewed by:	kib, ngie, jilles
MFC after:	3 weeks
Relnotes:	yes
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D10020
2017-03-19 00:51:12 +00:00
Eric van Gyzen
b215ceaaec Add sem_clockwait_np()
This function allows the caller to specify the reference clock
and choose between absolute and relative mode.  In relative mode,
the remaining time can be returned.

The API is similar to clock_nanosleep(3).  Thanks to Ed Schouten
for that suggestion.

While I'm here, reduce the sleep time in the semaphore "child"
test to greatly reduce its runtime.  Also add a reasonable timeout.

Reviewed by:	ed (userland)
MFC after:	2 weeks
Relnotes:	yes
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D9656
2017-02-23 19:36:38 +00:00
Konstantin Belousov
b7c7684ae2 Export __cxa_thread_atexit_impl as an alias for __cxa_thread_atexit.
libstdc++ before gcc r244057 expected that libc provided
__cxa_thread_atexit_impl, and libstdc++ implemented
__cxa_thread_atexit, by forwarding the calls to _impl.  Mentioned gcc
revision checks for __cxa_thread_atexit in libc and does not provide
the symbol from libstdc++ if found.

This change helps older gcc, in particular, all released versions
which implement thread_local, by consolidating the implementation into
libc.  For that versions, if configured with the current libc, the
__cxa_thread_atexit is exported from libstdc++ as a trivial wrapper
around libc::__cxa_thread_atexit_impl.

The __cxa_thread_atexit implementation is put into separate source
file to allow for static linking with older libstdc++.a.

gcc bugzilla:	https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78968
Reported by:	Hannes Hauswedell <h2+fbsdports@fsfe.org>
PR:	215709
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2017-01-07 16:05:19 +00:00
Konstantin Belousov
afd3e268d2 Rewrite ptrace(2) wrappers in C.
Besides removing hand-translation to assembler, this also adds missing
wrappers for arm64 and risc-v.

Reviewed by:	emaste, jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D7694
2016-08-29 18:47:51 +00:00
Konstantin Belousov
1c1cc89580 The fdatasync(2) call must be cancellation point.
Sponsored by:	The FreeBSD Foundation
MFC after:	13 days
2016-08-16 08:27:03 +00:00
Konstantin Belousov
b585cd3e2c Add __cxa_thread_atexit(3) API implementation.
This is the backing feature to implement C++11 thread storage duration
specified by the thread_local keyword.  A destructor for given
thread-local object is registered to be executed at the thread
termination time using __cxa_thread_atexit().  Libc calls the
__cxa_thread_calls_dtors() during exit(3), before finalizers and
atexit functions, and libthr calls the function at the thread
termination time, after the stack unwinding and thread-specific key
destruction.

There are several uncertainties in the API which lacks a formal
specification.  Among them:
- is it allowed to register destructors during destructing;
	we allow, but limiting the nesting level.  If too many iterations
	detected, a diagnostic is issued to stderr and thread forcibly
	terminates for now.
- how to handle destructors which belong to an unloading dso;
	for now, we ignore destructor calls for such entries, and
	issue a diagnostic.  Linux does prevent dso unload until all
	threads with destructors from the dso terminated.
It is supposed that the diagnostics allow to detect real-world
applications relying on the above details and possibly adjust
our implementation.  Right now the choices were to provide the slim
API (but that rarely stands the practice test).

Tests are added to check generic functionality and to specify some of
the above implementation choices.

Submitted by:	Mahdi Mokhtari <mokhi64_gmail.com>
Reviewed by:	theraven
Discussed with:	dim (detection of -std=c++11 supoort for tests)
Sponsored by:	The FreeBSD Foundation (my involvement)
MFC after:	2 weeks
Differential revisions:	https://reviews.freebsd.org/D7224,
    https://reviews.freebsd.org/D7427
2016-08-06 13:32:40 +00:00
Konstantin Belousov
2a339d9e3d Add implementation of robust mutexes, hopefully close enough to the
intention of the POSIX IEEE Std 1003.1TM-2008/Cor 1-2013.

A robust mutex is guaranteed to be cleared by the system upon either
thread or process owner termination while the mutex is held.  The next
mutex locker is then notified about inconsistent mutex state and can
execute (or abandon) corrective actions.

The patch mostly consists of small changes here and there, adding
neccessary checks for the inconsistent and abandoned conditions into
existing paths.  Additionally, the thread exit handler was extended to
iterate over the userspace-maintained list of owned robust mutexes,
unlocking and marking as terminated each of them.

The list of owned robust mutexes cannot be maintained atomically
synchronous with the mutex lock state (it is possible in kernel, but
is too expensive).  Instead, for the duration of lock or unlock
operation, the current mutex is remembered in a special slot that is
also checked by the kernel at thread termination.

Kernel must be aware about the per-thread location of the heads of
robust mutex lists and the current active mutex slot.  When a thread
touches a robust mutex for the first time, a new umtx op syscall is
issued which informs about location of lists heads.

The umtx sleep queues for PP and PI mutexes are split between
non-robust and robust.

Somewhat unrelated changes in the patch:
1. Style.
2. The fix for proper tdfind() call use in umtxq_sleep_pi() for shared
   pi mutexes.
3. Removal of the userspace struct pthread_mutex m_owner field.
4. The sysctl kern.ipc.umtx_vnode_persistent is added, which controls
   the lifetime of the shared mutex associated with a vnode' page.

Reviewed by:	jilles (previous version, supposedly the objection was fixed)
Discussed with:	brooks, Martin Simmons <martin@lispworks.com> (some aspects)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2016-05-17 09:56:22 +00:00
Pedro F. Giffuni
32223c1b7d libc: spelling fixes.
Mostly on comments.
2016-04-30 01:24:24 +00:00
Andrew Turner
4798b7f381 Disable support for compat syscalls on arm64. These symbols were never
shipped since arm64 exists only on 11+.

Submitted by:	brooks
Reviewed by:	emaste, imp
2016-04-06 16:09:10 +00:00
Mark Johnston
c5dd49afec Fix the gcc build after r295407.
X-MFC-With:	r295407
2016-02-08 22:02:56 +00:00
Konstantin Belousov
bd43f0691c If libthr.so is dlopened without RTLD_GLOBAL flag, the libthr symbols
do not participate in the global symbols namespace, but rtld locks are
still replaced and functions are interposed.  In particular,
__pthread_map_stacks_exec is resolved to the libc version.  If a
library is loaded later, which requires adjustment of the stack
protection mode, rtld calls into libc __pthread_map_stacks_exec due to
the symbols scope.  The libc version might recurse into binder and
recursively acquire rtld bind lock, causing the hang.

Make libc __pthread_map_stacks_exec() interposed, which synchronizes
rtld locks and version of the stack exec hook when libthr loaded,
regardless of the symbol scope control or symbol resolution order.

The __pthread_map_stacks_exec() symbol is removed from the private
version in libthr since libc symbol now operates correctly in presence
of libthr.

Reported and tested by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-02-08 19:24:13 +00:00
Konstantin Belousov
bf420ace0a Add implementations of sendmmsg(3) and recvmmsg(3) functions which
wraps sendmsg(2) and recvmsg(2) into batch send and receive operation.
The goal of this implementation is only to provide API compatibility
with Linux.

The cancellation behaviour of the functions is not quite right, but
due to relative rare use of cancellation it is considered acceptable
comparing with the complexity of the correct implementation.  If
functions are reimplemented as syscalls, the fix would come almost
trivial.  The direct use of the syscall trampolines instead of libc
wrappers for sendmsg(2) and recvmsg(2) is to avoid data loss on
cancellation.

Submitted by:	Boris Astardzhiev <boris.astardzhiev@gmail.com>
Discussed with:	jilles (cancellation behaviour)
MFC after:	1 month
2016-01-29 14:12:12 +00:00
Konstantin Belousov
bd6060a1c6 Switch libc from using _sig{procmask,action,suspend} symbols, which
are aliases for the syscall stubs and are plt-interposed, to the
libc-private aliases of internally interposed sigprocmask() etc.

Since e.g. _sigaction is not interposed by libthr, calling signal()
removes thr_sighandler() from the handler slot etc.  The result was
breaking signal semantic and rtld locking.

The added __libc_sigprocmask and other symbols are hidden, they are
not exported and cannot be called through PLT.  The setjmp/longjmp
functions for x86 were changed to use direct calls, and since
PIC_PROLOGUE only needed for functional PLT indirection on i386, it is
removed as well.

The PowerPC bug of calling the syscall directly in the setjmp/longjmp
implementation is kept as is.

Reported by:	Pete French <petefrench@ingresso.co.uk>
Tested by:	Michiel Boland <boland37@xs4all.nl>
Reviewed by:	jilles (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-08-29 14:25:01 +00:00
John Baldwin
179fa75e6e Reassign copyright statements on several files from Advanced
Computing Technologies LLC to Hudson River Trading LLC.

Approved by:	Hudson River Trading LLC (who owns ACT LLC)
MFC after:	1 week
2015-04-23 14:22:20 +00:00
Konstantin Belousov
0538aafc41 The lseek(2), mmap(2), truncate(2), ftruncate(2), pread(2), and
pwrite(2) syscalls are wrapped to provide compatibility with pre-7.x
kernels which required padding before the off_t parameter.  The
fcntl(2) contains compatibility code to handle kernels before the
struct flock was changed during the 8.x CURRENT development.  The
shims were reasonable to allow easier revert to the older kernel at
that time.

Now, two or three major releases later, shims do not serve any
purpose.  Such old kernels cannot handle current libc, so revert the
compatibility code.

Make padded syscalls support conditional under the COMPAT6 config
option.  For COMPAT32, the syscalls were under COMPAT6 already.

Remove WITHOUT_SYSCALL_COMPAT build option, which only purpose was to
(partially) disable the removed shims.

Reviewed by:	jhb, imp (previous versions)
Discussed with:	peter
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-04-18 21:50:13 +00:00
Konstantin Belousov
3d0045bb2b Make wait6(2), waitid(3) and ppoll(2) cancellation points. The
waitid() function is required to be cancellable by the standard.  The
wait6() and ppoll() follow the other syscalls in their groups.

Reviewed by:	jhb, jilles (previous versions)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-04-18 21:35:41 +00:00
Konstantin Belousov
1849df3006 Correctly handle __fcntl_compat symbol for the !SYSCALL_COMPAT case.
Both .weak and .alias assembler directives only work when assembling
the file which defines the symbol.

Reported and tested by:	andrew
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-04-01 16:55:30 +00:00
Konstantin Belousov
b072e86d09 Make kevent(2) a cancellation point.
Note that to cancel blocked kevent(2) call, changelist must be empty,
since we cannot cancel a call which already made changes to the
process state.  And in reverse, call which only makes changes to the
kqueue state, without waiting for an event, is not cancellable.  This
makes a natural usage model to migrate kqueue loop to support
cancellation, where existing single kevent(2) call must be split into
two: first uncancellable update of kqueue, then cancellable wait for
events.

Note that this is ABI-incompatible change, but it is believed that
there is no cancel-safe code that relies on kevent(2) not being a
cancellation point.  Option to preserve the ABI would be to keep
kevent(2) as is, but add new call with flags to specify cancellation
behaviour, which only value seems to add complications.

Suggested and reviewed by:	jilles
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-03-29 19:14:41 +00:00
Konstantin Belousov
5aed3effa4 Restore the extern qualifier on __cleanup.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2015-02-17 08:54:03 +00:00
Konstantin Belousov
45468c5356 Properly interpose libc spinlocks, was missed in r276630. In
particular, stdio locking was affected.

Reported and tested by:	"Matthew D. Fuller" <fullermd@over-yonder.net>
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2015-02-14 11:47:40 +00:00
Jilles Tjoelker
2205e0d1bd Add futimens and utimensat system calls.
The core kernel part is patch file utimes.2008.4.diff from
pluknet@FreeBSD.org. I updated the code for API changes, added the manual
page and added compatibility code for old kernels. There is also audit and
Capsicum support.

A new UTIME_* constant might allow setting birthtimes in future.

Differential Revision:	https://reviews.freebsd.org/D1426
Submitted by:	pluknet (partially)
Reviewed by:	delphij, pluknet, rwatson
Relnotes:	yes
2015-01-23 21:07:08 +00:00
Konstantin Belousov
397d851d66 Reduce the size of the interposing table and amount of
cancellation-handling code in the libthr.  Translate some syscalls
into their more generic counterpart, and remove translated syscalls
from the table.

List of the affected syscalls:
creat, open -> openat
raise -> thr_kill
sleep, usleep -> nanosleep
pause -> sigsuspend
wait, wait3, waitpid -> wait4

Suggested and reviewed by:	jilles (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-01-11 22:16:31 +00:00
Konstantin Belousov
1a744fefc2 Avoid calling internal libc function through PLT or accessing data
though GOT, by staticizing and hiding.  Add setter for
__error_selector to hide it as well.

Suggested and reviewed by:	jilles
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-01-05 01:06:54 +00:00
Konstantin Belousov
8495e8b1e9 Fix known issues which blow up the process after dlopen("libthr.so")
(or loading a dso linked to libthr.so into process which was not
linked against threading library).

- Remove libthr interposers of the libc functions, including
  __error(). Instead, functions calls are indirected through the
  interposing table, similar to how pthread stubs in libc are already
  done.  Libc by default points either to syscall trampolines or to
  existing libc implementations.  On libthr load, libthr rewrites the
  pointers to the cancellable implementations already in libthr.  The
  interposition table is separate from pthreads stubs indirection
  table to not pull pthreads stubs into static binaries.

- Postpone the malloc(3) internal mutexes initialization until libthr
  is loaded.  This avoids recursion between calloc(3) and static
  pthread_mutex_t initialization.

- Reinstall signal handlers with wrapper on libthr load.  The
  _rtld_is_dlopened(3) is used to avoid useless calls to sigaction(2)
  when libthr is statically referenced from the main binary.

In the process, fix openat(2), swapcontext(2) and setcontext(2)
interposing.  The libc symbols were exported at different versions
than libthr interposers.  Export both libc and libthr versions from
libc now, with default set to the higher version from libthr.

Remove unused and disconnected swapcontext(3) userspace implementation
from libc/gen.

No objections from:	deischen
Tested by:	pho, antoine (exp-run) (previous versions)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-01-03 18:38:46 +00:00
Ed Maste
294246bb7d Revert r274772: it is not valid on MIPS
Reported by:	sbruno
2014-11-25 03:50:31 +00:00
Ed Maste
688fd61ae8 Use canonical __PIC__ flag
It is automatically set when -fPIC is passed to the compiler.

Reviewed by:	dim, kib
Sponsored by:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D1179
2014-11-21 02:05:48 +00:00
Hajimu UMEMOTO
e45764721a Update our stub resolver to final version of libbind.
Obtained from:	ISC
2014-08-12 12:36:06 +00:00
Hajimu UMEMOTO
08d41c70c1 We don't use these files. 2014-08-10 15:21:26 +00:00
Pedro F. Giffuni
046c3635cd Bring final version of libbind:
From
http://www.isc.org/downloads/libbind/

The libbind functions have been separated from the BIND suite as of BIND
9.6.0. Originally from older versions of BIND, they have been continually
maintained and improved but not installed by default with BIND 9. This
standard resolver library contains the same historical functions and
headers included with many Unix operating systems. In fact, most
implementations are based on the same original code.

At present, NetBSD maintains libbind code, now known as "netresolv".
2014-08-05 23:16:31 +00:00
David Chisnall
02da4cb451 Add an extra void* cast to work around a bug in FreeBSD-gcc inherited
from Apple.
2014-04-03 08:08:36 +00:00
David Chisnall
46cdc14062 Add support for some block functions that come from OS X. These are
intended to build with any C compiler.

Reviewed by:	pfg
MFC after:	3 weeks
2014-04-02 16:07:48 +00:00
Jilles Tjoelker
e852d6bc48 libc/resolv: Use poll() instead of kqueue().
The resolver in libc creates a kqueue for watching a single file descriptor.
This can be done using poll() which should be lighter on the kernel and
reduce possible problems with rlimits (file descriptors, kqueues).

Reviewed by:	jhb
2014-01-14 22:05:33 +00:00
Pawel Jakub Dawidek
7008be5bd7 Change the cap_rights_t type from uint64_t to a structure that we can extend
in the future in a backward compatible (API and ABI) way.

The cap_rights_t represents capability rights. We used to use one bit to
represent one right, but we are running out of spare bits. Currently the new
structure provides place for 114 rights (so 50 more than the previous
cap_rights_t), but it is possible to grow the structure to hold at least 285
rights, although we can make it even larger if 285 rights won't be enough.

The structure definition looks like this:

	struct cap_rights {
		uint64_t	cr_rights[CAP_RIGHTS_VERSION + 2];
	};

The initial CAP_RIGHTS_VERSION is 0.

The top two bits in the first element of the cr_rights[] array contain total
number of elements in the array - 2. This means if those two bits are equal to
0, we have 2 array elements.

The top two bits in all remaining array elements should be 0.
The next five bits in all array elements contain array index. Only one bit is
used and bit position in this five-bits range defines array index. This means
there can be at most five array elements in the future.

To define new right the CAPRIGHT() macro must be used. The macro takes two
arguments - an array index and a bit to set, eg.

	#define	CAP_PDKILL	CAPRIGHT(1, 0x0000000000000800ULL)

We still support aliases that combine few rights, but the rights have to belong
to the same array element, eg:

	#define	CAP_LOOKUP	CAPRIGHT(0, 0x0000000000000400ULL)
	#define	CAP_FCHMOD	CAPRIGHT(0, 0x0000000000002000ULL)

	#define	CAP_FCHMODAT	(CAP_FCHMOD | CAP_LOOKUP)

There is new API to manage the new cap_rights_t structure:

	cap_rights_t *cap_rights_init(cap_rights_t *rights, ...);
	void cap_rights_set(cap_rights_t *rights, ...);
	void cap_rights_clear(cap_rights_t *rights, ...);
	bool cap_rights_is_set(const cap_rights_t *rights, ...);

	bool cap_rights_is_valid(const cap_rights_t *rights);
	void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src);
	void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src);
	bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);

Capability rights to the cap_rights_init(), cap_rights_set(),
cap_rights_clear() and cap_rights_is_set() functions are provided by
separating them with commas, eg:

	cap_rights_t rights;

	cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);

There is no need to terminate the list of rights, as those functions are
actually macros that take care of the termination, eg:

	#define	cap_rights_set(rights, ...)				\
		__cap_rights_set((rights), __VA_ARGS__, 0ULL)
	void __cap_rights_set(cap_rights_t *rights, ...);

Thanks to using one bit as an array index we can assert in those functions that
there are no two rights belonging to different array elements provided
together. For example this is illegal and will be detected, because CAP_LOOKUP
belongs to element 0 and CAP_PDKILL to element 1:

	cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);

Providing several rights that belongs to the same array's element this way is
correct, but is not advised. It should only be used for aliases definition.

This commit also breaks compatibility with some existing Capsicum system calls,
but I see no other way to do that. This should be fine as Capsicum is still
experimental and this change is not going to 9.x.

Sponsored by:	The FreeBSD Foundation
2013-09-05 00:09:56 +00:00
Jilles Tjoelker
e73151eb82 libc: Always use our own copy of sys_errlist and sys_nerr (.so only).
This ensures strerror() and friends continue to work correctly even if a
(non-PIE) executable linked against an older libc imports sys_errlist (which
causes sys_errlist to refer to the executable's copy with a size fixed when
that executable was linked).

The executable's use of sys_errlist remains broken because it uses the
current value of sys_nerr and may access past the bounds of the array.

Different from the message "Using sys_errlist from executables is not
ABI-stable" on freebsd-arch, this change does not affect the static library.
There seems no reason to prevent overriding the error messages in the static
library.
2013-08-31 22:32:42 +00:00
Jilles Tjoelker
442542b86c libc: Access some unexported variables more efficiently (related to stdio). 2013-08-23 14:23:54 +00:00
Pawel Jakub Dawidek
2609222ab4 Merge Capsicum overhaul:
- Capability is no longer separate descriptor type. Now every descriptor
  has set of its own capability rights.

- The cap_new(2) system call is left, but it is no longer documented and
  should not be used in new code.

- The new syscall cap_rights_limit(2) should be used instead of
  cap_new(2), which limits capability rights of the given descriptor
  without creating a new one.

- The cap_getrights(2) syscall is renamed to cap_rights_get(2).

- If CAP_IOCTL capability right is present we can further reduce allowed
  ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed
  ioctls can be retrived with cap_ioctls_get(2) syscall.

- If CAP_FCNTL capability right is present we can further reduce fcntls
  that can be used with the new cap_fcntls_limit(2) syscall and retrive
  them with cap_fcntls_get(2).

- To support ioctl and fcntl white-listing the filedesc structure was
  heavly modified.

- The audit subsystem, kdump and procstat tools were updated to
  recognize new syscalls.

- Capability rights were revised and eventhough I tried hard to provide
  backward API and ABI compatibility there are some incompatible changes
  that are described in detail below:

	CAP_CREATE old behaviour:
	- Allow for openat(2)+O_CREAT.
	- Allow for linkat(2).
	- Allow for symlinkat(2).
	CAP_CREATE new behaviour:
	- Allow for openat(2)+O_CREAT.

	Added CAP_LINKAT:
	- Allow for linkat(2). ABI: Reuses CAP_RMDIR bit.
	- Allow to be target for renameat(2).

	Added CAP_SYMLINKAT:
	- Allow for symlinkat(2).

	Removed CAP_DELETE. Old behaviour:
	- Allow for unlinkat(2) when removing non-directory object.
	- Allow to be source for renameat(2).

	Removed CAP_RMDIR. Old behaviour:
	- Allow for unlinkat(2) when removing directory.

	Added CAP_RENAMEAT:
	- Required for source directory for the renameat(2) syscall.

	Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR):
	- Allow for unlinkat(2) on any object.
	- Required if target of renameat(2) exists and will be removed by this
	  call.

	Removed CAP_MAPEXEC.

	CAP_MMAP old behaviour:
	- Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and
	  PROT_WRITE.
	CAP_MMAP new behaviour:
	- Allow for mmap(2)+PROT_NONE.

	Added CAP_MMAP_R:
	- Allow for mmap(PROT_READ).
	Added CAP_MMAP_W:
	- Allow for mmap(PROT_WRITE).
	Added CAP_MMAP_X:
	- Allow for mmap(PROT_EXEC).
	Added CAP_MMAP_RW:
	- Allow for mmap(PROT_READ | PROT_WRITE).
	Added CAP_MMAP_RX:
	- Allow for mmap(PROT_READ | PROT_EXEC).
	Added CAP_MMAP_WX:
	- Allow for mmap(PROT_WRITE | PROT_EXEC).
	Added CAP_MMAP_RWX:
	- Allow for mmap(PROT_READ | PROT_WRITE | PROT_EXEC).

	Renamed CAP_MKDIR to CAP_MKDIRAT.
	Renamed CAP_MKFIFO to CAP_MKFIFOAT.
	Renamed CAP_MKNODE to CAP_MKNODEAT.

	CAP_READ old behaviour:
	- Allow pread(2).
	- Disallow read(2), readv(2) (if there is no CAP_SEEK).
	CAP_READ new behaviour:
	- Allow read(2), readv(2).
	- Disallow pread(2) (CAP_SEEK was also required).

	CAP_WRITE old behaviour:
	- Allow pwrite(2).
	- Disallow write(2), writev(2) (if there is no CAP_SEEK).
	CAP_WRITE new behaviour:
	- Allow write(2), writev(2).
	- Disallow pwrite(2) (CAP_SEEK was also required).

	Added convinient defines:

	#define	CAP_PREAD		(CAP_SEEK | CAP_READ)
	#define	CAP_PWRITE		(CAP_SEEK | CAP_WRITE)
	#define	CAP_MMAP_R		(CAP_MMAP | CAP_SEEK | CAP_READ)
	#define	CAP_MMAP_W		(CAP_MMAP | CAP_SEEK | CAP_WRITE)
	#define	CAP_MMAP_X		(CAP_MMAP | CAP_SEEK | 0x0000000000000008ULL)
	#define	CAP_MMAP_RW		(CAP_MMAP_R | CAP_MMAP_W)
	#define	CAP_MMAP_RX		(CAP_MMAP_R | CAP_MMAP_X)
	#define	CAP_MMAP_WX		(CAP_MMAP_W | CAP_MMAP_X)
	#define	CAP_MMAP_RWX		(CAP_MMAP_R | CAP_MMAP_W | CAP_MMAP_X)
	#define	CAP_RECV		CAP_READ
	#define	CAP_SEND		CAP_WRITE

	#define	CAP_SOCK_CLIENT \
		(CAP_CONNECT | CAP_GETPEERNAME | CAP_GETSOCKNAME | CAP_GETSOCKOPT | \
		 CAP_PEELOFF | CAP_RECV | CAP_SEND | CAP_SETSOCKOPT | CAP_SHUTDOWN)
	#define	CAP_SOCK_SERVER \
		(CAP_ACCEPT | CAP_BIND | CAP_GETPEERNAME | CAP_GETSOCKNAME | \
		 CAP_GETSOCKOPT | CAP_LISTEN | CAP_PEELOFF | CAP_RECV | CAP_SEND | \
		 CAP_SETSOCKOPT | CAP_SHUTDOWN)

	Added defines for backward API compatibility:

	#define	CAP_MAPEXEC		CAP_MMAP_X
	#define	CAP_DELETE		CAP_UNLINKAT
	#define	CAP_MKDIR		CAP_MKDIRAT
	#define	CAP_RMDIR		CAP_UNLINKAT
	#define	CAP_MKFIFO		CAP_MKFIFOAT
	#define	CAP_MKNOD		CAP_MKNODAT
	#define	CAP_SOCK_ALL		(CAP_SOCK_CLIENT | CAP_SOCK_SERVER)

Sponsored by:	The FreeBSD Foundation
Reviewed by:	Christoph Mallon <christoph.mallon@gmx.de>
Many aspects discussed with:	rwatson, benl, jonathan
ABI compatibility discussed with:	kib
2013-03-02 00:53:12 +00:00
Konstantin Belousov
eb3d4e1fbd Implement the waitid() SUSv4 function using wait6() system call.
PR:	standards/170346
Submitted by:	"Jukka A. Ukkonen" <jau@iki.fi>
MFC after:	1 month
2012-11-13 12:55:52 +00:00
Konstantin Belousov
869fd80fd4 Use struct vdso_timehands data to implement fast gettimeofday(2) and
clock_gettime(2) functions if supported. The speedup seen in
microbenchmarks is in range 4x-7x depending on the hardware.

Only amd64 and i386 architectures are supported. Libc uses rdtsc and
kernel data to calculate current time, if enabled by kernel.

Hopefully, this code is going to migrate into vdso in some future.

Discussed with:	bde
Reviewed by:	jhb
Tested by:	flo
MFC after:	1 month
2012-06-22 07:13:30 +00:00
Dimitry Andric
f61ac9d9bd Fix two warnings about self-assignment in libc. These normally only
trigger with clang, when you either use -save-temps, or ccache.

Reported by:	Sevan / Venture37 <venture37@gmail.com>
MFC after:	3 days
2012-06-06 21:16:26 +00:00
Konstantin Belousov
46ffdf3bbd Take the spinlock around clearing of the fp->_flags in fclose(3), which
indicates the avaliability of FILE, to prevent possible reordering of
the writes as seen by other CPUs.

Reported by:	Fengwei yin <yfw.bsd gmail com>
Reviewed by:	jhb
MFC after:	1 week
2012-04-24 17:51:36 +00:00
Konstantin Belousov
a7d61fc17d Fetch the aux vector for the static libc, and use the entries to
initialize the cache of the system information as it was done for the
dynamic libc. This removes several sysctls from the static binary
startup.

Use the aux vector to fill the single struct dl_phdr_info describing
the static binary itself, to implement dl_iterate_phdr(3) for the
static binaries. [1]

Based on the submission by:	John Marino <draco marino st> [1]
Tested by:   flo (sparc64)
MFC after:	2 weeks
2012-02-17 10:49:29 +00:00
Colin Percival
3e65b9c6e6 Fix a problem whereby a corrupt DNS record can cause named to crash. [11:06]
Add an API for alerting internal libc routines to the presence of
"unsafe" paths post-chroot, and use it in ftpd. [11:07]

Fix a buffer overflow in telnetd. [11:08]

Make pam_ssh ignore unpassphrased keys unless the "nullok" option is
specified. [11:09]

Add sanity checking of service names in pam_start. [11:10]

Approved by:    so (cperciva)
Approved by:    re (bz)
Security:       FreeBSD-SA-11:06.bind
Security:       FreeBSD-SA-11:07.chroot
Security:       FreeBSD-SA-11:08.telnetd
Security:       FreeBSD-SA-11:09.pam_ssh
Security:       FreeBSD-SA-11:10.pam
2011-12-23 15:00:37 +00:00
Jung-uk Kim
678b238c85 Introduce a non-portable function pthread_getthreadid_np(3) to retrieve
calling thread's unique integral ID, which is similar to AIX function of
the same name.  Bump __FreeBSD_version to note its introduction.

Reviewed by:	kib
2011-02-07 21:26:46 +00:00
Dimitry Andric
9f0c8034d8 Remove some unneeded spaces from the __sym_compat() macro, since newer
versions of gas are more fussy about spaces surrounding '@' signs in
versioned symbol names.
2010-11-11 21:36:52 +00:00