15491 Commits

Author SHA1 Message Date
neel
aed205d5cd Merge projects/bhyve_npt_pmap into head.
Make the amd64/pmap code aware of nested page table mappings used by bhyve
guests. This allows bhyve to associate each guest with its own vmspace and
deal with nested page faults in the context of that vmspace. This also
enables features like accessed/dirty bit tracking, swapping to disk and
transparent superpage promotions of guest memory.

Guest vmspace:
Each bhyve guest has a unique vmspace to represent the physical memory
allocated to the guest. Each memory segment allocated by the guest is
mapped into the guest's address space via the 'vmspace->vm_map' and is
backed by an object of type OBJT_DEFAULT.

pmap types:
The amd64/pmap now understands two types of pmaps: PT_X86 and PT_EPT.

The PT_X86 pmap type is used by the vmspace associated with the host kernel
as well as user processes executing on the host. The PT_EPT pmap is used by
the vmspace associated with a bhyve guest.

Page Table Entries:
The EPT page table entries as mostly similar in functionality to regular
page table entries although there are some differences in terms of what
bits are used to express that functionality. For e.g. the dirty bit is
represented by bit 9 in the nested PTE as opposed to bit 6 in the regular
x86 PTE. Therefore the bitmask representing the dirty bit is now computed
at runtime based on the type of the pmap. Thus PG_M that was previously a
macro now becomes a local variable that is initialized at runtime using
'pmap_modified_bit(pmap)'.

An additional wrinkle associated with EPT mappings is that older Intel
processors don't have hardware support for tracking accessed/dirty bits in
the PTE. This means that the amd64/pmap code needs to emulate these bits to
provide proper accounting to the VM subsystem. This is achieved by using
the following mapping for EPT entries that need emulation of A/D bits:
               Bit Position           Interpreted By
PG_V               52                 software (accessed bit emulation handler)
PG_RW              53                 software (dirty bit emulation handler)
PG_A               0                  hardware (aka EPT_PG_RD)
PG_M               1                  hardware (aka EPT_PG_WR)

The idea to use the mapping listed above for A/D bit emulation came from
Alan Cox (alc@).

The final difference with respect to x86 PTEs is that some EPT implementations
do not support superpage mappings. This is recorded in the 'pm_flags' field
of the pmap.

TLB invalidation:
The amd64/pmap code has a number of ways to do invalidation of mappings
that may be cached in the TLB: single page, multiple pages in a range or the
entire TLB. All of these funnel into a single EPT invalidation routine called
'pmap_invalidate_ept()'. This routine bumps up the EPT generation number and
sends an IPI to the host cpus that are executing the guest's vcpus. On a
subsequent entry into the guest it will detect that the EPT has changed and
invalidate the mappings from the TLB.

Guest memory access:
Since the guest memory is no longer wired we need to hold the host physical
page that backs the guest physical page before we can access it. The helper
functions 'vm_gpa_hold()/vm_gpa_release()' are available for this purpose.

PCI passthru:
Guest's with PCI passthru devices will wire the entire guest physical address
space. The MMIO BAR associated with the passthru device is backed by a
vm_object of type OBJT_SG. An IOMMU domain is created only for guest's that
have one or more PCI passthru devices attached to them.

Limitations:
There isn't a way to map a guest physical page without execute permissions.
This is because the amd64/pmap code interprets the guest physical mappings as
user mappings since they are numerically below VM_MAXUSER_ADDRESS. Since PG_U
shares the same bit position as EPT_PG_EXECUTE all guest mappings become
automatically executable.

Thanks to Alan Cox and Konstantin Belousov for their rigorous code reviews
as well as their support and encouragement.

Thanks for John Baldwin for reviewing the use of OBJT_SG as the backing
object for pci passthru mmio regions.

Special thanks to Peter Holm for testing the patch on short notice.

Approved by:	re
Discussed with:	grehan
Reviewed by:	alc, kib
Tested by:	pho
2013-10-05 21:22:35 +00:00
jilles
b6f424e548 accept(2): Update portability note for accept4().
The accept(2) man page warns that O_NONBLOCK and other properties on the
new socket may vary across implementations. However, this issue only
applies to accept() and not to accept4(). On the other hand, accept4()
is not commonly available yet.

Reported by:	pluknet
Reviewed by:	bjk
Approved by:	re (kib)
2013-10-01 21:17:18 +00:00
des
aa2e4b623c Remove BIND.
Approved by:	re (gjb)
2013-09-30 17:23:45 +00:00
delphij
de2d546a38 Temporarily disable iconv for non-shared library builds. The dynamic
loading of conversation table is not yet compatible with static builds.

Approved by:	re (gjb)
2013-09-26 17:55:36 +00:00
delphij
74e37edc35 Import NetBSD readline.c,v 1.104: do not crash with add_history(NULL).
MFC after:	3 days
Approved by:	re (gjb)
2013-09-26 17:54:58 +00:00
andrew
9439877e98 Add an elf note on ARM to store the MACHINE_ARCH an executable was built
for. This is useful for software needing to know which architecture a
binary is built for as arm and armv6 have slight differences meaning only
some binaries build for one will work as expected on the other. It is
expected pkgng will be able to make use of this to simplify the logic to
determine which package ABI to use.

Approved by:	re (kib)
2013-09-26 07:53:18 +00:00
emaste
51ba585f88 Add LLDB bmake infrastructure
This connects LLDB to the build, but it is disabled by default.  Add
WITH_LLDB= to src.conf to build it.

Note that LLDB requires a C++11 compiler so is disabled on platforms
using GCC.

Approved by:	re (gjb)
Sponsored by:	DARPA, AFRL
2013-09-20 01:52:02 +00:00
joel
bd6ef8adfa Minor mdoc improvements.
Approved by:	re (blanket)
2013-09-19 19:43:38 +00:00
jhb
d3ef75b6c7 Extend the support for exempting processes from being killed when swap is
exhausted.
- Add a new protect(1) command that can be used to set or revoke protection
  from arbitrary processes.  Similar to ktrace it can apply a change to all
  existing descendants of a process as well as future descendants.
- Add a new procctl(2) system call that provides a generic interface for
  control operations on processes (as opposed to the debugger-specific
  operations provided by ptrace(2)).  procctl(2) uses a combination of
  idtype_t and an id to identify the set of processes on which to operate
  similar to wait6().
- Add a PROC_SPROTECT control operation to manage the protection status
  of a set of processes.  MADV_PROTECT still works for backwards
  compatability.
- Add a p_flag2 to struct proc (and a corresponding ki_flag2 to kinfo_proc)
  the first bit of which is used to track if P_PROTECT should be inherited
  by new child processes.

Reviewed by:	kib, jilles (earlier version)
Approved by:	re (delphij)
MFC after:	1 month
2013-09-19 18:53:42 +00:00
tuexen
0524de64dc Remove an unused variable and fix a memory leak in sctp_connectx().
Approved by:	re (gjb)
MFC after:	3 days
2013-09-19 06:19:24 +00:00
des
3d9cc85dd7 Move libldns to the correct (ordered) library list.
Approved by:	re (blanket)
2013-09-15 15:55:21 +00:00
des
ea05e625ec Build and install the Unbound caching DNS resolver daemon.
Approved by:	re (blanket)
2013-09-15 14:51:23 +00:00
dim
2bafcef1c8 After r255294, building lib/msun's symbol map (using clang as the
preprocessor) gives the following error:

--- Version.map ---
<stdin>:287:4: error: invalid preprocessing directive
        # Implemented as weak aliases for imprecise versions
          ^
1 error generated.

Change the comment to a C-style one, to prevent this error.

Approved by:	re (hrs)
2013-09-12 20:51:48 +00:00
bdrewery
b3237a11f6 Consistently reference file descriptors as "fd". 55 other manpages
used "fd", while these used "d" and "filedes".

MFC after:	1 week
Approved by:	gjb
Approved by:	re (delphij)
2013-09-12 00:53:38 +00:00
jhb
04bb6e10cd Add a mmap flag (MAP_32BIT) on 64-bit platforms to request that a mapping use
an address in the first 2GB of the process's address space.  This flag should
have the same semantics as the same flag on Linux.

To facilitate this, add a new parameter to vm_map_find() that specifies an
optional maximum virtual address.  While here, fix several callers of
vm_map_find() to use a VMFS_* constant for the findspace argument instead of
TRUE and FALSE.

Reviewed by:	alc
Approved by:	re (kib)
2013-09-09 18:11:59 +00:00
des
2b94dc11fa LDNS needs OpenSSL. This wasn't a problem as long as it was only build
statically, since any program using it would have to link with it anyway.

Approved by:	re (blanket)
2013-09-08 19:39:18 +00:00
des
aba57138f9 Make libldns and libssh private.
Approved by:	re (blanket)
2013-09-08 10:04:26 +00:00
des
6a7561b73b Update to OpenPAM Nummularia. 2013-09-07 19:43:39 +00:00
des
e50a38ba7d MFV (r255364): move the code around in preparation for Nummularia. 2013-09-07 18:46:35 +00:00
des
338d7c2adb Vendor import of OpenPAM Nummularia.. 2013-09-07 16:15:30 +00:00
des
e86dd36ab2 Prepare for OpenPAM Nummularia by reorganizing to match its new directory
structure.
2013-09-07 16:10:15 +00:00
andrew
59c30969f9 On ARM EABI double precision floating point values are stored in the
endian the CPU is in, i.e. little-endian on most ARM cores.

This allows ARMv4 and ARMv5 boards to boot with the ARM EABI.
2013-09-07 14:04:10 +00:00
jilles
eb5a66191b wait(2): Add some possible caveats to standards section. 2013-09-07 11:41:52 +00:00
jilles
979e7776c1 libc: Make resolver sockets close-on-exec (SOCK_CLOEXEC).
Although the resolver's sockets are exposed to applications via res_state,
I do not expect them to pass the sockets across execve().
2013-09-06 23:49:54 +00:00
jilles
a0c0abfff1 libc: Use SOCK_CLOEXEC for various internal file descriptors.
This change avoids undesirably passing some internal file descriptors to a
process created (fork+exec) by another thread.

Kernel support for SOCK_CLOEXEC was added in r248534, March 19, 2013.
2013-09-06 21:02:06 +00:00
jilles
68907dc598 libc/stdio: Allow fopen/freopen modes in any order (except initial r/w/a).
Austin Group issue #411 requires 'e' to be accepted before and after 'x',
and encourages accepting the characters in any order, except the initial
'r', 'w' or 'a'.

Given that glibc accepts the characters after r/w/a in any order and that
diagnosing this problem may be hard, change our libc to behave that way as
well.
2013-09-06 13:47:16 +00:00
theraven
63750491ac Use Makefile.inc instead of .export. 2013-09-06 10:40:38 +00:00
theraven
c04dfb0b19 Fix the namespace pollution caused by iconv.h including stdbool.h
This broke any C89 ports that defined bool themselves, including things
like gcc, gtk, and so on.
2013-09-06 09:46:44 +00:00
jilles
178dd060a8 Update some signal man pages for multithreading. 2013-09-06 09:08:40 +00:00
theraven
c8fcb04ad9 Add stub implementations of the missing C++11 math functions.
These are weak and so can be replaced by other versions in applications
that choose to do so, and will give a linker warning when used so that
applications that rely on the extra precision can avoid them.

Note that since the C/C++ specs only guarantee that long double has
precision equal to double, code that actually relies on these functions
having greater precision is unportable at best and broken at worst.
2013-09-06 07:58:23 +00:00
hselasky
d2f07e2fda Correct two comments. 2013-09-05 12:21:11 +00:00
pjd
029a6f5d92 Change the cap_rights_t type from uint64_t to a structure that we can extend
in the future in a backward compatible (API and ABI) way.

The cap_rights_t represents capability rights. We used to use one bit to
represent one right, but we are running out of spare bits. Currently the new
structure provides place for 114 rights (so 50 more than the previous
cap_rights_t), but it is possible to grow the structure to hold at least 285
rights, although we can make it even larger if 285 rights won't be enough.

The structure definition looks like this:

	struct cap_rights {
		uint64_t	cr_rights[CAP_RIGHTS_VERSION + 2];
	};

The initial CAP_RIGHTS_VERSION is 0.

The top two bits in the first element of the cr_rights[] array contain total
number of elements in the array - 2. This means if those two bits are equal to
0, we have 2 array elements.

The top two bits in all remaining array elements should be 0.
The next five bits in all array elements contain array index. Only one bit is
used and bit position in this five-bits range defines array index. This means
there can be at most five array elements in the future.

To define new right the CAPRIGHT() macro must be used. The macro takes two
arguments - an array index and a bit to set, eg.

	#define	CAP_PDKILL	CAPRIGHT(1, 0x0000000000000800ULL)

We still support aliases that combine few rights, but the rights have to belong
to the same array element, eg:

	#define	CAP_LOOKUP	CAPRIGHT(0, 0x0000000000000400ULL)
	#define	CAP_FCHMOD	CAPRIGHT(0, 0x0000000000002000ULL)

	#define	CAP_FCHMODAT	(CAP_FCHMOD | CAP_LOOKUP)

There is new API to manage the new cap_rights_t structure:

	cap_rights_t *cap_rights_init(cap_rights_t *rights, ...);
	void cap_rights_set(cap_rights_t *rights, ...);
	void cap_rights_clear(cap_rights_t *rights, ...);
	bool cap_rights_is_set(const cap_rights_t *rights, ...);

	bool cap_rights_is_valid(const cap_rights_t *rights);
	void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src);
	void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src);
	bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);

Capability rights to the cap_rights_init(), cap_rights_set(),
cap_rights_clear() and cap_rights_is_set() functions are provided by
separating them with commas, eg:

	cap_rights_t rights;

	cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);

There is no need to terminate the list of rights, as those functions are
actually macros that take care of the termination, eg:

	#define	cap_rights_set(rights, ...)				\
		__cap_rights_set((rights), __VA_ARGS__, 0ULL)
	void __cap_rights_set(cap_rights_t *rights, ...);

Thanks to using one bit as an array index we can assert in those functions that
there are no two rights belonging to different array elements provided
together. For example this is illegal and will be detected, because CAP_LOOKUP
belongs to element 0 and CAP_PDKILL to element 1:

	cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);

Providing several rights that belongs to the same array's element this way is
correct, but is not advised. It should only be used for aliases definition.

This commit also breaks compatibility with some existing Capsicum system calls,
but I see no other way to do that. This should be fine as Capsicum is still
experimental and this change is not going to 9.x.

Sponsored by:	The FreeBSD Foundation
2013-09-05 00:09:56 +00:00
theraven
8b9f5e4153 Add a c++/v1/tr1 include directory containing symlinks to all of the standard
headrs.

Lots of third-party code expects to find C++03 headers under tr1 because that's
where GNU decided to hide them.  This should fix ports that expect them there.

MFC after:	1 week
2013-09-04 15:02:14 +00:00
emaste
4f53813f88 Connect libexecinfo to the build
Sponsored by:	DARPA, AFRL
2013-09-03 15:22:04 +00:00
emaste
4bfbadb2b3 Don't install private libexecinfo headers 2013-09-03 13:31:43 +00:00
rwatson
e6c5cc6ac3 Document SIGLIBRT in signal(3); take a stab at the signal description as
the original committer didn't provide one.

MFC after:	3 days
2013-09-03 08:19:06 +00:00
emaste
d5b6a7dd6b libexecinfo compatibility with devel/libexecinfo port
1. Match shlib number
2. Add libelf dependency

Suggested by: bapt[1]
2013-09-02 12:37:33 +00:00
jilles
73eea4eee6 system(): Restore behaviour for SIGINT and SIGQUIT.
As mentioned in r16117 and the book "Advanced Programming in the Unix
Environment" by W. Richard Stevens, we should ignore SIGINT and SIGQUIT
before forking, since it is not guaranteed that the parent process starts
running soon enough.

To avoid calling sigaction() in the vforked child, instead block SIGINT and
SIGQUIT before vfork() and keep the sigaction() to ignore after vfork(). The
FreeBSD kernel discards ignored signals, even if they are blocked;
therefore, it is not necessary to unblock SIGINT and SIGQUIT earlier.
2013-09-01 19:59:54 +00:00
jilles
ee4b8e07a8 libc: Always use our own copy of sys_errlist and sys_nerr (.so only).
This ensures strerror() and friends continue to work correctly even if a
(non-PIE) executable linked against an older libc imports sys_errlist (which
causes sys_errlist to refer to the executable's copy with a size fixed when
that executable was linked).

The executable's use of sys_errlist remains broken because it uses the
current value of sys_nerr and may access past the bounds of the array.

Different from the message "Using sys_errlist from executables is not
ABI-stable" on freebsd-arch, this change does not affect the static library.
There seems no reason to prevent overriding the error messages in the static
library.
2013-08-31 22:32:42 +00:00
andrew
8190b13763 Add support to the ARM platform specific section types. 2013-08-31 18:13:20 +00:00
theraven
3b54dfb62d Unconditionally compile the __sync_* atomics support functions into compiler-rt
for ARM.
This is quite ugly, because it has to work around a clang bug that does not
allow built-in functions to be defined, even when they're ones that are
expected to be built as part of a library.

Reviewed by:	ed
2013-08-31 08:50:45 +00:00
pluknet
0030cdac07 The round of expand_number() cleanups.
o Fix range error checking to detect overflow when uint64_t < uintmax_t.
o Remove a non-functional check for no valid digits as pointed out by Bruce.
o Remove a rather pointless comment describing what the function does.
o Clean up a bunch of style bugs.

Brucified by:	bde
2013-08-30 11:21:52 +00:00
jilles
baaacfdc28 libutil: Use O_CLOEXEC for internal file descriptors from open(). 2013-08-28 21:10:37 +00:00
rwatson
1dbe7f4b10 Xref capsicum(4) and procdesc(4) from pdfork(2).
Suggested by:	sbruno
MFC after:	3 days
2013-08-28 20:00:25 +00:00
kargl
0c40bd77af * Whitespace. 2013-08-28 16:59:55 +00:00
jilles
d4eb686387 wordexp(): Avoid leaking the pipe file descriptors to a parallel fork/exec.
This uses the new pipe2() system call added on May 1 (r250159).
2013-08-27 21:47:01 +00:00
kargl
71c97bf245 * s_erf.c:
. Use integer literal constants instead of double literal constants.

* s_erff.c:
  . Use integer literal constants instead of casting double literal
    constants to float.
  . Update the threshold values from those carried over from erf() to
    values appropriate for float.
  . New sets of polynomial coefficients for the rational approximations.
    These coefficients have little, but positive, effect on the maximum
    error in ULP in the four intervals, but do improve the overall
    speed of execution.
  . Remove redundant GET_FLOAT_WORD(ix,x) as hx already contained the
    contents that is packed into ix.
  . Update the mask that is used to zero-out lower-order bits in x in
    the intervals [1.25, 2.857143] and [2.857143, 12].  In tests on
    amd64, this change improves the maximum error in ULP from 6.27739
    and 63.8095 to 3.16774 and 2.92095 on these intervals for erffc().

Reviewed by:	bde
2013-08-27 19:46:56 +00:00
will
7c6cb741cf Make the PAM password strength checking module WARNS=2 safe.
lib/libpam/modules/pam_passwdqc/Makefile:
	Bump WARNS to 2.

contrib/pam_modules/pam_passwdqc/pam_passwdqc.c:
	Bump  _XOPEN_SOURCE and _XOPEN_VERSION from 500 to 600
	so that vsnprint() is declared.

	Use the two new union types (pam_conv_item_t and
	pam_text_item_t) to resolve strict aliasing violations
	caused by casts to comply with the pam_get_item() API taking
	a "const void **" for all item types.  Warnings are
	generated for casts that create "type puns" (pointers of
	conflicting sized types that are set to access the same
	memory location) since these pointers may be used in ways
	that violate C's strict aliasing rules.  Casts to a new
	type must be performed through a union in order to be
	compliant, and access must be performed through only one
	of the union's data types during the lifetime of the union
	instance.  Handle strict-aliasing warnings through pointer
	assignments, which drastically simplifies this change.

	Correct a CLANG "printf-like function with more arguments
	than format" error.

Submitted by:	gibbs
Sponsored by:	Spectra Logic
2013-08-27 15:50:26 +00:00
emaste
01af225f38 Add libexecinfo Makefile
Sponsored by:	DARPA, AFRL
2013-08-23 14:31:05 +00:00
jilles
ea95259d98 libc: Access some unexported variables more efficiently (related to stdio). 2013-08-23 14:23:54 +00:00