Commit Graph

15403 Commits

Author SHA1 Message Date
neel
75369cb181 Add a new capability, VM_CAP_ENABLE_INVPCID, that can be enabled to expose
'invpcid' instruction to the guest. Currently bhyve will try to enable this
capability unconditionally if it is available.

Consolidate code in bhyve to set the capabilities so it is no longer
duplicated in BSP and AP bringup.

Add a sysctl 'vm.pmap.invpcid_works' to display whether the 'invpcid'
instruction is available.

Reviewed by:	grehan
MFC after:	3 days
2013-10-16 18:20:27 +00:00
delphij
7a806a64ea Make it possible to seek within a gzip stream. 2013-10-16 17:16:40 +00:00
glebius
a87549c49c Revert r256514 for libkvm. It wasn't correct actually and breaks build. 2013-10-15 13:53:35 +00:00
glebius
ce5230d593 Make getutxent(3) more robust against bad utx.log files. Whenever we read
zeroes, don't stop processing the file, but read until its end or valid
data.

In collaboration with:	ed
2013-10-15 13:32:01 +00:00
glebius
53ef73d870 - While we are spreading the counter(9) across network stack, more userland
tools would need to know about the counter_u64_t type. Allow to include
sys/counter.h from userspace.
- Utilize now defined type in kvm_counter_u64_fetch().

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-15 10:05:37 +00:00
bdrewery
ae7fa1acd1 Rename libbsdyml to libyaml, make private, and bump
SHLIB_MAJOR to 1.0

Suggested by:	des
Approved by:	bapt
MFC after:	1 week
2013-10-14 18:31:15 +00:00
rpaulo
650bab0fa9 Remove most of the ATF tools and the _atf user.
This is necessary because ATF is deprecated and it will be replaced by Kyua.

Submitted by:	jmmv@netbsd.org
Reviewed by:	Garrett Cooper
Approved by:	re
2013-10-12 06:06:53 +00:00
dim
bde695ba4c Bump OS versions in the toolchain triples to 11.0, and bump the
__FreeBSD_cc_version predefined macros in clang and gcc.

Approved by:	re (gjb)
2013-10-10 20:47:11 +00:00
kan
0f43811dc1 Unbreak zfsloader with LOADER_TFTP_SUPPORT on
Only accept 'net' and 'pxe' devices as underlying transport
in tftp.c on x86. Prior to this change tftp code would attempt
to send packets over any boot device, including zfs one with
predictably sad results.

Approved by: re (gjb)
MFC After: 1 month
2013-10-09 21:33:19 +00:00
pjd
4ab5163697 Handle the cases where NULL is passed as cap_rightsp to the
filestat_new_entry() function.

Reported by:	Alex Kozlov <spam@rm-rf.kiev.ua>
Approved by:	re (gjb)
2013-10-09 20:58:50 +00:00
neel
f9f9a7e617 Parse the memory size parameter using expand_number() to allow specifying
the memory size more intuitively (e.g. 512M, 4G etc).

Submitted by:	rodrigc
Reviewed by:	grehan
Approved by:	re (blanket)
2013-10-09 03:56:07 +00:00
jmg
ec9fa283ad don't assert on bad args, instead return an error..
Since so many programs don't check return value, always NUL terminate
the buf...

fix rounding when using base 1024 (the bug that started it all)...

add a set of test cases so we can make sure that things don't break
in the future...

Thanks to Clifton Royston for testing and the test program...

Approved by:	re (hrs, glebius)
MFC after:	1 week
2013-10-07 22:22:57 +00:00
neel
aed205d5cd Merge projects/bhyve_npt_pmap into head.
Make the amd64/pmap code aware of nested page table mappings used by bhyve
guests. This allows bhyve to associate each guest with its own vmspace and
deal with nested page faults in the context of that vmspace. This also
enables features like accessed/dirty bit tracking, swapping to disk and
transparent superpage promotions of guest memory.

Guest vmspace:
Each bhyve guest has a unique vmspace to represent the physical memory
allocated to the guest. Each memory segment allocated by the guest is
mapped into the guest's address space via the 'vmspace->vm_map' and is
backed by an object of type OBJT_DEFAULT.

pmap types:
The amd64/pmap now understands two types of pmaps: PT_X86 and PT_EPT.

The PT_X86 pmap type is used by the vmspace associated with the host kernel
as well as user processes executing on the host. The PT_EPT pmap is used by
the vmspace associated with a bhyve guest.

Page Table Entries:
The EPT page table entries as mostly similar in functionality to regular
page table entries although there are some differences in terms of what
bits are used to express that functionality. For e.g. the dirty bit is
represented by bit 9 in the nested PTE as opposed to bit 6 in the regular
x86 PTE. Therefore the bitmask representing the dirty bit is now computed
at runtime based on the type of the pmap. Thus PG_M that was previously a
macro now becomes a local variable that is initialized at runtime using
'pmap_modified_bit(pmap)'.

An additional wrinkle associated with EPT mappings is that older Intel
processors don't have hardware support for tracking accessed/dirty bits in
the PTE. This means that the amd64/pmap code needs to emulate these bits to
provide proper accounting to the VM subsystem. This is achieved by using
the following mapping for EPT entries that need emulation of A/D bits:
               Bit Position           Interpreted By
PG_V               52                 software (accessed bit emulation handler)
PG_RW              53                 software (dirty bit emulation handler)
PG_A               0                  hardware (aka EPT_PG_RD)
PG_M               1                  hardware (aka EPT_PG_WR)

The idea to use the mapping listed above for A/D bit emulation came from
Alan Cox (alc@).

The final difference with respect to x86 PTEs is that some EPT implementations
do not support superpage mappings. This is recorded in the 'pm_flags' field
of the pmap.

TLB invalidation:
The amd64/pmap code has a number of ways to do invalidation of mappings
that may be cached in the TLB: single page, multiple pages in a range or the
entire TLB. All of these funnel into a single EPT invalidation routine called
'pmap_invalidate_ept()'. This routine bumps up the EPT generation number and
sends an IPI to the host cpus that are executing the guest's vcpus. On a
subsequent entry into the guest it will detect that the EPT has changed and
invalidate the mappings from the TLB.

Guest memory access:
Since the guest memory is no longer wired we need to hold the host physical
page that backs the guest physical page before we can access it. The helper
functions 'vm_gpa_hold()/vm_gpa_release()' are available for this purpose.

PCI passthru:
Guest's with PCI passthru devices will wire the entire guest physical address
space. The MMIO BAR associated with the passthru device is backed by a
vm_object of type OBJT_SG. An IOMMU domain is created only for guest's that
have one or more PCI passthru devices attached to them.

Limitations:
There isn't a way to map a guest physical page without execute permissions.
This is because the amd64/pmap code interprets the guest physical mappings as
user mappings since they are numerically below VM_MAXUSER_ADDRESS. Since PG_U
shares the same bit position as EPT_PG_EXECUTE all guest mappings become
automatically executable.

Thanks to Alan Cox and Konstantin Belousov for their rigorous code reviews
as well as their support and encouragement.

Thanks for John Baldwin for reviewing the use of OBJT_SG as the backing
object for pci passthru mmio regions.

Special thanks to Peter Holm for testing the patch on short notice.

Approved by:	re
Discussed with:	grehan
Reviewed by:	alc, kib
Tested by:	pho
2013-10-05 21:22:35 +00:00
jilles
b6f424e548 accept(2): Update portability note for accept4().
The accept(2) man page warns that O_NONBLOCK and other properties on the
new socket may vary across implementations. However, this issue only
applies to accept() and not to accept4(). On the other hand, accept4()
is not commonly available yet.

Reported by:	pluknet
Reviewed by:	bjk
Approved by:	re (kib)
2013-10-01 21:17:18 +00:00
des
aa2e4b623c Remove BIND.
Approved by:	re (gjb)
2013-09-30 17:23:45 +00:00
delphij
de2d546a38 Temporarily disable iconv for non-shared library builds. The dynamic
loading of conversation table is not yet compatible with static builds.

Approved by:	re (gjb)
2013-09-26 17:55:36 +00:00
delphij
74e37edc35 Import NetBSD readline.c,v 1.104: do not crash with add_history(NULL).
MFC after:	3 days
Approved by:	re (gjb)
2013-09-26 17:54:58 +00:00
andrew
9439877e98 Add an elf note on ARM to store the MACHINE_ARCH an executable was built
for. This is useful for software needing to know which architecture a
binary is built for as arm and armv6 have slight differences meaning only
some binaries build for one will work as expected on the other. It is
expected pkgng will be able to make use of this to simplify the logic to
determine which package ABI to use.

Approved by:	re (kib)
2013-09-26 07:53:18 +00:00
emaste
51ba585f88 Add LLDB bmake infrastructure
This connects LLDB to the build, but it is disabled by default.  Add
WITH_LLDB= to src.conf to build it.

Note that LLDB requires a C++11 compiler so is disabled on platforms
using GCC.

Approved by:	re (gjb)
Sponsored by:	DARPA, AFRL
2013-09-20 01:52:02 +00:00
joel
bd6ef8adfa Minor mdoc improvements.
Approved by:	re (blanket)
2013-09-19 19:43:38 +00:00
jhb
d3ef75b6c7 Extend the support for exempting processes from being killed when swap is
exhausted.
- Add a new protect(1) command that can be used to set or revoke protection
  from arbitrary processes.  Similar to ktrace it can apply a change to all
  existing descendants of a process as well as future descendants.
- Add a new procctl(2) system call that provides a generic interface for
  control operations on processes (as opposed to the debugger-specific
  operations provided by ptrace(2)).  procctl(2) uses a combination of
  idtype_t and an id to identify the set of processes on which to operate
  similar to wait6().
- Add a PROC_SPROTECT control operation to manage the protection status
  of a set of processes.  MADV_PROTECT still works for backwards
  compatability.
- Add a p_flag2 to struct proc (and a corresponding ki_flag2 to kinfo_proc)
  the first bit of which is used to track if P_PROTECT should be inherited
  by new child processes.

Reviewed by:	kib, jilles (earlier version)
Approved by:	re (delphij)
MFC after:	1 month
2013-09-19 18:53:42 +00:00
tuexen
0524de64dc Remove an unused variable and fix a memory leak in sctp_connectx().
Approved by:	re (gjb)
MFC after:	3 days
2013-09-19 06:19:24 +00:00
des
3d9cc85dd7 Move libldns to the correct (ordered) library list.
Approved by:	re (blanket)
2013-09-15 15:55:21 +00:00
des
ea05e625ec Build and install the Unbound caching DNS resolver daemon.
Approved by:	re (blanket)
2013-09-15 14:51:23 +00:00
dim
2bafcef1c8 After r255294, building lib/msun's symbol map (using clang as the
preprocessor) gives the following error:

--- Version.map ---
<stdin>:287:4: error: invalid preprocessing directive
        # Implemented as weak aliases for imprecise versions
          ^
1 error generated.

Change the comment to a C-style one, to prevent this error.

Approved by:	re (hrs)
2013-09-12 20:51:48 +00:00
bdrewery
b3237a11f6 Consistently reference file descriptors as "fd". 55 other manpages
used "fd", while these used "d" and "filedes".

MFC after:	1 week
Approved by:	gjb
Approved by:	re (delphij)
2013-09-12 00:53:38 +00:00
jhb
04bb6e10cd Add a mmap flag (MAP_32BIT) on 64-bit platforms to request that a mapping use
an address in the first 2GB of the process's address space.  This flag should
have the same semantics as the same flag on Linux.

To facilitate this, add a new parameter to vm_map_find() that specifies an
optional maximum virtual address.  While here, fix several callers of
vm_map_find() to use a VMFS_* constant for the findspace argument instead of
TRUE and FALSE.

Reviewed by:	alc
Approved by:	re (kib)
2013-09-09 18:11:59 +00:00
des
2b94dc11fa LDNS needs OpenSSL. This wasn't a problem as long as it was only build
statically, since any program using it would have to link with it anyway.

Approved by:	re (blanket)
2013-09-08 19:39:18 +00:00
des
aba57138f9 Make libldns and libssh private.
Approved by:	re (blanket)
2013-09-08 10:04:26 +00:00
des
6a7561b73b Update to OpenPAM Nummularia. 2013-09-07 19:43:39 +00:00
des
e50a38ba7d MFV (r255364): move the code around in preparation for Nummularia. 2013-09-07 18:46:35 +00:00
des
338d7c2adb Vendor import of OpenPAM Nummularia.. 2013-09-07 16:15:30 +00:00
des
e86dd36ab2 Prepare for OpenPAM Nummularia by reorganizing to match its new directory
structure.
2013-09-07 16:10:15 +00:00
andrew
59c30969f9 On ARM EABI double precision floating point values are stored in the
endian the CPU is in, i.e. little-endian on most ARM cores.

This allows ARMv4 and ARMv5 boards to boot with the ARM EABI.
2013-09-07 14:04:10 +00:00
jilles
eb5a66191b wait(2): Add some possible caveats to standards section. 2013-09-07 11:41:52 +00:00
jilles
979e7776c1 libc: Make resolver sockets close-on-exec (SOCK_CLOEXEC).
Although the resolver's sockets are exposed to applications via res_state,
I do not expect them to pass the sockets across execve().
2013-09-06 23:49:54 +00:00
jilles
a0c0abfff1 libc: Use SOCK_CLOEXEC for various internal file descriptors.
This change avoids undesirably passing some internal file descriptors to a
process created (fork+exec) by another thread.

Kernel support for SOCK_CLOEXEC was added in r248534, March 19, 2013.
2013-09-06 21:02:06 +00:00
jilles
68907dc598 libc/stdio: Allow fopen/freopen modes in any order (except initial r/w/a).
Austin Group issue  requires 'e' to be accepted before and after 'x',
and encourages accepting the characters in any order, except the initial
'r', 'w' or 'a'.

Given that glibc accepts the characters after r/w/a in any order and that
diagnosing this problem may be hard, change our libc to behave that way as
well.
2013-09-06 13:47:16 +00:00
theraven
63750491ac Use Makefile.inc instead of .export. 2013-09-06 10:40:38 +00:00
theraven
c04dfb0b19 Fix the namespace pollution caused by iconv.h including stdbool.h
This broke any C89 ports that defined bool themselves, including things
like gcc, gtk, and so on.
2013-09-06 09:46:44 +00:00
jilles
178dd060a8 Update some signal man pages for multithreading. 2013-09-06 09:08:40 +00:00
theraven
c8fcb04ad9 Add stub implementations of the missing C++11 math functions.
These are weak and so can be replaced by other versions in applications
that choose to do so, and will give a linker warning when used so that
applications that rely on the extra precision can avoid them.

Note that since the C/C++ specs only guarantee that long double has
precision equal to double, code that actually relies on these functions
having greater precision is unportable at best and broken at worst.
2013-09-06 07:58:23 +00:00
hselasky
d2f07e2fda Correct two comments. 2013-09-05 12:21:11 +00:00
pjd
029a6f5d92 Change the cap_rights_t type from uint64_t to a structure that we can extend
in the future in a backward compatible (API and ABI) way.

The cap_rights_t represents capability rights. We used to use one bit to
represent one right, but we are running out of spare bits. Currently the new
structure provides place for 114 rights (so 50 more than the previous
cap_rights_t), but it is possible to grow the structure to hold at least 285
rights, although we can make it even larger if 285 rights won't be enough.

The structure definition looks like this:

	struct cap_rights {
		uint64_t	cr_rights[CAP_RIGHTS_VERSION + 2];
	};

The initial CAP_RIGHTS_VERSION is 0.

The top two bits in the first element of the cr_rights[] array contain total
number of elements in the array - 2. This means if those two bits are equal to
0, we have 2 array elements.

The top two bits in all remaining array elements should be 0.
The next five bits in all array elements contain array index. Only one bit is
used and bit position in this five-bits range defines array index. This means
there can be at most five array elements in the future.

To define new right the CAPRIGHT() macro must be used. The macro takes two
arguments - an array index and a bit to set, eg.

	#define	CAP_PDKILL	CAPRIGHT(1, 0x0000000000000800ULL)

We still support aliases that combine few rights, but the rights have to belong
to the same array element, eg:

	#define	CAP_LOOKUP	CAPRIGHT(0, 0x0000000000000400ULL)
	#define	CAP_FCHMOD	CAPRIGHT(0, 0x0000000000002000ULL)

	#define	CAP_FCHMODAT	(CAP_FCHMOD | CAP_LOOKUP)

There is new API to manage the new cap_rights_t structure:

	cap_rights_t *cap_rights_init(cap_rights_t *rights, ...);
	void cap_rights_set(cap_rights_t *rights, ...);
	void cap_rights_clear(cap_rights_t *rights, ...);
	bool cap_rights_is_set(const cap_rights_t *rights, ...);

	bool cap_rights_is_valid(const cap_rights_t *rights);
	void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src);
	void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src);
	bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);

Capability rights to the cap_rights_init(), cap_rights_set(),
cap_rights_clear() and cap_rights_is_set() functions are provided by
separating them with commas, eg:

	cap_rights_t rights;

	cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);

There is no need to terminate the list of rights, as those functions are
actually macros that take care of the termination, eg:

	#define	cap_rights_set(rights, ...)				\
		__cap_rights_set((rights), __VA_ARGS__, 0ULL)
	void __cap_rights_set(cap_rights_t *rights, ...);

Thanks to using one bit as an array index we can assert in those functions that
there are no two rights belonging to different array elements provided
together. For example this is illegal and will be detected, because CAP_LOOKUP
belongs to element 0 and CAP_PDKILL to element 1:

	cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);

Providing several rights that belongs to the same array's element this way is
correct, but is not advised. It should only be used for aliases definition.

This commit also breaks compatibility with some existing Capsicum system calls,
but I see no other way to do that. This should be fine as Capsicum is still
experimental and this change is not going to 9.x.

Sponsored by:	The FreeBSD Foundation
2013-09-05 00:09:56 +00:00
theraven
8b9f5e4153 Add a c++/v1/tr1 include directory containing symlinks to all of the standard
headrs.

Lots of third-party code expects to find C++03 headers under tr1 because that's
where GNU decided to hide them.  This should fix ports that expect them there.

MFC after:	1 week
2013-09-04 15:02:14 +00:00
emaste
4f53813f88 Connect libexecinfo to the build
Sponsored by:	DARPA, AFRL
2013-09-03 15:22:04 +00:00
emaste
4bfbadb2b3 Don't install private libexecinfo headers 2013-09-03 13:31:43 +00:00
rwatson
e6c5cc6ac3 Document SIGLIBRT in signal(3); take a stab at the signal description as
the original committer didn't provide one.

MFC after:	3 days
2013-09-03 08:19:06 +00:00
emaste
d5b6a7dd6b libexecinfo compatibility with devel/libexecinfo port
1. Match shlib number
2. Add libelf dependency

Suggested by: bapt[1]
2013-09-02 12:37:33 +00:00
jilles
73eea4eee6 system(): Restore behaviour for SIGINT and SIGQUIT.
As mentioned in r16117 and the book "Advanced Programming in the Unix
Environment" by W. Richard Stevens, we should ignore SIGINT and SIGQUIT
before forking, since it is not guaranteed that the parent process starts
running soon enough.

To avoid calling sigaction() in the vforked child, instead block SIGINT and
SIGQUIT before vfork() and keep the sigaction() to ignore after vfork(). The
FreeBSD kernel discards ignored signals, even if they are blocked;
therefore, it is not necessary to unblock SIGINT and SIGQUIT earlier.
2013-09-01 19:59:54 +00:00