15368 Commits

Author SHA1 Message Date
Konstantin Belousov
85a0ddfd0b Add a resource limit for the total number of kqueues available to the
user.  Kqueue now saves the ucred of the allocating thread, to
correctly decrement the counter on close.

Under some specific and not real-world use scenario for kqueue, it is
possible for the kqueues to consume memory proportional to the square
of the number of the filedescriptors available to the process.  Limit
allows administrator to prevent the abuse.

This is kernel-mode side of the change, with the user-mode enabling
commit following.

Reported and tested by:	pho
Discussed with:	jmg
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2013-10-21 16:46:12 +00:00
Xin LI
5e9a119cd6 Drop cm_lock before calling mapper_close, which in turn could call
_citrus_mapper_close again and result in a deadlock otherwise.

This is similar to NetBSD PR/24023 (fixed in their r1.5 of this file).

PR:		bin/182994
Submitted by:	Fabian Keil <fk fabiankeil de>
MFC after:	3 days
2013-10-21 07:58:37 +00:00
Jilles Tjoelker
0b89df4a57 syslog: Use SOCK_CLOEXEC instead of separate fcntl() call. 2013-10-20 21:04:44 +00:00
Jilles Tjoelker
02804449a2 popen(): Try to prevent inappropriate fd passing even if 'e' is not used.
Even though not all race conditions can be fixed if the 'e' option is not
used, still fix some race conditions using pipe2():

* Prevent both ends of the pipe from leaking to a concurrent popen().

* Prevent the child process's end of the pipe from leaking to any concurrent
  fork and exec.

This change also simplifies the code.
2013-10-20 20:50:17 +00:00
Rui Paulo
d4a14c8563 Clearly split the logic to build ATF and plain tests apart.
This change introduces a new plain.test.mk file that provides the build
infrastructure to build test programs that don't use any framework.
Most of the code previously in bsd.test.mk moves to plain.test.mk and
atf.test.mk is extended with the missing pieces.

In doing so, this change pushes all test program building logic to the
various *.test.mk files instead of trying to reuse some tiny bits.
In fact, this attempt to reuse some definitions makes the code harder
to read and harder to extend.

The clear benefit of this is that the interface of bsd.test.mk is now
clearly delimited.

Submitted by:	Julio Merino jmmv google.com
MFC after:	2 weeks
2013-10-19 06:48:49 +00:00
Mark Johnston
25aecfbb23 Fix the libproc build when DEBUG is defined. 2013-10-17 03:39:21 +00:00
Neel Natu
49cc03da31 Add a new capability, VM_CAP_ENABLE_INVPCID, that can be enabled to expose
'invpcid' instruction to the guest. Currently bhyve will try to enable this
capability unconditionally if it is available.

Consolidate code in bhyve to set the capabilities so it is no longer
duplicated in BSP and AP bringup.

Add a sysctl 'vm.pmap.invpcid_works' to display whether the 'invpcid'
instruction is available.

Reviewed by:	grehan
MFC after:	3 days
2013-10-16 18:20:27 +00:00
Xin LI
86ccd3e00f Make it possible to seek within a gzip stream. 2013-10-16 17:16:40 +00:00
Gleb Smirnoff
3329973740 Revert r256514 for libkvm. It wasn't correct actually and breaks build. 2013-10-15 13:53:35 +00:00
Gleb Smirnoff
a173916590 Make getutxent(3) more robust against bad utx.log files. Whenever we read
zeroes, don't stop processing the file, but read until its end or valid
data.

In collaboration with:	ed
2013-10-15 13:32:01 +00:00
Gleb Smirnoff
511b5fa590 - While we are spreading the counter(9) across network stack, more userland
tools would need to know about the counter_u64_t type. Allow to include
sys/counter.h from userspace.
- Utilize now defined type in kvm_counter_u64_fetch().

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-15 10:05:37 +00:00
Bryan Drewery
e3ededfa24 Rename libbsdyml to libyaml, make private, and bump
SHLIB_MAJOR to 1.0

Suggested by:	des
Approved by:	bapt
MFC after:	1 week
2013-10-14 18:31:15 +00:00
Rui Paulo
ec0e2ac611 Remove most of the ATF tools and the _atf user.
This is necessary because ATF is deprecated and it will be replaced by Kyua.

Submitted by:	jmmv@netbsd.org
Reviewed by:	Garrett Cooper
Approved by:	re
2013-10-12 06:06:53 +00:00
Dimitry Andric
c60c0372b0 Bump OS versions in the toolchain triples to 11.0, and bump the
__FreeBSD_cc_version predefined macros in clang and gcc.

Approved by:	re (gjb)
2013-10-10 20:47:11 +00:00
Alexander Kabaev
b13ba46dbf Unbreak zfsloader with LOADER_TFTP_SUPPORT on
Only accept 'net' and 'pxe' devices as underlying transport
in tftp.c on x86. Prior to this change tftp code would attempt
to send packets over any boot device, including zfs one with
predictably sad results.

Approved by: re (gjb)
MFC After: 1 month
2013-10-09 21:33:19 +00:00
Pawel Jakub Dawidek
772f66457a Handle the cases where NULL is passed as cap_rightsp to the
filestat_new_entry() function.

Reported by:	Alex Kozlov <spam@rm-rf.kiev.ua>
Approved by:	re (gjb)
2013-10-09 20:58:50 +00:00
Neel Natu
200758f114 Parse the memory size parameter using expand_number() to allow specifying
the memory size more intuitively (e.g. 512M, 4G etc).

Submitted by:	rodrigc
Reviewed by:	grehan
Approved by:	re (blanket)
2013-10-09 03:56:07 +00:00
John-Mark Gurney
44f01c419d don't assert on bad args, instead return an error..
Since so many programs don't check return value, always NUL terminate
the buf...

fix rounding when using base 1024 (the bug that started it all)...

add a set of test cases so we can make sure that things don't break
in the future...

Thanks to Clifton Royston for testing and the test program...

Approved by:	re (hrs, glebius)
MFC after:	1 week
2013-10-07 22:22:57 +00:00
Neel Natu
318224bbe6 Merge projects/bhyve_npt_pmap into head.
Make the amd64/pmap code aware of nested page table mappings used by bhyve
guests. This allows bhyve to associate each guest with its own vmspace and
deal with nested page faults in the context of that vmspace. This also
enables features like accessed/dirty bit tracking, swapping to disk and
transparent superpage promotions of guest memory.

Guest vmspace:
Each bhyve guest has a unique vmspace to represent the physical memory
allocated to the guest. Each memory segment allocated by the guest is
mapped into the guest's address space via the 'vmspace->vm_map' and is
backed by an object of type OBJT_DEFAULT.

pmap types:
The amd64/pmap now understands two types of pmaps: PT_X86 and PT_EPT.

The PT_X86 pmap type is used by the vmspace associated with the host kernel
as well as user processes executing on the host. The PT_EPT pmap is used by
the vmspace associated with a bhyve guest.

Page Table Entries:
The EPT page table entries as mostly similar in functionality to regular
page table entries although there are some differences in terms of what
bits are used to express that functionality. For e.g. the dirty bit is
represented by bit 9 in the nested PTE as opposed to bit 6 in the regular
x86 PTE. Therefore the bitmask representing the dirty bit is now computed
at runtime based on the type of the pmap. Thus PG_M that was previously a
macro now becomes a local variable that is initialized at runtime using
'pmap_modified_bit(pmap)'.

An additional wrinkle associated with EPT mappings is that older Intel
processors don't have hardware support for tracking accessed/dirty bits in
the PTE. This means that the amd64/pmap code needs to emulate these bits to
provide proper accounting to the VM subsystem. This is achieved by using
the following mapping for EPT entries that need emulation of A/D bits:
               Bit Position           Interpreted By
PG_V               52                 software (accessed bit emulation handler)
PG_RW              53                 software (dirty bit emulation handler)
PG_A               0                  hardware (aka EPT_PG_RD)
PG_M               1                  hardware (aka EPT_PG_WR)

The idea to use the mapping listed above for A/D bit emulation came from
Alan Cox (alc@).

The final difference with respect to x86 PTEs is that some EPT implementations
do not support superpage mappings. This is recorded in the 'pm_flags' field
of the pmap.

TLB invalidation:
The amd64/pmap code has a number of ways to do invalidation of mappings
that may be cached in the TLB: single page, multiple pages in a range or the
entire TLB. All of these funnel into a single EPT invalidation routine called
'pmap_invalidate_ept()'. This routine bumps up the EPT generation number and
sends an IPI to the host cpus that are executing the guest's vcpus. On a
subsequent entry into the guest it will detect that the EPT has changed and
invalidate the mappings from the TLB.

Guest memory access:
Since the guest memory is no longer wired we need to hold the host physical
page that backs the guest physical page before we can access it. The helper
functions 'vm_gpa_hold()/vm_gpa_release()' are available for this purpose.

PCI passthru:
Guest's with PCI passthru devices will wire the entire guest physical address
space. The MMIO BAR associated with the passthru device is backed by a
vm_object of type OBJT_SG. An IOMMU domain is created only for guest's that
have one or more PCI passthru devices attached to them.

Limitations:
There isn't a way to map a guest physical page without execute permissions.
This is because the amd64/pmap code interprets the guest physical mappings as
user mappings since they are numerically below VM_MAXUSER_ADDRESS. Since PG_U
shares the same bit position as EPT_PG_EXECUTE all guest mappings become
automatically executable.

Thanks to Alan Cox and Konstantin Belousov for their rigorous code reviews
as well as their support and encouragement.

Thanks for John Baldwin for reviewing the use of OBJT_SG as the backing
object for pci passthru mmio regions.

Special thanks to Peter Holm for testing the patch on short notice.

Approved by:	re
Discussed with:	grehan
Reviewed by:	alc, kib
Tested by:	pho
2013-10-05 21:22:35 +00:00
Jilles Tjoelker
0f49c96cfc accept(2): Update portability note for accept4().
The accept(2) man page warns that O_NONBLOCK and other properties on the
new socket may vary across implementations. However, this issue only
applies to accept() and not to accept4(). On the other hand, accept4()
is not commonly available yet.

Reported by:	pluknet
Reviewed by:	bjk
Approved by:	re (kib)
2013-10-01 21:17:18 +00:00
Dag-Erling Smørgrav
56b72efe82 Remove BIND.
Approved by:	re (gjb)
2013-09-30 17:23:45 +00:00
Xin LI
e4dedeefae Temporarily disable iconv for non-shared library builds. The dynamic
loading of conversation table is not yet compatible with static builds.

Approved by:	re (gjb)
2013-09-26 17:55:36 +00:00
Xin LI
7a087fd50a Import NetBSD readline.c,v 1.104: do not crash with add_history(NULL).
MFC after:	3 days
Approved by:	re (gjb)
2013-09-26 17:54:58 +00:00
Andrew Turner
27b7672219 Add an elf note on ARM to store the MACHINE_ARCH an executable was built
for. This is useful for software needing to know which architecture a
binary is built for as arm and armv6 have slight differences meaning only
some binaries build for one will work as expected on the other. It is
expected pkgng will be able to make use of this to simplify the logic to
determine which package ABI to use.

Approved by:	re (kib)
2013-09-26 07:53:18 +00:00
Ed Maste
e8f1392d95 Add LLDB bmake infrastructure
This connects LLDB to the build, but it is disabled by default.  Add
WITH_LLDB= to src.conf to build it.

Note that LLDB requires a C++11 compiler so is disabled on platforms
using GCC.

Approved by:	re (gjb)
Sponsored by:	DARPA, AFRL
2013-09-20 01:52:02 +00:00
Joel Dahl
828378a6d3 Minor mdoc improvements.
Approved by:	re (blanket)
2013-09-19 19:43:38 +00:00
John Baldwin
55648840de Extend the support for exempting processes from being killed when swap is
exhausted.
- Add a new protect(1) command that can be used to set or revoke protection
  from arbitrary processes.  Similar to ktrace it can apply a change to all
  existing descendants of a process as well as future descendants.
- Add a new procctl(2) system call that provides a generic interface for
  control operations on processes (as opposed to the debugger-specific
  operations provided by ptrace(2)).  procctl(2) uses a combination of
  idtype_t and an id to identify the set of processes on which to operate
  similar to wait6().
- Add a PROC_SPROTECT control operation to manage the protection status
  of a set of processes.  MADV_PROTECT still works for backwards
  compatability.
- Add a p_flag2 to struct proc (and a corresponding ki_flag2 to kinfo_proc)
  the first bit of which is used to track if P_PROTECT should be inherited
  by new child processes.

Reviewed by:	kib, jilles (earlier version)
Approved by:	re (delphij)
MFC after:	1 month
2013-09-19 18:53:42 +00:00
Michael Tuexen
10e6d832d5 Remove an unused variable and fix a memory leak in sctp_connectx().
Approved by:	re (gjb)
MFC after:	3 days
2013-09-19 06:19:24 +00:00
Dag-Erling Smørgrav
71b5e1bb76 Move libldns to the correct (ordered) library list.
Approved by:	re (blanket)
2013-09-15 15:55:21 +00:00
Dag-Erling Smørgrav
8f8790cdf4 Build and install the Unbound caching DNS resolver daemon.
Approved by:	re (blanket)
2013-09-15 14:51:23 +00:00
Dimitry Andric
f5355ab1ba After r255294, building lib/msun's symbol map (using clang as the
preprocessor) gives the following error:

--- Version.map ---
<stdin>:287:4: error: invalid preprocessing directive
        # Implemented as weak aliases for imprecise versions
          ^
1 error generated.

Change the comment to a C-style one, to prevent this error.

Approved by:	re (hrs)
2013-09-12 20:51:48 +00:00
Bryan Drewery
c36029e6dc Consistently reference file descriptors as "fd". 55 other manpages
used "fd", while these used "d" and "filedes".

MFC after:	1 week
Approved by:	gjb
Approved by:	re (delphij)
2013-09-12 00:53:38 +00:00
John Baldwin
edb572a38c Add a mmap flag (MAP_32BIT) on 64-bit platforms to request that a mapping use
an address in the first 2GB of the process's address space.  This flag should
have the same semantics as the same flag on Linux.

To facilitate this, add a new parameter to vm_map_find() that specifies an
optional maximum virtual address.  While here, fix several callers of
vm_map_find() to use a VMFS_* constant for the findspace argument instead of
TRUE and FALSE.

Reviewed by:	alc
Approved by:	re (kib)
2013-09-09 18:11:59 +00:00
Dag-Erling Smørgrav
1a5d9b871d LDNS needs OpenSSL. This wasn't a problem as long as it was only build
statically, since any program using it would have to link with it anyway.

Approved by:	re (blanket)
2013-09-08 19:39:18 +00:00
Dag-Erling Smørgrav
0b2766bd4e Make libldns and libssh private.
Approved by:	re (blanket)
2013-09-08 10:04:26 +00:00
Dag-Erling Smørgrav
ce77a8d692 Update to OpenPAM Nummularia. 2013-09-07 19:43:39 +00:00
Dag-Erling Smørgrav
f7e6344d4a MFV (r255364): move the code around in preparation for Nummularia. 2013-09-07 18:46:35 +00:00
Dag-Erling Smørgrav
ff67676447 Vendor import of OpenPAM Nummularia.. 2013-09-07 16:15:30 +00:00
Dag-Erling Smørgrav
2dd970c2a1 Prepare for OpenPAM Nummularia by reorganizing to match its new directory
structure.
2013-09-07 16:10:15 +00:00
Andrew Turner
0a10f22a30 On ARM EABI double precision floating point values are stored in the
endian the CPU is in, i.e. little-endian on most ARM cores.

This allows ARMv4 and ARMv5 boards to boot with the ARM EABI.
2013-09-07 14:04:10 +00:00
Jilles Tjoelker
550ac4a8e8 wait(2): Add some possible caveats to standards section. 2013-09-07 11:41:52 +00:00
Jilles Tjoelker
d4c612c3b8 libc: Make resolver sockets close-on-exec (SOCK_CLOEXEC).
Although the resolver's sockets are exposed to applications via res_state,
I do not expect them to pass the sockets across execve().
2013-09-06 23:49:54 +00:00
Jilles Tjoelker
7253197882 libc: Use SOCK_CLOEXEC for various internal file descriptors.
This change avoids undesirably passing some internal file descriptors to a
process created (fork+exec) by another thread.

Kernel support for SOCK_CLOEXEC was added in r248534, March 19, 2013.
2013-09-06 21:02:06 +00:00
Jilles Tjoelker
ef70de180c libc/stdio: Allow fopen/freopen modes in any order (except initial r/w/a).
Austin Group issue #411 requires 'e' to be accepted before and after 'x',
and encourages accepting the characters in any order, except the initial
'r', 'w' or 'a'.

Given that glibc accepts the characters after r/w/a in any order and that
diagnosing this problem may be hard, change our libc to behave that way as
well.
2013-09-06 13:47:16 +00:00
David Chisnall
feae3109e8 Use Makefile.inc instead of .export. 2013-09-06 10:40:38 +00:00
David Chisnall
b49c0d5878 Fix the namespace pollution caused by iconv.h including stdbool.h
This broke any C89 ports that defined bool themselves, including things
like gcc, gtk, and so on.
2013-09-06 09:46:44 +00:00
Jilles Tjoelker
75b1cda430 Update some signal man pages for multithreading. 2013-09-06 09:08:40 +00:00
David Chisnall
4758b87596 Add stub implementations of the missing C++11 math functions.
These are weak and so can be replaced by other versions in applications
that choose to do so, and will give a linker warning when used so that
applications that rely on the extra precision can avoid them.

Note that since the C/C++ specs only guarantee that long double has
precision equal to double, code that actually relies on these functions
having greater precision is unportable at best and broken at worst.
2013-09-06 07:58:23 +00:00
Hans Petter Selasky
5122043efe Correct two comments. 2013-09-05 12:21:11 +00:00
Pawel Jakub Dawidek
7008be5bd7 Change the cap_rights_t type from uint64_t to a structure that we can extend
in the future in a backward compatible (API and ABI) way.

The cap_rights_t represents capability rights. We used to use one bit to
represent one right, but we are running out of spare bits. Currently the new
structure provides place for 114 rights (so 50 more than the previous
cap_rights_t), but it is possible to grow the structure to hold at least 285
rights, although we can make it even larger if 285 rights won't be enough.

The structure definition looks like this:

	struct cap_rights {
		uint64_t	cr_rights[CAP_RIGHTS_VERSION + 2];
	};

The initial CAP_RIGHTS_VERSION is 0.

The top two bits in the first element of the cr_rights[] array contain total
number of elements in the array - 2. This means if those two bits are equal to
0, we have 2 array elements.

The top two bits in all remaining array elements should be 0.
The next five bits in all array elements contain array index. Only one bit is
used and bit position in this five-bits range defines array index. This means
there can be at most five array elements in the future.

To define new right the CAPRIGHT() macro must be used. The macro takes two
arguments - an array index and a bit to set, eg.

	#define	CAP_PDKILL	CAPRIGHT(1, 0x0000000000000800ULL)

We still support aliases that combine few rights, but the rights have to belong
to the same array element, eg:

	#define	CAP_LOOKUP	CAPRIGHT(0, 0x0000000000000400ULL)
	#define	CAP_FCHMOD	CAPRIGHT(0, 0x0000000000002000ULL)

	#define	CAP_FCHMODAT	(CAP_FCHMOD | CAP_LOOKUP)

There is new API to manage the new cap_rights_t structure:

	cap_rights_t *cap_rights_init(cap_rights_t *rights, ...);
	void cap_rights_set(cap_rights_t *rights, ...);
	void cap_rights_clear(cap_rights_t *rights, ...);
	bool cap_rights_is_set(const cap_rights_t *rights, ...);

	bool cap_rights_is_valid(const cap_rights_t *rights);
	void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src);
	void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src);
	bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);

Capability rights to the cap_rights_init(), cap_rights_set(),
cap_rights_clear() and cap_rights_is_set() functions are provided by
separating them with commas, eg:

	cap_rights_t rights;

	cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);

There is no need to terminate the list of rights, as those functions are
actually macros that take care of the termination, eg:

	#define	cap_rights_set(rights, ...)				\
		__cap_rights_set((rights), __VA_ARGS__, 0ULL)
	void __cap_rights_set(cap_rights_t *rights, ...);

Thanks to using one bit as an array index we can assert in those functions that
there are no two rights belonging to different array elements provided
together. For example this is illegal and will be detected, because CAP_LOOKUP
belongs to element 0 and CAP_PDKILL to element 1:

	cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);

Providing several rights that belongs to the same array's element this way is
correct, but is not advised. It should only be used for aliases definition.

This commit also breaks compatibility with some existing Capsicum system calls,
but I see no other way to do that. This should be fine as Capsicum is still
experimental and this change is not going to 9.x.

Sponsored by:	The FreeBSD Foundation
2013-09-05 00:09:56 +00:00