14370 Commits

Author SHA1 Message Date
Neel Natu
318224bbe6 Merge projects/bhyve_npt_pmap into head.
Make the amd64/pmap code aware of nested page table mappings used by bhyve
guests. This allows bhyve to associate each guest with its own vmspace and
deal with nested page faults in the context of that vmspace. This also
enables features like accessed/dirty bit tracking, swapping to disk and
transparent superpage promotions of guest memory.

Guest vmspace:
Each bhyve guest has a unique vmspace to represent the physical memory
allocated to the guest. Each memory segment allocated by the guest is
mapped into the guest's address space via the 'vmspace->vm_map' and is
backed by an object of type OBJT_DEFAULT.

pmap types:
The amd64/pmap now understands two types of pmaps: PT_X86 and PT_EPT.

The PT_X86 pmap type is used by the vmspace associated with the host kernel
as well as user processes executing on the host. The PT_EPT pmap is used by
the vmspace associated with a bhyve guest.

Page Table Entries:
The EPT page table entries as mostly similar in functionality to regular
page table entries although there are some differences in terms of what
bits are used to express that functionality. For e.g. the dirty bit is
represented by bit 9 in the nested PTE as opposed to bit 6 in the regular
x86 PTE. Therefore the bitmask representing the dirty bit is now computed
at runtime based on the type of the pmap. Thus PG_M that was previously a
macro now becomes a local variable that is initialized at runtime using
'pmap_modified_bit(pmap)'.

An additional wrinkle associated with EPT mappings is that older Intel
processors don't have hardware support for tracking accessed/dirty bits in
the PTE. This means that the amd64/pmap code needs to emulate these bits to
provide proper accounting to the VM subsystem. This is achieved by using
the following mapping for EPT entries that need emulation of A/D bits:
               Bit Position           Interpreted By
PG_V               52                 software (accessed bit emulation handler)
PG_RW              53                 software (dirty bit emulation handler)
PG_A               0                  hardware (aka EPT_PG_RD)
PG_M               1                  hardware (aka EPT_PG_WR)

The idea to use the mapping listed above for A/D bit emulation came from
Alan Cox (alc@).

The final difference with respect to x86 PTEs is that some EPT implementations
do not support superpage mappings. This is recorded in the 'pm_flags' field
of the pmap.

TLB invalidation:
The amd64/pmap code has a number of ways to do invalidation of mappings
that may be cached in the TLB: single page, multiple pages in a range or the
entire TLB. All of these funnel into a single EPT invalidation routine called
'pmap_invalidate_ept()'. This routine bumps up the EPT generation number and
sends an IPI to the host cpus that are executing the guest's vcpus. On a
subsequent entry into the guest it will detect that the EPT has changed and
invalidate the mappings from the TLB.

Guest memory access:
Since the guest memory is no longer wired we need to hold the host physical
page that backs the guest physical page before we can access it. The helper
functions 'vm_gpa_hold()/vm_gpa_release()' are available for this purpose.

PCI passthru:
Guest's with PCI passthru devices will wire the entire guest physical address
space. The MMIO BAR associated with the passthru device is backed by a
vm_object of type OBJT_SG. An IOMMU domain is created only for guest's that
have one or more PCI passthru devices attached to them.

Limitations:
There isn't a way to map a guest physical page without execute permissions.
This is because the amd64/pmap code interprets the guest physical mappings as
user mappings since they are numerically below VM_MAXUSER_ADDRESS. Since PG_U
shares the same bit position as EPT_PG_EXECUTE all guest mappings become
automatically executable.

Thanks to Alan Cox and Konstantin Belousov for their rigorous code reviews
as well as their support and encouragement.

Thanks for John Baldwin for reviewing the use of OBJT_SG as the backing
object for pci passthru mmio regions.

Special thanks to Peter Holm for testing the patch on short notice.

Approved by:	re
Discussed with:	grehan
Reviewed by:	alc, kib
Tested by:	pho
2013-10-05 21:22:35 +00:00
Peter Grehan
94c3b3bffc Remove obsolete cmd-line options and code associated with
these.
 The mux-vcpus option may return at some point, given it's utility
in finding bhyve (and FreeBSD) bugs.

Approved by:	re@ (blanket)
Discussed with:	neel@
2013-10-04 23:29:07 +00:00
Jilles Tjoelker
6d9cb20bd7 kldxref: Do not depend on the directory order.
Sort the filenames to get a consistent result between machines of the same
architecture.

Also, sort FTS_D entries after other entries so kldxref -R works properly in
the uncommon case that a directory contains both subdirectories and modules.
Previously, this may have happened to work, depending on the order of files
in the directory.

PR:		bin/182098
Submitted by:	Derek Schrock (original version)
Tested by:	Derek Schrock
Approved by:	re (delphij)
MFC after:	1 week
2013-10-04 21:25:55 +00:00
Peter Grehan
54b70fdcae Hook up the AHCI and blockif code to the build.
Approved by:	re@ (blanket)
2013-10-04 18:44:47 +00:00
Peter Grehan
c354c096d3 Import Zhixiang Yu's GSoC'13 AHCI emulation:
https://wiki.freebsd.org/SummerOfCode2013/bhyveAHCI

This provides ICH8 SATA disk and ATAPI ports, selectable
via the bhyve slot command-line parameter:

SATA
  -s <slot>,ahci-hd,<image-file>

ATAPI
  -s <slot>,ahci-cd,<image-file>

Slight modifications by:	grehan@
Approved by:	re@ (blanket)
Obtained from:	FreeBSD GSoC'13
2013-10-04 18:31:38 +00:00
Peter Grehan
7cf5a7eeb0 Block-layer backend interface for bhyve block-io device emulations.
Approved by:	re@ (blanket)
2013-10-04 16:52:03 +00:00
Ollivier Robert
be77ef1b5f Meinberg clocks support was inadvertently removed during the last vendor
import.  Add it back.

PR:		bin/182545
Submitted by:	Joerg Pulz <Joerg.Pulz@frm2.tum.de>
Approved by:	re (delphij)
MFC after:	1 week
2013-10-02 21:47:25 +00:00
Sergey Kandaurov
05d98029e9 Sweep man pages replacing ad -> ada.
Approved by:	re (blackend)
MFC after:	1 week
X-MFC note:	stable/9 only
2013-10-01 18:41:53 +00:00
Dag-Erling Smørgrav
56b72efe82 Remove BIND.
Approved by:	re (gjb)
2013-09-30 17:23:45 +00:00
Gavin Atkinson
582071bd52 Remove ftp5.se.f.o, as per request to -hubs@
Approved by:	re (glebius)
MFC after:	3 days
2013-09-28 13:58:21 +00:00
Brad Davis
59ecdf43de - Remove the is (Iceland) mirror per mail from the admins.
Approved by:	re
With hat: clusteradm@
2013-09-27 11:25:37 +00:00
Peter Grehan
6a77884d08 Fix incorrect assertion on the minimum side. ZFS would
trigger this.

Reported by:	Chris Torek, Allan Jude
Approved by:	re@ (blanket)
2013-09-26 16:25:06 +00:00
Dag-Erling Smørgrav
058a4e3419 Prevent resolvconf from updating /etc/resolv.conf. As Jakob Schlyter
pointed out, having additional nameservers listed in /etc/resolv.conf
can break DNSSEC verification by providing a false positive if unbound
returns SERVFAIL due to an invalid signature.  The downside is that
the domain / search path won't get updated either, but we can live
with that.

Approved by:	re (blanket)
2013-09-23 20:06:59 +00:00
Gleb Smirnoff
876fc15d47 Fix coredump on 'arp -d'.
Submitted by:	az
Approved by:	re (kib)
2013-09-23 18:12:25 +00:00
Dag-Erling Smørgrav
98e2cd036d Ensure that resolvconf(8) preserves the edns0 setting.
Approved by:	re (blanket)
2013-09-23 17:35:23 +00:00
Devin Teske
cda7fc92b7 Fix a bug in HTTP checking/fetching.
Fix a bug in HTTP checking/fetching. Add Main Site to HTTP menu. Add new
example script browse_packages_http.sh and move existing example script
browse_packages.sh -> browse_packages_ftp.sh

Reviewed by:	gjb, brd
Approved by:	re (gjb), clusteradm (brd)
MFC after:	3 days
2013-09-23 16:47:52 +00:00
Nathan Whitehorn
8d795806e9 Add installer support for CHRP/PAPR PowerPC systems that use MBR+BSD
formatting, like x86, but with an additional MBR slice containing a raw
boot partition.

Approved by:	re (gjb)
2013-09-23 14:18:34 +00:00
Dag-Erling Smørgrav
49cede74ee Add a setup script for unbound(8) called local-unbound-setup. It
generates a configuration suitable for running unbound as a caching
forwarding resolver, and configures resolvconf(8) to update unbound's
list of forwarders in addition to /etc/resolv.conf.  The initial list
is taken from the existing resolv.conf, which is rewritten to point to
localhost.  Alternatively, a list of forwarders can be provided on the
command line.

To assist this script, add an rc.subr command called "enabled" which
does nothing except return 0 if the service is enabled and 1 if it is
not, without going through the usual checks.  We should consider doing
the same for "status", which is currently pointless.

Add an rc script for unbound, called local_unbound.  If there is no
configuration file, the rc script runs local-unbound-setup to generate
one.

Note that these scripts place the unbound configuration files in
/var/unbound rather than /etc/unbound.  This is necessary so that
unbound can reload its configuration while chrooted.  We should
probably provide symlinks in /etc.

Approved by:	re (blanket)
2013-09-23 04:36:51 +00:00
Mikolaj Golub
9da0ef1316 1. Properly clean pid files in the case of the error.
2. Write the supervisor pid before the restart loop, so we don't
   uselessly rewrite it after every child restart.
3. Remove duplicate ppfh and pfh initialization.

Approved by:	re (glebius)
MFC after:	2 weeks
2013-09-19 18:00:05 +00:00
Peter Grehan
aa8cb5f311 Implement support for the interrupt-on-terminal-count and
s/w-strobe timer modes. These are commonly used by non-FreeBSD
o/s's.

Approved by:	re@ (blanket)
2013-09-19 04:59:44 +00:00
Peter Grehan
151dba4a87 Add simplistic periodic timer support to mevent using kqueue's
timer support. This should be enough for the emulation of
h/w periodic timers (and no more) e.g. some of the 8254's
more esoteric modes that happen to be used by non-FreeBSD o/s's.

Approved by:	re@ (blanket)
2013-09-19 04:48:26 +00:00
Peter Grehan
4458253e97 Allow the alarm hours/mins/seconds registers to be read/written,
though without any action. This avoids a hypervisor exit when
o/s's access these regs (Linux).

Reviewed by:	neel
Approved by:	re@ (blanket)
2013-09-19 04:29:03 +00:00
Peter Grehan
c20d3f633a Use correct offset for the high byte of high memory written to
RTC NVRAM.

Submitted by:	Bela Lubkin   bela dot lubkin at tidalscale dot com
Approved by:	re@ (blanket)
2013-09-19 04:20:18 +00:00
Edward Tomasz Napierala
7843bd031a Fix several problems in the new iSCSI stack; this includes interoperability
fix for LIO (Linux target), removing possibility for the target to avoid mutual
CHAP by choosing to skip authentication altogether, and fixing truncated error
messages in iscsictl(8) output.  This also fixes several of the problems found
with Coverity.

Note that this change requires world rebuild.

Coverity CID:	1088038, 1087998, 1087990, 1088004, 1088044, 1088041, 1088040
Approved by:	re (blanket)
Sponsored by:	FreeBSD Foundation
2013-09-18 21:15:21 +00:00
Edward Tomasz Napierala
c76e8a9aa0 Make iscsictl(8) automatically try to load the iscsi module. While here,
improve module loading in iscsid(8) and ctld(8).

Approved by:	re (delphij)
2013-09-18 08:37:14 +00:00
Peter Grehan
aaa3016924 Pass the number of supported vectors to pci_emul_add_msicap() and
not the actual PCI BAR number.

Reviewed by:	neel
Approved by:	re@ (blanket)
2013-09-17 18:42:13 +00:00
Edward Tomasz Napierala
1f90f28221 Improve iSCSI address resolution, fixing "InitiatorAddress" handling,
and error reporting.

Approved by:	re (kib)
2013-09-17 14:23:15 +00:00
Sean Bruno
22e3858c24 Assume that the -f argument is /dev/gpioc0 if it is not passed.
hrs@ provided this verison of the patch and showed me where all the needed
changes were to be made outside of gpioctl.c

Approved by:	re (hrs)
MFC after:	2 weeks
2013-09-17 11:48:47 +00:00
Dag-Erling Smørgrav
5a92968ca6 Set NO_WERROR for unbound until I can figure out how to unbreak the
non-clang build.

Approved by:	re (blanket)
2013-09-15 16:27:25 +00:00
Dag-Erling Smørgrav
8f8790cdf4 Build and install the Unbound caching DNS resolver daemon.
Approved by:	re (blanket)
2013-09-15 14:51:23 +00:00
Joel Dahl
33e1779ab8 Minor mdoc fixes.
Approved by:	re (blanket)
2013-09-14 21:43:18 +00:00
Edward Tomasz Napierala
009ea47eb2 Bring in the new iSCSI target and initiator.
Reviewed by:	ken (parts)
Approved by:	re (delphij)
Sponsored by:	FreeBSD Foundation
2013-09-14 15:29:06 +00:00
Sean Bruno
dbcb0e960a Add gpio(4) man page to attempt to document the current hints based setup of
pin outputs, functions and setup.

Add cross reference in gpioctl(8) for people to find.

This is by no means complete and really only covers gpioled(4) and the
Atheros based systems who expose a few extra hints at boot time.

This should be updated by developers who know more about this system than
I and viewed as the beginning of documentation, not the end.

Reviewed by:	adrian
Approved by:	re (joel)
MFC after:	2 weeks
2013-09-13 19:55:40 +00:00
Joel Dahl
40e463f4d9 mdoc: remove EOL whitespace.
Approved by:	re (blanket)
2013-09-13 19:19:21 +00:00
John-Mark Gurney
32b17786bd add support for writing the pid of the daemon program to a pid file so
that daemon can be used w/ rc.subr and ports can use the additional
functionality, such as keeping the ldap daemon up and running, and have
the proper program to signal to exit..

PR:		bin/181341
Submitted by:	feld
Approved by:	re (glebius)
2013-09-13 16:57:28 +00:00
Xin LI
fb9e5c2167 Do not emit size for non-regular files. There is nothing that
mtree(1) can do in this situation and would cause confusion.

MFC candidate.

Approved by:	re (hrs)
2013-09-12 00:14:25 +00:00
Baptiste Daroussin
ad59bd68ad Cleanup elf macros
Only define EF_MIPS_ABI when not already supplied
Remove old now unused ARM macros

Reported by:	imp
Approved by:	re (kib)
2013-09-11 06:42:55 +00:00
Baptiste Daroussin
40274a2d65 Add support to detect arm vs armv6
There are two different versions of the ARM ABI depending on the
TARGET_ARCH. As these are sligntly different a package built for
one may not work on another. We need to detect which one we are on
by parsing the .ARM.attributes section.

This will only work on the ARM EABI as this section is part of the
ABI definition. As armv6 only supports the ARM EABI this is not a
problem for the oabi.

Older versions of libelf in FreeBSD fail to read the
.ARM.attributes section needed. As armv6 is unsupported on these
versions we can assume we are running on arm.

Submitted by:	andrew
Approved by:	re (delphij)
Obtained from:	pkgng git
2013-09-10 20:56:01 +00:00
Peter Grehan
8d39ed16c2 Go way past 11 and bump bhyve's max vCPUs to 16.
This should be sufficient for 10.0 and will do
until forthcoming work to avoid limitations
in this area is complete.

Thanks to Bela Lubkin at tidalscale for the
headsup on the apic/cpu id/io apic ASL parameters
that are actually hex values and broke when
written as decimal when 11 vCPUs were configured.

Approved by:	re@
2013-09-10 03:48:18 +00:00
Xin LI
b10eed4081 Pass -n (do not emit comments) when saving mtree information for future
mergemaster(8) runs.

MFC after:	3 days
Approved by:	re (kib)
2013-09-09 20:36:28 +00:00
Dag-Erling Smørgrav
452262c8ea Tweak wording. 2013-09-07 20:25:22 +00:00
Devin Teske
05f7eb31a2 Remove unnecessary mediaClose (FTP operations are done with either ftp(1)
or fetch(1), neither of which are stateful, compared to how sysinstall(8)
did FTP operations, maintaining an open session until mediaClose).
2013-09-07 03:27:13 +00:00
Devin Teske
7e9ed73522 Long URLs don't always appear even with autosizing and other tricks. So,
add some whitespace to put the URL on a line by itself, maximizing view.
2013-09-07 03:24:22 +00:00
Peter Grehan
fa48032049 Fix spelling. 2013-09-06 05:58:10 +00:00
Peter Grehan
841caa4090 Allow level-triggered interrupt sources. While this isn't
precisely emulated, it is good enough for the single consumer
i.e. irq4, the serial port on Linux.
2013-09-06 05:55:43 +00:00
Jilles Tjoelker
79e6a2c01e watch: Do not mess up the tty modes on early error.
Record the initial state earlier, so it is always safe to restore it.

One way this happens is if watch(8) is started by a user that does not have
access to /dev/snp. The result is "staircase effect" during later commands.

PR:		bin/153052
MFC after:	1 week
2013-09-05 19:02:03 +00:00
Pawel Jakub Dawidek
2057b58b17 Remove fallback to fork(2) if pdfork(2) is not available. If the parent
process dies, the process descriptor will be closed and pdfork(2)ed child
will be killed, which is not the case when regular fork(2) is used.

The PROCDESC option is now part of the GENERIC kernel configuration, so we
can start depending on it.

Add UPDATING entry to inform that this option is now required and log
detailed instruction to syslog if pdfork(2) is not available:

	The pdfork(2) system call is not available; recompile the kernel with options PROCDESC

Submitted by:	Mariusz Zaborski <oshogbo@FreeBSD.org>
Sponsored by:	Google Summer of Code 2013
2013-09-05 01:05:48 +00:00
Pawel Jakub Dawidek
7008be5bd7 Change the cap_rights_t type from uint64_t to a structure that we can extend
in the future in a backward compatible (API and ABI) way.

The cap_rights_t represents capability rights. We used to use one bit to
represent one right, but we are running out of spare bits. Currently the new
structure provides place for 114 rights (so 50 more than the previous
cap_rights_t), but it is possible to grow the structure to hold at least 285
rights, although we can make it even larger if 285 rights won't be enough.

The structure definition looks like this:

	struct cap_rights {
		uint64_t	cr_rights[CAP_RIGHTS_VERSION + 2];
	};

The initial CAP_RIGHTS_VERSION is 0.

The top two bits in the first element of the cr_rights[] array contain total
number of elements in the array - 2. This means if those two bits are equal to
0, we have 2 array elements.

The top two bits in all remaining array elements should be 0.
The next five bits in all array elements contain array index. Only one bit is
used and bit position in this five-bits range defines array index. This means
there can be at most five array elements in the future.

To define new right the CAPRIGHT() macro must be used. The macro takes two
arguments - an array index and a bit to set, eg.

	#define	CAP_PDKILL	CAPRIGHT(1, 0x0000000000000800ULL)

We still support aliases that combine few rights, but the rights have to belong
to the same array element, eg:

	#define	CAP_LOOKUP	CAPRIGHT(0, 0x0000000000000400ULL)
	#define	CAP_FCHMOD	CAPRIGHT(0, 0x0000000000002000ULL)

	#define	CAP_FCHMODAT	(CAP_FCHMOD | CAP_LOOKUP)

There is new API to manage the new cap_rights_t structure:

	cap_rights_t *cap_rights_init(cap_rights_t *rights, ...);
	void cap_rights_set(cap_rights_t *rights, ...);
	void cap_rights_clear(cap_rights_t *rights, ...);
	bool cap_rights_is_set(const cap_rights_t *rights, ...);

	bool cap_rights_is_valid(const cap_rights_t *rights);
	void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src);
	void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src);
	bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);

Capability rights to the cap_rights_init(), cap_rights_set(),
cap_rights_clear() and cap_rights_is_set() functions are provided by
separating them with commas, eg:

	cap_rights_t rights;

	cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);

There is no need to terminate the list of rights, as those functions are
actually macros that take care of the termination, eg:

	#define	cap_rights_set(rights, ...)				\
		__cap_rights_set((rights), __VA_ARGS__, 0ULL)
	void __cap_rights_set(cap_rights_t *rights, ...);

Thanks to using one bit as an array index we can assert in those functions that
there are no two rights belonging to different array elements provided
together. For example this is illegal and will be detected, because CAP_LOOKUP
belongs to element 0 and CAP_PDKILL to element 1:

	cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);

Providing several rights that belongs to the same array's element this way is
correct, but is not advised. It should only be used for aliases definition.

This commit also breaks compatibility with some existing Capsicum system calls,
but I see no other way to do that. This should be fine as Capsicum is still
experimental and this change is not going to 9.x.

Sponsored by:	The FreeBSD Foundation
2013-09-05 00:09:56 +00:00
Jeremie Le Hen
94582faa19 Include the calling context in the mail subject, if any.
More concretely, periodic security scripts defaults to being
called from daily ones -- daily context -- so the mail subject
will now be "${HOST} daily security run output" instead of
"{HOST} security run output".

If you switch the period of some security checks to weekly, you
will receive another email "${HOST} weekly security run output".
2013-09-03 13:40:24 +00:00
Hiroki Sato
42f725eefb Ignore if the interface is not IPv6-capable.
Spotted by:	rpaulo
2013-09-02 20:44:19 +00:00