14319 Commits

Author SHA1 Message Date
neel
aed205d5cd Merge projects/bhyve_npt_pmap into head.
Make the amd64/pmap code aware of nested page table mappings used by bhyve
guests. This allows bhyve to associate each guest with its own vmspace and
deal with nested page faults in the context of that vmspace. This also
enables features like accessed/dirty bit tracking, swapping to disk and
transparent superpage promotions of guest memory.

Guest vmspace:
Each bhyve guest has a unique vmspace to represent the physical memory
allocated to the guest. Each memory segment allocated by the guest is
mapped into the guest's address space via the 'vmspace->vm_map' and is
backed by an object of type OBJT_DEFAULT.

pmap types:
The amd64/pmap now understands two types of pmaps: PT_X86 and PT_EPT.

The PT_X86 pmap type is used by the vmspace associated with the host kernel
as well as user processes executing on the host. The PT_EPT pmap is used by
the vmspace associated with a bhyve guest.

Page Table Entries:
The EPT page table entries as mostly similar in functionality to regular
page table entries although there are some differences in terms of what
bits are used to express that functionality. For e.g. the dirty bit is
represented by bit 9 in the nested PTE as opposed to bit 6 in the regular
x86 PTE. Therefore the bitmask representing the dirty bit is now computed
at runtime based on the type of the pmap. Thus PG_M that was previously a
macro now becomes a local variable that is initialized at runtime using
'pmap_modified_bit(pmap)'.

An additional wrinkle associated with EPT mappings is that older Intel
processors don't have hardware support for tracking accessed/dirty bits in
the PTE. This means that the amd64/pmap code needs to emulate these bits to
provide proper accounting to the VM subsystem. This is achieved by using
the following mapping for EPT entries that need emulation of A/D bits:
               Bit Position           Interpreted By
PG_V               52                 software (accessed bit emulation handler)
PG_RW              53                 software (dirty bit emulation handler)
PG_A               0                  hardware (aka EPT_PG_RD)
PG_M               1                  hardware (aka EPT_PG_WR)

The idea to use the mapping listed above for A/D bit emulation came from
Alan Cox (alc@).

The final difference with respect to x86 PTEs is that some EPT implementations
do not support superpage mappings. This is recorded in the 'pm_flags' field
of the pmap.

TLB invalidation:
The amd64/pmap code has a number of ways to do invalidation of mappings
that may be cached in the TLB: single page, multiple pages in a range or the
entire TLB. All of these funnel into a single EPT invalidation routine called
'pmap_invalidate_ept()'. This routine bumps up the EPT generation number and
sends an IPI to the host cpus that are executing the guest's vcpus. On a
subsequent entry into the guest it will detect that the EPT has changed and
invalidate the mappings from the TLB.

Guest memory access:
Since the guest memory is no longer wired we need to hold the host physical
page that backs the guest physical page before we can access it. The helper
functions 'vm_gpa_hold()/vm_gpa_release()' are available for this purpose.

PCI passthru:
Guest's with PCI passthru devices will wire the entire guest physical address
space. The MMIO BAR associated with the passthru device is backed by a
vm_object of type OBJT_SG. An IOMMU domain is created only for guest's that
have one or more PCI passthru devices attached to them.

Limitations:
There isn't a way to map a guest physical page without execute permissions.
This is because the amd64/pmap code interprets the guest physical mappings as
user mappings since they are numerically below VM_MAXUSER_ADDRESS. Since PG_U
shares the same bit position as EPT_PG_EXECUTE all guest mappings become
automatically executable.

Thanks to Alan Cox and Konstantin Belousov for their rigorous code reviews
as well as their support and encouragement.

Thanks for John Baldwin for reviewing the use of OBJT_SG as the backing
object for pci passthru mmio regions.

Special thanks to Peter Holm for testing the patch on short notice.

Approved by:	re
Discussed with:	grehan
Reviewed by:	alc, kib
Tested by:	pho
2013-10-05 21:22:35 +00:00
grehan
26b6c9487a Remove obsolete cmd-line options and code associated with
these.
 The mux-vcpus option may return at some point, given it's utility
in finding bhyve (and FreeBSD) bugs.

Approved by:	re@ (blanket)
Discussed with:	neel@
2013-10-04 23:29:07 +00:00
jilles
64b622034f kldxref: Do not depend on the directory order.
Sort the filenames to get a consistent result between machines of the same
architecture.

Also, sort FTS_D entries after other entries so kldxref -R works properly in
the uncommon case that a directory contains both subdirectories and modules.
Previously, this may have happened to work, depending on the order of files
in the directory.

PR:		bin/182098
Submitted by:	Derek Schrock (original version)
Tested by:	Derek Schrock
Approved by:	re (delphij)
MFC after:	1 week
2013-10-04 21:25:55 +00:00
grehan
56fd486581 Hook up the AHCI and blockif code to the build.
Approved by:	re@ (blanket)
2013-10-04 18:44:47 +00:00
grehan
00cc733e89 Import Zhixiang Yu's GSoC'13 AHCI emulation:
https://wiki.freebsd.org/SummerOfCode2013/bhyveAHCI

This provides ICH8 SATA disk and ATAPI ports, selectable
via the bhyve slot command-line parameter:

SATA
  -s <slot>,ahci-hd,<image-file>

ATAPI
  -s <slot>,ahci-cd,<image-file>

Slight modifications by:	grehan@
Approved by:	re@ (blanket)
Obtained from:	FreeBSD GSoC'13
2013-10-04 18:31:38 +00:00
grehan
6d6539c1e2 Block-layer backend interface for bhyve block-io device emulations.
Approved by:	re@ (blanket)
2013-10-04 16:52:03 +00:00
roberto
960b6b30ed Meinberg clocks support was inadvertently removed during the last vendor
import.  Add it back.

PR:		bin/182545
Submitted by:	Joerg Pulz <Joerg.Pulz@frm2.tum.de>
Approved by:	re (delphij)
MFC after:	1 week
2013-10-02 21:47:25 +00:00
pluknet
3f9b259642 Sweep man pages replacing ad -> ada.
Approved by:	re (blackend)
MFC after:	1 week
X-MFC note:	stable/9 only
2013-10-01 18:41:53 +00:00
des
aa2e4b623c Remove BIND.
Approved by:	re (gjb)
2013-09-30 17:23:45 +00:00
gavin
068e00d46b Remove ftp5.se.f.o, as per request to -hubs@
Approved by:	re (glebius)
MFC after:	3 days
2013-09-28 13:58:21 +00:00
brd
f1a7c5fc50 - Remove the is (Iceland) mirror per mail from the admins.
Approved by:	re
With hat: clusteradm@
2013-09-27 11:25:37 +00:00
grehan
128d4995ee Fix incorrect assertion on the minimum side. ZFS would
trigger this.

Reported by:	Chris Torek, Allan Jude
Approved by:	re@ (blanket)
2013-09-26 16:25:06 +00:00
des
0f8f840670 Prevent resolvconf from updating /etc/resolv.conf. As Jakob Schlyter
pointed out, having additional nameservers listed in /etc/resolv.conf
can break DNSSEC verification by providing a false positive if unbound
returns SERVFAIL due to an invalid signature.  The downside is that
the domain / search path won't get updated either, but we can live
with that.

Approved by:	re (blanket)
2013-09-23 20:06:59 +00:00
glebius
d232e740fa Fix coredump on 'arp -d'.
Submitted by:	az
Approved by:	re (kib)
2013-09-23 18:12:25 +00:00
des
56573b50a6 Ensure that resolvconf(8) preserves the edns0 setting.
Approved by:	re (blanket)
2013-09-23 17:35:23 +00:00
dteske
64fbf75855 Fix a bug in HTTP checking/fetching.
Fix a bug in HTTP checking/fetching. Add Main Site to HTTP menu. Add new
example script browse_packages_http.sh and move existing example script
browse_packages.sh -> browse_packages_ftp.sh

Reviewed by:	gjb, brd
Approved by:	re (gjb), clusteradm (brd)
MFC after:	3 days
2013-09-23 16:47:52 +00:00
nwhitehorn
23d9cd5eff Add installer support for CHRP/PAPR PowerPC systems that use MBR+BSD
formatting, like x86, but with an additional MBR slice containing a raw
boot partition.

Approved by:	re (gjb)
2013-09-23 14:18:34 +00:00
des
b1d537a11d Add a setup script for unbound(8) called local-unbound-setup. It
generates a configuration suitable for running unbound as a caching
forwarding resolver, and configures resolvconf(8) to update unbound's
list of forwarders in addition to /etc/resolv.conf.  The initial list
is taken from the existing resolv.conf, which is rewritten to point to
localhost.  Alternatively, a list of forwarders can be provided on the
command line.

To assist this script, add an rc.subr command called "enabled" which
does nothing except return 0 if the service is enabled and 1 if it is
not, without going through the usual checks.  We should consider doing
the same for "status", which is currently pointless.

Add an rc script for unbound, called local_unbound.  If there is no
configuration file, the rc script runs local-unbound-setup to generate
one.

Note that these scripts place the unbound configuration files in
/var/unbound rather than /etc/unbound.  This is necessary so that
unbound can reload its configuration while chrooted.  We should
probably provide symlinks in /etc.

Approved by:	re (blanket)
2013-09-23 04:36:51 +00:00
trociny
8ecfe4666e 1. Properly clean pid files in the case of the error.
2. Write the supervisor pid before the restart loop, so we don't
   uselessly rewrite it after every child restart.
3. Remove duplicate ppfh and pfh initialization.

Approved by:	re (glebius)
MFC after:	2 weeks
2013-09-19 18:00:05 +00:00
grehan
88b044ba33 Implement support for the interrupt-on-terminal-count and
s/w-strobe timer modes. These are commonly used by non-FreeBSD
o/s's.

Approved by:	re@ (blanket)
2013-09-19 04:59:44 +00:00
grehan
9239c077f4 Add simplistic periodic timer support to mevent using kqueue's
timer support. This should be enough for the emulation of
h/w periodic timers (and no more) e.g. some of the 8254's
more esoteric modes that happen to be used by non-FreeBSD o/s's.

Approved by:	re@ (blanket)
2013-09-19 04:48:26 +00:00
grehan
bc1d6700f2 Allow the alarm hours/mins/seconds registers to be read/written,
though without any action. This avoids a hypervisor exit when
o/s's access these regs (Linux).

Reviewed by:	neel
Approved by:	re@ (blanket)
2013-09-19 04:29:03 +00:00
grehan
6f44e4d05b Use correct offset for the high byte of high memory written to
RTC NVRAM.

Submitted by:	Bela Lubkin   bela dot lubkin at tidalscale dot com
Approved by:	re@ (blanket)
2013-09-19 04:20:18 +00:00
trasz
84d8bf623b Fix several problems in the new iSCSI stack; this includes interoperability
fix for LIO (Linux target), removing possibility for the target to avoid mutual
CHAP by choosing to skip authentication altogether, and fixing truncated error
messages in iscsictl(8) output.  This also fixes several of the problems found
with Coverity.

Note that this change requires world rebuild.

Coverity CID:	1088038, 1087998, 1087990, 1088004, 1088044, 1088041, 1088040
Approved by:	re (blanket)
Sponsored by:	FreeBSD Foundation
2013-09-18 21:15:21 +00:00
trasz
2320759748 Make iscsictl(8) automatically try to load the iscsi module. While here,
improve module loading in iscsid(8) and ctld(8).

Approved by:	re (delphij)
2013-09-18 08:37:14 +00:00
grehan
6f381c7c57 Pass the number of supported vectors to pci_emul_add_msicap() and
not the actual PCI BAR number.

Reviewed by:	neel
Approved by:	re@ (blanket)
2013-09-17 18:42:13 +00:00
trasz
8e562bbcba Improve iSCSI address resolution, fixing "InitiatorAddress" handling,
and error reporting.

Approved by:	re (kib)
2013-09-17 14:23:15 +00:00
sbruno
1c5decc9ed Assume that the -f argument is /dev/gpioc0 if it is not passed.
hrs@ provided this verison of the patch and showed me where all the needed
changes were to be made outside of gpioctl.c

Approved by:	re (hrs)
MFC after:	2 weeks
2013-09-17 11:48:47 +00:00
des
0169b48bfa Set NO_WERROR for unbound until I can figure out how to unbreak the
non-clang build.

Approved by:	re (blanket)
2013-09-15 16:27:25 +00:00
des
ea05e625ec Build and install the Unbound caching DNS resolver daemon.
Approved by:	re (blanket)
2013-09-15 14:51:23 +00:00
joel
44f767ab70 Minor mdoc fixes.
Approved by:	re (blanket)
2013-09-14 21:43:18 +00:00
trasz
a992abf041 Bring in the new iSCSI target and initiator.
Reviewed by:	ken (parts)
Approved by:	re (delphij)
Sponsored by:	FreeBSD Foundation
2013-09-14 15:29:06 +00:00
sbruno
0d8bb1bec1 Add gpio(4) man page to attempt to document the current hints based setup of
pin outputs, functions and setup.

Add cross reference in gpioctl(8) for people to find.

This is by no means complete and really only covers gpioled(4) and the
Atheros based systems who expose a few extra hints at boot time.

This should be updated by developers who know more about this system than
I and viewed as the beginning of documentation, not the end.

Reviewed by:	adrian
Approved by:	re (joel)
MFC after:	2 weeks
2013-09-13 19:55:40 +00:00
joel
99904243da mdoc: remove EOL whitespace.
Approved by:	re (blanket)
2013-09-13 19:19:21 +00:00
jmg
1b9dc8f60c add support for writing the pid of the daemon program to a pid file so
that daemon can be used w/ rc.subr and ports can use the additional
functionality, such as keeping the ldap daemon up and running, and have
the proper program to signal to exit..

PR:		bin/181341
Submitted by:	feld
Approved by:	re (glebius)
2013-09-13 16:57:28 +00:00
delphij
457a4abaaa Do not emit size for non-regular files. There is nothing that
mtree(1) can do in this situation and would cause confusion.

MFC candidate.

Approved by:	re (hrs)
2013-09-12 00:14:25 +00:00
bapt
fee688be70 Cleanup elf macros
Only define EF_MIPS_ABI when not already supplied
Remove old now unused ARM macros

Reported by:	imp
Approved by:	re (kib)
2013-09-11 06:42:55 +00:00
bapt
aed5107b91 Add support to detect arm vs armv6
There are two different versions of the ARM ABI depending on the
TARGET_ARCH. As these are sligntly different a package built for
one may not work on another. We need to detect which one we are on
by parsing the .ARM.attributes section.

This will only work on the ARM EABI as this section is part of the
ABI definition. As armv6 only supports the ARM EABI this is not a
problem for the oabi.

Older versions of libelf in FreeBSD fail to read the
.ARM.attributes section needed. As armv6 is unsupported on these
versions we can assume we are running on arm.

Submitted by:	andrew
Approved by:	re (delphij)
Obtained from:	pkgng git
2013-09-10 20:56:01 +00:00
grehan
3d2b366a36 Go way past 11 and bump bhyve's max vCPUs to 16.
This should be sufficient for 10.0 and will do
until forthcoming work to avoid limitations
in this area is complete.

Thanks to Bela Lubkin at tidalscale for the
headsup on the apic/cpu id/io apic ASL parameters
that are actually hex values and broke when
written as decimal when 11 vCPUs were configured.

Approved by:	re@
2013-09-10 03:48:18 +00:00
delphij
504259487c Pass -n (do not emit comments) when saving mtree information for future
mergemaster(8) runs.

MFC after:	3 days
Approved by:	re (kib)
2013-09-09 20:36:28 +00:00
des
7fcc90cb2e Tweak wording. 2013-09-07 20:25:22 +00:00
dteske
3b61a48b9f Remove unnecessary mediaClose (FTP operations are done with either ftp(1)
or fetch(1), neither of which are stateful, compared to how sysinstall(8)
did FTP operations, maintaining an open session until mediaClose).
2013-09-07 03:27:13 +00:00
dteske
ef37ef8335 Long URLs don't always appear even with autosizing and other tricks. So,
add some whitespace to put the URL on a line by itself, maximizing view.
2013-09-07 03:24:22 +00:00
grehan
32d186cf83 Fix spelling. 2013-09-06 05:58:10 +00:00
grehan
b669765ba2 Allow level-triggered interrupt sources. While this isn't
precisely emulated, it is good enough for the single consumer
i.e. irq4, the serial port on Linux.
2013-09-06 05:55:43 +00:00
jilles
4b58ef1a3f watch: Do not mess up the tty modes on early error.
Record the initial state earlier, so it is always safe to restore it.

One way this happens is if watch(8) is started by a user that does not have
access to /dev/snp. The result is "staircase effect" during later commands.

PR:		bin/153052
MFC after:	1 week
2013-09-05 19:02:03 +00:00
pjd
3c622b2c1f Remove fallback to fork(2) if pdfork(2) is not available. If the parent
process dies, the process descriptor will be closed and pdfork(2)ed child
will be killed, which is not the case when regular fork(2) is used.

The PROCDESC option is now part of the GENERIC kernel configuration, so we
can start depending on it.

Add UPDATING entry to inform that this option is now required and log
detailed instruction to syslog if pdfork(2) is not available:

	The pdfork(2) system call is not available; recompile the kernel with options PROCDESC

Submitted by:	Mariusz Zaborski <oshogbo@FreeBSD.org>
Sponsored by:	Google Summer of Code 2013
2013-09-05 01:05:48 +00:00
pjd
029a6f5d92 Change the cap_rights_t type from uint64_t to a structure that we can extend
in the future in a backward compatible (API and ABI) way.

The cap_rights_t represents capability rights. We used to use one bit to
represent one right, but we are running out of spare bits. Currently the new
structure provides place for 114 rights (so 50 more than the previous
cap_rights_t), but it is possible to grow the structure to hold at least 285
rights, although we can make it even larger if 285 rights won't be enough.

The structure definition looks like this:

	struct cap_rights {
		uint64_t	cr_rights[CAP_RIGHTS_VERSION + 2];
	};

The initial CAP_RIGHTS_VERSION is 0.

The top two bits in the first element of the cr_rights[] array contain total
number of elements in the array - 2. This means if those two bits are equal to
0, we have 2 array elements.

The top two bits in all remaining array elements should be 0.
The next five bits in all array elements contain array index. Only one bit is
used and bit position in this five-bits range defines array index. This means
there can be at most five array elements in the future.

To define new right the CAPRIGHT() macro must be used. The macro takes two
arguments - an array index and a bit to set, eg.

	#define	CAP_PDKILL	CAPRIGHT(1, 0x0000000000000800ULL)

We still support aliases that combine few rights, but the rights have to belong
to the same array element, eg:

	#define	CAP_LOOKUP	CAPRIGHT(0, 0x0000000000000400ULL)
	#define	CAP_FCHMOD	CAPRIGHT(0, 0x0000000000002000ULL)

	#define	CAP_FCHMODAT	(CAP_FCHMOD | CAP_LOOKUP)

There is new API to manage the new cap_rights_t structure:

	cap_rights_t *cap_rights_init(cap_rights_t *rights, ...);
	void cap_rights_set(cap_rights_t *rights, ...);
	void cap_rights_clear(cap_rights_t *rights, ...);
	bool cap_rights_is_set(const cap_rights_t *rights, ...);

	bool cap_rights_is_valid(const cap_rights_t *rights);
	void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src);
	void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src);
	bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);

Capability rights to the cap_rights_init(), cap_rights_set(),
cap_rights_clear() and cap_rights_is_set() functions are provided by
separating them with commas, eg:

	cap_rights_t rights;

	cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);

There is no need to terminate the list of rights, as those functions are
actually macros that take care of the termination, eg:

	#define	cap_rights_set(rights, ...)				\
		__cap_rights_set((rights), __VA_ARGS__, 0ULL)
	void __cap_rights_set(cap_rights_t *rights, ...);

Thanks to using one bit as an array index we can assert in those functions that
there are no two rights belonging to different array elements provided
together. For example this is illegal and will be detected, because CAP_LOOKUP
belongs to element 0 and CAP_PDKILL to element 1:

	cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);

Providing several rights that belongs to the same array's element this way is
correct, but is not advised. It should only be used for aliases definition.

This commit also breaks compatibility with some existing Capsicum system calls,
but I see no other way to do that. This should be fine as Capsicum is still
experimental and this change is not going to 9.x.

Sponsored by:	The FreeBSD Foundation
2013-09-05 00:09:56 +00:00
jlh
97f176654a Include the calling context in the mail subject, if any.
More concretely, periodic security scripts defaults to being
called from daily ones -- daily context -- so the mail subject
will now be "${HOST} daily security run output" instead of
"{HOST} security run output".

If you switch the period of some security checks to weekly, you
will receive another email "${HOST} weekly security run output".
2013-09-03 13:40:24 +00:00
hrs
279128805f Ignore if the interface is not IPv6-capable.
Spotted by:	rpaulo
2013-09-02 20:44:19 +00:00