6963 Commits

Author SHA1 Message Date
dchagin
a034df74fd MFC r283410:
Put linux_platform into the vdso to avoid copying it onto the stack at
every exec.
2016-01-09 15:48:11 +00:00
dchagin
5c3e282c6e MFC r283408:
Eliminate a now unused global declaration of elf_linux_sysvec.
2016-01-09 15:46:05 +00:00
dchagin
18c1672334 MFC r283407:
Implement vdso - virtual dynamic shared object. Through vdso Linux
exposes functions from kernel with proper DWARF CFI information so that
it becomes easier to unwind through them.
Using vdso is mandatory for thread cancellation && cleanup
on a modern glibc.
2016-01-09 15:44:38 +00:00
dchagin
2e9cc3f70d Regen for r293511. 2016-01-09 15:40:44 +00:00
dchagin
4ed27590e5 MFC r283403:
Implement pselect6() system call.
2016-01-09 15:39:41 +00:00
dchagin
4992ef5f9d Regen for r293510. 2016-01-09 15:38:16 +00:00
dchagin
027f6631c0 MFC r283401:
Implement prlimit64() system call.
2016-01-09 15:37:10 +00:00
dchagin
a82405c150 Regen for r293508. 2016-01-09 15:35:57 +00:00
dchagin
b4d7be064f MFC r283399:
Implement dup3() system call.
2016-01-09 15:34:54 +00:00
dchagin
e327d1c9cc Regen for r293505. 2016-01-09 15:32:33 +00:00
dchagin
df59792813 MFC r283396:
Implement rt_sigqueueinfo() system call.
2016-01-09 15:31:15 +00:00
dchagin
4e3ae75e5e Regen for r293503. 2016-01-09 15:29:10 +00:00
dchagin
3c97a00938 MFC r283394:
Implement waitid() system call.
2016-01-09 15:28:05 +00:00
dchagin
a14064e328 MFC r283391:
To reduce code duplication introduce linux_copyout_rusage() method.
Use it in linux_wait4() system call and move linux_wait4() to the MI path.
While here add a prototype for the static bsd_to_linux_rusage().
2016-01-09 15:23:54 +00:00
dchagin
2646cf70a0 MFC r283385:
Some style(9) && whitespace fixes. No functional changes.
2016-01-09 15:18:36 +00:00
dchagin
cb3b38d164 MFC r283383:
Switch linuxulator to use the native 1:1 threads.

The reasons:
1. Get rid of the stubs/quirks with process dethreading,
   process reparenting when the process group leader exits, and the
   related problems in wait(), waitpid(), etc.
2. Reuse our kernel code instead of writing excessive thread
   management routines in Linuxulator.

Implementation details:

1. The thread is created via kern_thr_new() in the clone() call with
   the CLONE_THREAD parameter. Thus, everything else is a process.
2. The test that the process has threads is done via the P_HADTHREADS
   bit in p_flag of struct proc.
3. The per-thread emulator state data structure is now located in
   struct thread and freed in the thread_dtor() hook.
   Holding p_mtx is mandatory when referencing emuldata
   from other threads.
4. PID mangling has changed. Now Linux pid is the native tid
   and Linux tgid is the native pid, with the exception of the first
   thread in the process where tid and pid are one and the same.

Ugliness:

   When the Linux thread is the initial thread in the thread
   group, the thread id is equal to the process id. Glibc depends on this
   magic (assert in pthread_getattr_np.c). So for system calls that
   take a thread id as a parameter we should use a special method
   to reference struct thread.
2016-01-09 15:16:13 +00:00
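The PID mapping described in cb3b38d164 can be sketched roughly as follows. This is an illustrative model only, not the Linuxulator code: struct sketch_thread, its fields, and both helper functions are hypothetical names introduced here, and the id values used with them are made up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a native thread; not a kernel structure. */
struct sketch_thread {
	int32_t	native_pid;	/* id of the owning native process */
	int32_t	native_tid;	/* id of the native thread */
	bool	is_initial;	/* true for the first thread in the process */
};

/* The Linux tgid is the native pid. */
int32_t
sketch_linux_tgid(const struct sketch_thread *t)
{
	return (t->native_pid);
}

/*
 * The Linux per-thread id is the native tid, except for the initial
 * thread, which reports the process id because glibc expects the two
 * to be equal there.
 */
int32_t
sketch_linux_tid(const struct sketch_thread *t)
{
	return (t->is_initial ? t->native_pid : t->native_tid);
}
```

Under this model every thread of a process shares one tgid, and only the initial thread sees a thread id equal to it, which is the special case the "Ugliness" note refers to.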
dchagin
2b83b41438 MFC r283382:
In preparation for switching linuxulator to use the native 1:1
threads add a hook for cleaning thread resources before the thread dies.
2016-01-09 14:53:08 +00:00
dchagin
c12aa632f3 Regen for r293487. 2016-01-09 14:48:23 +00:00
dchagin
31e61f6749 MFC r283379:
Implement a Linux version of sched_getparam() && sched_setparam().
Temporarily use the first thread in proc.
2016-01-09 14:47:08 +00:00
dchagin
65d490113d MFC r283378:
Remove a now unused include.
2016-01-09 14:45:41 +00:00
dchagin
fd9d33be2a MFC r283374:
In preparation for switching linuxulator to use the native 1:1
threads refactor kern_sched_rr_get_interval() and sys_sched_rr_get_interval().
Add a kern_sched_rr_get_interval() counterpart which takes a targettd
parameter to allow the target thread to be specified directly by the callee (new Linuxulator).

Linuxulator temporarily uses first thread in proc.

Move linux_sched_rr_get_interval() to the MI part.
2016-01-09 14:40:38 +00:00
dchagin
994d3d5889 Regen for r293478. 2016-01-09 14:34:29 +00:00
dchagin
e060fa6fed MFC r283370:
In preparation for switching linuxulator to use the native 1:1
threads introduce linux_exit() stub instead of sys_exit() call
(which terminates process).
In the new linuxulator, the exit() system call terminates the calling
thread (not the whole process).
2016-01-09 14:33:10 +00:00
dchagin
358125d39c MFC r283369:
In preparation for switching linuxulator to use the native 1:1
threads print the thread id in addition to the pid in debug messages.
2016-01-09 14:31:03 +00:00
dim
7295d680ea MFC r277735 (by royger):
amd64: allow base memory segment to start at address different than 0

Current code requires that the first physical memory segment starts at 0,
but this is not really needed. We only need to make sure the bootstrap code
and page tables for APs are allocated below 4GB.

This patch removes this requirement and allows booting a Dell R710 from
UEFI, where the first physical memory segment starts at 0x10000.

Sponsored by: Citrix Systems R&D
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D1417
2015-12-21 17:15:03 +00:00
kib
9167bbc1e5 MFC r291948:
Use ANSI C definition.
2015-12-14 07:54:45 +00:00
jhb
f91ade9d24 MFC 284325:
Report the values of x86 segment registers to remote debuggers.

While here, also report %eflags from the i386 trapframe.
2015-11-13 00:50:34 +00:00
jhb
d10f133720 MFC 285783:
Various changes to the registers displayed in DDB for x86.
- Fix segment registers to only display the low 16 bits.
- Remove unused handlers and entries for the debug registers.
- Display xcr0 (if valid) in 'show sysregs'.
- Add '0x' prefix to MSR values to match other values in 'show sysregs'.
- MFamd64: Display various MSRs in 'show sysregs'.
- Add a 'show dbregs' to display the value of debug registers.
- Dynamically size the column width for register values to properly
  align columns on 64-bit platforms.
- Display %gs for i386 in 'show registers'.
2015-11-12 23:49:47 +00:00
jhb
2285630285 MFC 285773,285775,285776:
Various fixes for stack unwinding in DDB on x86.

285773:
Remove some dead code from DDB's amd64 stack unwinder.

The amd64 port copied some code from i386 to fetch function arguments and
display them in backtraces. However, it was commented out and can't easily
be implemented since the function arguments are passed in
registers rather than on the stack in amd64. Remove it in preparation for
some bug fixes in this area.

285775:
Improve stack unwinding on i386 and amd64 after an IP fault.

If we can't find a symbol corresponding to the faulting instruction, assume
that the previously-executed function is a call and attempt to find the
calling function using the return address on the stack. Otherwise we end
up associating the last stack frame with the current call, which is
incorrect and causes the unwinder to skip printing of the calling function,
resulting in a confusing backtrace.

285776:
Let the unwinder handle faults during function prologues or epilogues.

The i386 and amd64 DDB stack unwinders contain code to detect and handle
the case where the first frame is not completely set up or torn down. This
code was accidentally unused however, since db_backtrace() was never called
with a non-NULL trap frame. This change fixes that.

Also remove get_rsp() from the amd64 code. It appears to have come from
i386, which needs to take into account whether the exception triggered a
CPL switch, since SS:ESP is only pushed onto the stack if so. On amd64,
SS:RSP is pushed regardless, so get_rsp() was doing the wrong thing for
kernel-mode exceptions. As a result, we can also remove custom print
functions for these registers.
2015-11-12 22:45:51 +00:00
kib
70c328a1bb MFC r289824:
Add CLFLUSHOPT instruction wrappers.

MFC r290188:
Fix prefix on i386.
2015-10-30 10:02:57 +00:00
avg
bef317767a MFC r261891: provide fast versions of ffsl and flsl for i386; ffsll and
flsll for amd64
2015-10-23 10:05:43 +00:00
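For reference, the semantics that ffsl and flsl provide can be sketched portably as below. This is a plain loop-based illustration of what the functions return, not the fast versions the commit adds; the sketch_ names are hypothetical.

```c
/*
 * Return the 1-based index of the least significant set bit,
 * or 0 if no bit is set.
 */
int
sketch_ffsl(long mask)
{
	unsigned long m = (unsigned long)mask;
	int bit;

	if (m == 0)
		return (0);
	for (bit = 1; (m & 1UL) == 0; bit++)
		m >>= 1;
	return (bit);
}

/*
 * Return the 1-based index of the most significant set bit,
 * or 0 if no bit is set.
 */
int
sketch_flsl(long mask)
{
	unsigned long m = (unsigned long)mask;
	int bit = 0;

	while (m != 0) {
		bit++;
		m >>= 1;
	}
	return (bit);
}
```

For a mask of 12 (binary 1100) the lowest set bit is bit 3 and the highest is bit 4, so the two functions return 3 and 4 respectively.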
kib
b44e8c1443 MFC r288000:
Add support for weak symbols to the kernel linkers.
2015-09-27 01:33:43 +00:00
rstone
26a0cf375a MFC r280957
Fix integer truncation bug in malloc(9)

  A couple of internal functions used by malloc(9) and uma truncated
  a size_t down to an int.  This could cause any number of issues
  (e.g. indefinite sleeps, memory corruption) if any kernel
  subsystem tried to allocate 2GB or more through malloc.  zfs would
  attempt such an allocation when run on a system with 2TB or more
  of RAM.
2015-09-17 23:31:44 +00:00
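The class of bug fixed in 26a0cf375a can be illustrated with a small check. With a 32-bit int, INT_MAX is 2^31 - 1, so a request of 2 GB (2^31 bytes) or more no longer fits once a size_t is narrowed to int. The helper below is a hypothetical sketch and is not taken from malloc(9) or uma.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true when a size_t request would not survive being stored
 * in an int, i.e. when it is larger than INT_MAX.
 */
bool
sketch_size_truncates(size_t size)
{
	return (size > (size_t)INT_MAX);
}
```

Keeping the value in a size_t end to end, as the commit message implies, avoids the need for such a check altogether.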
marcel
966727ca3b MFC r286808, r286809, r286867, r286868
-   Improve support for Macs that have a stride not equal to the
    horizontal resolution (width).
-   Support frame buffers that are larger than the default screen
    size.
-   Support large frame buffers: add 24 more page table pages we
    allocate on boot-up.

PR:		193745
2015-08-25 15:14:50 +00:00
marcel
87b09c366d MFC r286667 & r286723
Better support memory mapped console devices, such as VGA and EFI
frame buffers and memory mapped UARTs.

PR:		191564, 194952, 202276
2015-08-25 14:39:40 +00:00
kib
70c41a2cb1 MFC r286228:
Clear the IA32_MISC_ENABLE MSR bit on APs.
2015-08-17 18:33:16 +00:00
kib
984b7d731d MFC r285643:
When checking for the valid value of the frame pointer, verify that it
belongs to the kernel stack address range for the thread.
2015-08-07 04:31:02 +00:00
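The kind of check described in 984b7d731d might look roughly like the sketch below. The function and parameter names are hypothetical, and the stack is modelled simply as a base address plus a size in bytes; the actual kernel code is not reproduced here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true when the frame pointer lies inside the half-open range
 * [stack_base, stack_base + stack_size).  The subtraction form avoids
 * overflow when stack_base + stack_size would wrap.
 */
bool
sketch_fp_on_stack(uintptr_t fp, uintptr_t stack_base, size_t stack_size)
{
	return (fp >= stack_base && fp - stack_base < stack_size);
}
```

An unwinder using a check like this would stop walking frames as soon as a saved frame pointer falls outside the thread's own stack.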
kib
83f30eda37 Implement x86 ptrace(2) requests PT_{GET,SET}{FS,GS}BASE.
MFC r284918:
Add helper fill_based_sd(9).

MFC r284919:
Add x86 PT_GETFSBASE, PT_GETGSBASE machine-dependent ptrace requests to
obtain the thread %fs and %gs bases.  Add x86 PT_SETFSBASE and
PT_SETGSBASE requests to set the bases from debuggers.  The set
requests, similarly to the sysarch({I386,AMD64}_SET_FSBASE), override
the corresponding segment registers.

MFC r284965:
Document x86 machine-specific ptrace(2) requests.

MFC r285011:
Disallow a debugger on a 64-bit system from setting the fs/gs bases of a
32-bit process beyond the end of the process address space.

MFC r285104:
Grammar and language fixes.
2015-08-05 08:17:10 +00:00
kib
13079235af MFC r284921:
pcb_gs32sd has been unused for a long time, remove it.  Keep the padding in pcb.
2015-08-05 07:35:34 +00:00
kib
9428730d60 MFC r285041:
Use single instance of the identical INKERNEL() and PMC_IN_KERNEL()
macros on amd64 and i386.  On i386, correct the lowest kernel address.
2015-08-05 07:21:44 +00:00
gjb
ad05cf684a MFC r286131:
Pull pmspcv (pms(4)) from GENERIC.  It has PCI ID conflicts
 with ahd(4), mvs(4), and likely other drivers.

With hat:	re
Sponsored by:	The FreeBSD Foundation
2015-07-31 15:25:07 +00:00
scottl
ab97a72940 Merge driver for PMC Sierra's range of SAS/SATA HBAs.
Submitted by:   Achim Leubner <Achim.Leubner@pmcs.com>
Approved by: re
2015-07-23 05:26:09 +00:00
kib
83cf60b07d MFC r276439 (by alc):
Make the creation of the free lists dynamic, i.e., it is based on the
available physical memory at boot time. For amd64 systems with 64 GB
or more of physical memory, create free lists for managing pages with
physical addresses below 4 GB.

PR:	185727
Requested by:	alc
Approved by:	re (gjb)
2015-07-16 14:41:58 +00:00
neel
b396223254 MFC r284712:
Restore the host's GS.base before returning from 'svm_launch()' so the Dtrace
FBT provider works with vmm.ko on AMD.
2015-07-01 19:46:57 +00:00
neel
79b96fdbcb MFC r282209:
Emulate the 'bit test' instruction.

MFC r282259:
Re-implement RTC current time calculation to eliminate the possibility of
losing time.

MFC r282281:
Advertise the MTRR feature via CPUID and emulate the minimal set of MTRR MSRs.

MFC r282284:
When an instruction cannot be decoded just return to userspace so bhyve(8)
can dump the instruction bytes.

MFC r282287:
Don't require <sys/cpuset.h> to be always included before <machine/vmm.h>.

MFC r282296:
Emulate MSR_SYSCFG which is accessed by Linux on AMD cpus when MTRRs are
enabled.

MFC r282301:
Relax limits when transitioning a vector from the IRR to the ISR and also
when extinguishing it from the ISR in response to an EOI.

MFC r282335:
Advertise an additional memory BAR in the "dummy" device emulation.

MFC r282336:
Emulate machine check related MSRs to allow guest OSes like Windows to boot.

MFC r282351:
Don't advertise the Intel SMX capability to the guest.

MFC r282407:
Emulate the 'CMP r/m8, imm8' instruction.

MFC r282519:
Add macros for AMD-specific bits in MSR_EFER: LMSLE, FFXSR and TCE.

MFC r282520:
Emulate guest writes to EFER_MSR properly.

MFC r282558:
Deprecate the 3-way return values from vm_gla2gpa() and vm_copy_setup().

MFC r282571:
Check 'td_owepreempt' and yield the vcpu thread if it is set.

MFC r282595:
Allow byte reads of AHCI registers.

MFC r282784:
Handling indirect descriptors is a capability of the host and not one that
needs to be negotiated. Use the host capabilities field and not the negotiated
field when verifying that indirect descriptors are supported.

MFC r282788:
Allow configuration of the sector size advertised to the guest.

MFC r282865:
Set the subvendor field in config space to the vendor ID. This is required
by the Windows virtio drivers to correctly match a device.

MFC r282922:
Bump the size of the blockif scatter-gather list to 67.

MFC r283075:
Fix off-by-one in array index bounds check. bhyveload would allow you to
create 33 entries on an array that only has 32 slots.

MFC r283168:
Temporarily revert r282922 which bumped the max descriptors.

MFC r283255:
Emulate the "CMP r/m, reg" instruction (opcode 39H).

MFC r283256:
Add an option "--get-vmcs-exit-inst-length" to display the instruction length
of the instruction that caused the VM-exit.

MFC r283264:
Change the header type of the emulated host-bridge from type 1 to type 0.

MFC r283293:
Don't rely on the 'VM-exit instruction length' field in the VMCS to always
have an accurate length on an EPT violation.

MFC r283299:
Remove bogus verification of instruction length after instruction decode.

MFC r283308:
Exceptions don't deliver an error code in real mode.

MFC r283657:
Fix non-deterministic delays when accessing a vcpu that was in "running" or
"sleeping" state.

MFC r283973:
Use tunable 'hw.vmm.svm.features' to disable specific SVM features even
though they might be available in hardware. Use tunable 'hw.vmm.svm.num_asids'
to limit the number of ASIDs used by the hypervisor.

MFC r284046:
Fix regression in 'verify_gla()' with the RIP-relative addressing mode.

MFC r284174:
Support guest writes to the TSC by enabling the "use TSC offsetting"
execution control.
2015-06-28 03:22:26 +00:00
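The off-by-one described under r283075 in the commit above is a common pattern, sketched here with hypothetical names. The actual bhyveload code is not shown in this log, so the two predicates below only illustrate how a 32-slot array can end up accepting a 33rd entry.

```c
#include <stdbool.h>
#include <stddef.h>

#define	SKETCH_NSLOTS	32	/* array capacity from the commit message */

/* Buggy form: accepts idx == 32, which is one past the last slot. */
bool
sketch_index_ok_buggy(size_t idx)
{
	return (idx <= SKETCH_NSLOTS);
}

/* Fixed form: valid indices are 0 through SKETCH_NSLOTS - 1. */
bool
sketch_index_ok_fixed(size_t idx)
{
	return (idx < SKETCH_NSLOTS);
}
```

The two predicates differ only for idx == 32, which is exactly the 33rd entry mentioned in the message.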
neel
c85aee0195 MFC r279444:
Allow passthrough devices to be hinted.

MFC r279683:
When ICW1 is issued the edge sense circuit is reset which means that
following an initialization a low-to-high transition is necessary to
generate an interrupt.

MFC r279925:
Add -p parameter to list PCI device to pass through to the guest.

MFC r281559:
Fix handling of BUS_PROBE_NOWILDCARD in 'device_probe_child()'.

MFC r280447:
When fetching an instruction in non-64bit mode, consider the value of the
code segment base address.

MFC r280725:
Move legacy interrupt allocation for virtio devices to common code.

MFC r280775:
Fix the RTC device model to operate correctly in 12-hour mode.

MFC r280929:
Fix "MOVS" instruction memory to MMIO emulation.

MFC r280968:
Display instruction bytes and %rip prior to aborting due to an instruction
emulation error.

MFC r281145:
Enhance the support for Group 1 Extended opcodes for CMP, AND, OR instructions.

MFC r281542:
Initialize 'error' before use (Coverity IDs 1249748, 1249747, 1249751, 1249749)

MFC r281561:
Prior to aborting due to an ioport error, it is always interesting to see what
the guest's %rip is.

MFC r281611:
If the number of guest vcpus is less than '1' then flag it as an error.

MFC r281612:
Prefer 'vcpu_should_yield()' over checking 'curthread->td_flags' directly.

MFC r281630:
Relax the check on which vectors can be delivered through the APIC. According
to the Intel SDM vectors 16 through 255 are allowed to be delivered via the
local APIC.

MFC r281879:
Missing break in switch case (Coverity ID 1292499)

MFC r281946:
Don't allow guest to modify readonly bits in the PCI config 'status' register.

MFC r281987:
STOS/STOSB/STOSW/STOSD/STOSQ instruction emulation.

MFC r282206:
Implement the century byte in the RTC.
2015-06-28 01:21:55 +00:00
neel
115742fae3 MFC r276428:
Replace bhyve's minimal RTC emulation with a fully featured one in vmm.ko.

MFC r276432:
Initialize all fields of 'struct vm_exception exception' before passing it
to vm_inject_exception().

MFC r276763:
Clear blocking due to STI or MOV SS in the hypervisor when an instruction is
emulated or when the vcpu incurs an exception.

MFC r277149:
Clean up usage of 'struct vm_exception' to only communicate information
from userspace to vmm.ko when injecting an exception.

MFC r277168:
Fix typo (missing comma).

MFC r277309:
Make the error message explicit instead of just printing the usage if the
virtual machine name is not specified.

MFC r277310:
Simplify instruction restart logic in bhyve.

MFC r277359:
Fix a bug in libvmmapi 'vm_copy_setup()' where it would return success even
if the 'gpa' was in the guest MMIO region.

MFC r277360:
MOVS instruction emulation.

MFC r277626:
Add macro to identify AVIC capability (advanced virtual interrupt controller)
in AMD processors.

MFC r279220:
Don't close a block context if it couldn't be opened avoiding a null deref.

MFC r279225:
Add "-u" option to bhyve(8) to indicate that the RTC should maintain UTC time.

MFC r279227:
Emulate MSR 0xC0011024 when running on AMD processors.

MFC r279228:
Always emulate MSR_PAT on Intel processors and don't rely on PAT save/restore
capability of VT-x. This lets bhyve run nested in older VMware versions that
don't support the PAT save/restore capability.

MFC r279540:
Fix warnings/errors when building vmm.ko with gcc.
2015-06-27 22:48:22 +00:00
kib
ae7c0e0461 Revert part of the r283303 (by jhb):
Revert MFC of r270223, which bumped MAXCPU on amd64 from 64 to 256.
The cpuset_getaffinity(2) and cpuset_setaffinity(2) check minimum set
size, which now fails for binaries compiled on 10.0 with MAXCPU == 64.

Submitted by:	jhb
PR:	  200802
2015-06-23 06:30:36 +00:00
trasz
e1055c772b MFC r282213:
Add kern.racct.enable tunable and RACCT_DISABLED config option.
The point of this is to be able to add RACCT (with RACCT_DISABLED)
to GENERIC, to avoid having to rebuild the kernel to use rctl(8).

MFC r282901:

Build GENERIC with RACCT/RCTL support by default.  Note that it still
needs to be enabled by adding "kern.racct.enable=1" to /boot/loader.conf.

Note those two are MFC-ed together, because the latter one changes the
name of RACCT_DISABLED option to RACCT_DEFAULT_TO_DISABLED.  Should have
committed the renaming separately...

Relnotes:	yes
Sponsored by:	The FreeBSD Foundation
2015-06-21 06:28:26 +00:00
kib
f014bfc33c MFC r284104:
Updates from SDM rev. 55.
2015-06-13 07:31:50 +00:00