Commit Graph

7173 Commits

Author SHA1 Message Date
Mark Johnston
a5cbf8b9c0 Let the unwinder handle faults during function prologues or epilogues.
The i386 and amd64 DDB stack unwinders contain code to detect and handle
the case where the first frame is not completely set up or torn down. This
code was accidentally unused however, since db_backtrace() was never called
with a non-NULL trap frame. This change fixes that.

Also remove get_rsp() from the amd64 code. It appears to have come from
i386, which needs to take into account whether the exception triggered a
CPL switch, since SS:ESP is only pushed onto the stack if so. On amd64,
SS:RSP is pushed regardless, so get_rsp() was doing the wrong thing for
kernel-mode exceptions. As a result, we can also remove custom print
functions for these registers.

Reviewed by:	jhb
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D2881
2015-07-21 23:22:23 +00:00
Mark Johnston
f8a757d016 Improve stack unwinding on i386 and amd64 after an IP fault.
If we can't find a symbol corresponding to the faulting instruction, assume
that the previously-executed function is a call and attempt to find the
calling function using the return address on the stack. Otherwise we end
up associating the last stack frame with the current call, which is
incorrect and causes the unwinder to skip printing of the calling function,
resulting in a confusing backtrace.

Reviewed by:	jhb
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D2859
2015-07-21 23:13:11 +00:00
Mark Johnston
1a5bee0849 Remove some dead code from DDB's amd64 stack unwinder.
The amd64 port copied some code from i386 to fetch function arguments and
display them in backtraces. However, it was commented out and can't easily
be implemented since the function arguments are passed in
registers rather than on the stack in amd64. Remove it in preparation for
some bug fixes in this area.

Reviewed by:	jhb
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D2857
2015-07-21 23:03:21 +00:00
Ed Schouten
d0da90b198 Describe COMPAT_CLOUDABI64 in the amd64 configuration NOTES file. 2015-07-21 12:53:47 +00:00
Ed Schouten
21d30b29d5 Make thread creation work for CloudABI processes.
Summary:
Remove the stub system call that was put in place during the system call
import and replace it by a target-dependent version stored in sys/amd64.
Initialize the thread in a way similar to cpu_set_upcall_kse(). We
provide the entry point with two arguments: the thread ID and the
argument pointer.

Test Plan:
Thread creation still seems to work, both for FreeBSD and CloudABI
binaries.

Reviewers: dchagin, mjg, kib

Reviewed By: kib

Subscribers: imp

Differential Revision: https://reviews.freebsd.org/D3110
2015-07-21 12:47:15 +00:00
Ed Schouten
62c31cffae Make forking of CloudABI processes work.
Just like FreeBSD+Capsicum, CloudABI uses process descriptors. Return
the file descriptor number to the parent process.

To the child process we both return a special value for the file
descriptor number (CLOUDABI_PROCESS_CHILD). We also return the thread ID
of the new thread in the copied process, so the threading library can
reinitialize itself.

Obtained from:	https://github.com/NuxiNL/freebsd
2015-07-20 13:46:22 +00:00
Mark Johnston
32cd0147fa Implement the lockstat provider using SDT(9) instead of the custom provider
in lockstat.ko. This means that lockstat probes now have typed arguments and
will utilize SDT probe hot-patching support when it arrives.

Reviewed by:	gnn
Differential Revision:	https://reviews.freebsd.org/D2993
2015-07-19 22:14:09 +00:00
Benno Rice
eacbeb2b95 Merge driver for PMC Sierra's range of SAS/SATA HBAs.
Submitted by:	Achim Leubner <Achim.Leubner@pmcs.com>
Reviewed by:	scottl
2015-07-17 23:30:43 +00:00
Konstantin Belousov
888e282ab4 When checking for the valid value of the frame pointer, verify that it
belongs to the kernel stack address range for the thread.  Right now,
code checks that new frame is not farther then KSTACK_PAGES pages from
the current frame, which allows the address to point past the top of
the stack.

Reviewed by:	andrew, emaste, markj
Differential revision:	https://reviews.freebsd.org/D3108
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-07-16 19:40:18 +00:00
Ed Schouten
6e5fcd99df Add a sysentvec for CloudABI on x86-64.
Summary:
For CloudABI we need to put two things on the stack of new processes:
the argument data (a binary blob; not strings) and a startup data
structure. The startup data structure contains interesting things such
as a pointer to the ELF program header, the thread ID of the initial
thread, a stack smashing protection canary, and a pointer to the
argument data.

Fetching system call arguments and setting the return value is similar
to FreeBSD. The only differences are that system call 0 does not exist
and that we call into cloudabi_convert_errno() to convert the error
code. We also need this function in a couple of other places, so we'd
better reuse it here.

Reviewers: dchagin, kib

Reviewed By: kib

Subscribers: imp

Differential Revision: https://reviews.freebsd.org/D3098
2015-07-16 18:24:06 +00:00
Patrick Kelsey
2ec930efea Revert inadvertent change to amd64/GENERIC. 2015-07-15 01:04:54 +00:00
Patrick Kelsey
8aa7fdbd78 Add netmap support for ixgbe SRIOV VFs (that is, to if_ixv).
Differential Revision: https://reviews.freebsd.org/D2923
Reviewed by: erj, gnn
Approved by: jmallett (mentor)
Sponsored by: Norse Corp, Inc.
2015-07-15 01:02:01 +00:00
Christian Brueffer
f4c1eac7cd Spell crypto correctly. 2015-07-14 10:47:56 +00:00
John-Mark Gurney
e808e13b8b Now that aesni won't reuse fpu contexts (D3016), add seatbelts to the
fpu code to prevent other reuse of the contexts in the future...

Differential Revision:        https://reviews.freebsd.org/D3015
Reviewed by:	kib, gnn
2015-07-08 19:26:36 +00:00
Konstantin Belousov
8954a9a4e6 Add the atomic_thread_fence() family of functions with intent to
provide a semantic defined by the C11 fences with corresponding
memory_order.

atomic_thread_fence_acq() gives r | r, w, where r and w are read and
write accesses, and | denotes the fence itself.

atomic_thread_fence_rel() is r, w | w.

atomic_thread_fence_acq_rel() is the combination of the acquire and
release in single operation.  Note that reads after the acq+rel fence
could be made visible before writes preceeding the fence.

atomic_thread_fence_seq_cst() orders all accesses before/after the
fence, and the fence itself is globally ordered against other
sequentially consistent atomic operations.

Reviewed by:	alc
Discussed with:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2015-07-08 18:12:24 +00:00
Achim Leubner
4e1bc9a039 Driver 'pmspcv' added. Supports PMC-Sierra PM8001/8081/8088/8089/8074/8076/8077 SAS/SATA HBA Controllers. 2015-07-07 13:17:02 +00:00
Neel Natu
5e4f29c037 Move the 'devmem' device nodes from /dev/vmm to /dev/vmm.io
Some external tools just do a 'ls /dev/vmm' to figure out the bhyve virtual
machines on the host. These tools break if the devmem device nodes also
appear in /dev/vmm.

Requested by:	grehan
2015-07-06 19:41:43 +00:00
George V. Neville-Neil
3839369c03 Enable IPSEC in all GENERIC kernels.
Universe and kernel build tests passed 4 July 2015

PR:		128030
Sponsored by:	Rubicon Communications (Netgate)
2015-07-04 17:37:00 +00:00
Konstantin Belousov
6fdfd88220 Use single instance of the identical INKERNEL() and PMC_IN_KERNEL()
macros on amd64 and i386.  Move the definition to machine/param.h.
kgdb defines INKERNEL() too, the conflict is resolved by renaming kgdb
version to PINKERNEL().

On i386, correct the lowest kernel address.  After the shared page was
introduced, USRSTACK no longer points to the last user address + 1 [*]

Submitted by:	Oliver Pinter [*]
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-07-02 14:37:21 +00:00
Konstantin Belousov
3ce8c94f29 Disallow a debugger on 64bit system to set fs/gs bases of the 32bit
process beyond the end of the process address space.  Such setting is
not dangerous to the kernel integrity, but it causes confusing
application misbehaviour.

Sponsored by:	The FreeBSD Foundation
MFC after:	12 days
2015-07-01 16:37:03 +00:00
Konstantin Belousov
3ac3c0f269 Add a comment about too strong semantic of atomic_load_acq() on x86.
Submitted by:	bde
MFC after:	2 weeks
2015-06-29 09:58:40 +00:00
Konstantin Belousov
d9008978c8 pcb_gs32sd is unused for long time, remove it. Keep the padding in pcb.
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-06-29 07:53:44 +00:00
Konstantin Belousov
1817023775 Add x86 PT_GETFSBASE, PT_GETGSBASE machine-depended ptrace requests to
obtain the thread %fs and %gs bases.  Add x86 PT_SETFSBASE and
PT_SETGSBASE requests to set the bases from debuggers.  The set
requests, similarly to the sysarch({I386,AMD64}_SET_FSBASE),
override the corresponding segment registers.

The main purpose of the operations is to retrieve and modify the tcb
address for debuggee.

Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-06-29 07:07:24 +00:00
Konstantin Belousov
7626d062c3 Remove unneeded data dependency, currently imposed by
atomic_load_acq(9), on it source, for x86.

Right now, atomic_load_acq() on x86 is sequentially consistent with
other atomics, code ensures this by doing store/load barrier by
performing locked nop on the source.  Provide separate primitive
__storeload_barrier(), which is implemented as the locked nop done on
a cpu-private variable, and put __storeload_barrier() before load, to
keep seq_cst semantic but avoid introducing false dependency on the
no-modification of the source for its later use.

Note that seq_cst property of x86 atomic_load_acq() is not documented
and not carried by atomics implementations on other architectures,
although some kernel code relies on the behaviour.  This commit does
not intend to change this.

Reviewed by:	alc
Discussed with:	bde
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-06-28 05:04:08 +00:00
Tycho Nightingale
ea587cd825 verify_gla() needs to account for non-zero segment base addresses.
Reviewed by:	neel
2015-06-26 18:00:29 +00:00
Roger Pau Monné
7e748038cd amd64: set the correct LMA values
The current linker script generates program headers with VMA == LMA:

Entry point 0xffffffff802e7000
There are 6 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0xffffffff80200040 0xffffffff80200040
                 0x0000000000000150 0x0000000000000150  R E    8
  INTERP         0x0000000000000190 0xffffffff80200190 0xffffffff80200190
                 0x000000000000000d 0x000000000000000d  R      1
      [Requesting program interpreter: /red/herring]
  LOAD           0x0000000000000000 0xffffffff80200000 0xffffffff80200000
                 0x00000000010559b0 0x00000000010559b0  R E    200000
  LOAD           0x0000000001056000 0xffffffff81456000 0xffffffff81456000
                 0x0000000000132638 0x000000000052ecf8  RW     200000
  DYNAMIC        0x0000000001056000 0xffffffff81456000 0xffffffff81456000
                 0x00000000000000d0 0x00000000000000d0  RW     8
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RWE    8

This is fine for the FreeBSD loader, because it completely ignores p_paddr
and instead uses p_vaddr with a hardcoded offset. Other loaders however
acknowledge p_paddr (like the Xen ELF loader), in which case they will try
to load the kernel at the wrong place. Fix this by adding an AT keyword to
the first section specifying the physical address, other sections will
follow suit, so it ends up looking like:

Entry point 0xffffffff802e7000
There are 6 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0xffffffff80200040 0x0000000000200040
                 0x0000000000000150 0x0000000000000150  R E    8
  INTERP         0x0000000000000190 0xffffffff80200190 0x0000000000200190
                 0x000000000000000d 0x000000000000000d  R      1
      [Requesting program interpreter: /red/herring]
  LOAD           0x0000000000000000 0xffffffff80200000 0x0000000000200000
                 0x00000000010559b0 0x00000000010559b0  R E    200000
  LOAD           0x0000000001056000 0xffffffff81456000 0x0000000001456000
                 0x0000000000132638 0x000000000052ecf8  RW     200000
  DYNAMIC        0x0000000001056000 0xffffffff81456000 0x0000000001456000
                 0x00000000000000d0 0x00000000000000d0  RW     8
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RWE    8

Tested on bare metal using the native FreeBSD loader and grub2 from TRUEOS.

Sponsored by: Citrix Systems R&D
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D2783
2015-06-26 07:12:17 +00:00
Neel Natu
90e528f838 Restore the host's GS.base before returning from 'svm_launch()'.
Previously this was done by the caller of 'svm_launch()' after it returned.
This works fine as long as no code is executed in the interim that depends
on pcpu data.

The dtrace probe 'fbt:vmm:svm_launch:return' broke this assumption because
it calls 'dtrace_probe()' which in turn relies on pcpu data.

Reported by:	avg
MFC after:	1 week
2015-06-23 02:17:23 +00:00
Neel Natu
9b1aa8d622 Restructure memory allocation in bhyve to support "devmem".
devmem is used to represent MMIO devices like the boot ROM or a VESA framebuffer
where doing a trap-and-emulate for every access is impractical. devmem is a
hybrid of system memory (sysmem) and emulated device models.

devmem is mapped in the guest address space via nested page tables similar
to sysmem. However the address range where devmem is mapped may be changed
by the guest at runtime (e.g. by reprogramming a PCI BAR). Also devmem is
usually mapped RO or RW as compared to RWX mappings for sysmem.

Each devmem segment is named (e.g. "bootrom") and this name is used to
create a device node for the devmem segment (e.g. /dev/vmm/testvm.bootrom).
The device node supports mmap(2) and this decouples the host mapping of
devmem from its mapping in the guest address space (which can change).

Reviewed by:	tychon
Discussed with:	grehan
Differential Revision:	https://reviews.freebsd.org/D2762
MFC after:	4 weeks
2015-06-18 06:00:17 +00:00
John Baldwin
5eb95e11ba Report the values of x86 segment registers to remote debuggers.
While here, also report %eflags from the i386 trapframe.

Differential Revision:	https://reviews.freebsd.org/D2743
Reviewed by:	kib
Obtained from:	1 month
2015-06-12 15:14:08 +00:00
Ruslan Bukin
4f4d15f0d0 Allow DTrace to be compiled-in to the kernel.
This will require for AArch64 as we dont have modules yet.

Sponsored by:	HEIF5
Sponsored by:	ARM Ltd.
Differential Revision:	https://reviews.freebsd.org/D1997
2015-06-10 15:53:39 +00:00
Mateusz Guzik
21de5aea6c Fixup the build after r284215.
Submitted by:	Ivan Klymenko <fidaj ukr.net> [slighly modified]
2015-06-10 12:39:01 +00:00
Mateusz Guzik
f6f6d24062 Implement lockless resource limits.
Use the same scheme implemented to manage credentials.

Code needing to look at process's credentials (as opposed to thred's) is
provided with *_proc variants of relevant functions.

Places which possibly had to take the proc lock anyway still use the proc
pointer to access limits.
2015-06-10 10:48:12 +00:00
Mateusz Guzik
4ea6a9a28f Generalised support for copy-on-write structures shared by threads.
Thread credentials are maintained as follows: each thread has a pointer to
creds and a reference on them. The pointer is compared with proc's creds on
userspace<->kernel boundary and updated if needed.

This patch introduces a counter which can be compared instead, so that more
structures can use this scheme without adding more comparisons on the boundary.
2015-06-10 10:43:59 +00:00
Alan Cox
65a9768f62 Account for superpage mappings that are created by pmap_copy(). 2015-06-09 18:04:28 +00:00
Tycho Nightingale
277bdd9950 Support guest writes to the TSC by enabling the "use TSC offsetting"
execution control and writing the difference between the host TSC and
the guest TSC into the TSC offset in the VMCS upon encountering a
write.

Reviewed by:	neel
2015-06-09 00:14:47 +00:00
Dmitry Chagin
c2bc5b15eb Futex is an aligned 32-bit integer. Use the proper instruction and
operand when dereferencing futex pointer.
2015-06-08 17:39:25 +00:00
Alan Cox
966272ca33 Retire VM_FREEPOOL_CACHE as the next step in eliminating PG_CACHE pages.
Differential Revision:	https://reviews.freebsd.org/D2712
Reviewed by:	kib
Sponsored by:	EMC / Isilon Storage Division
2015-06-08 04:59:32 +00:00
Konstantin Belousov
32a1e9e4a5 Update print_INTEL_TLB() by the tag values from the Intel SDM
rev. 55.  The modern CPUs cache and TLB descriptions looked quite
questionable without the update, e.g. Haswell i7 4770S reported:
	Data TLB: 4 KB pages, 4-way set associative, 64 entries
	L2 cache: 256 kbytes, 8-way associative, 64 bytes/line
After the update, the report is:
	Data TLB: 1 GByte pages, 4-way set associative, 4 entries
	Data TLB: 4 KB pages, 4-way set associative, 64 entries
	Instruction TLB: 2M/4M pages, fully associative, 8 entries
	Instruction TLB: 4KByte pages, 8-way set associative, 64 entries
	64-Byte prefetching
	Shared 2nd-Level TLB: 4 KByte/2MByte pages, 8-way associative, 1024 entries
	L2 cache: 256 kbytes, 8-way associative, 64 bytes/line
Some tags were apparently removed from the table 3-21, Vol. 2A.  Keep
them around, but add a comment stating the removal.

Update the format line for cpu_stdext_feature according to the bits
from the SDM rev.55.  It appears that Haswells do not store %cs and
%ds values in the FPU save area.

Store content of the %ecx register from the CPUID leaf 0x7
subleaf 0 as cpu_stdext_feature2 and print defined bits from it,
again acording to SDM rev. 55.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-06-06 22:03:24 +00:00
Neel Natu
647c87825c The 'verify_gla()' function is used to ensure that the effective address
after decoding the instruction matches the one provided by hardware.

Prior to r283293 'vie->num_valid' used to contain the actual length of
the instruction whereas now it contains the maximum instruction length
possible. This introduced a bug when calculating a RIP-relative base address.

Fix this by using 'vie->num_processed' rather than 'vie->num_valid' as the
length of the emulated instruction.

Reported and tested by:	tychon
MFC after:	1 week
2015-06-05 21:22:26 +00:00
Neel Natu
b14bd6ac9d Use tunable 'hw.vmm.svm.features' to disable specific SVM features even
though they might be available in hardware.

Use tunable 'hw.vmm.svm.num_asids' to limit the number of ASIDs used by
the hypervisor.

MFC after:	1 week
2015-06-04 02:12:23 +00:00
Dimitry Andric
38954a1d1c Remove unneeded NULL checks in amd64's trap_fatal().
Since td_name is an array member of struct thread, it can never be NULL,
so the check can be removed.  In addition, curproc can never be NULL,
so remove the if statement, and splice the two printfs() together.

While here, remove the u_long cast, and use the correct printf format
specifier curproc->p_pid.

Reviewed by:	kib
MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D2695
2015-06-01 06:50:39 +00:00
Konstantin Belousov
69baeadc31 Remove several write-only variables, all reported by the gcc 4.9
buildkernel run.

Some of them were write-only under some kernel options, e.g. variables
keeping values only used by CTR() macros.  It costs nothing to the
code readability and correctness to eliminate the warnings in those
cases too by removing the local cached values used only for
single-access.

Review:	https://reviews.freebsd.org/D2665
Reviewed by:	rodrigc
Looked at by:	bjk
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-05-29 13:24:17 +00:00
Neel Natu
248e6799e9 Fix non-deterministic delays when accessing a vcpu that was in "running" or
"sleeping" state. This is done by forcing the vcpu to transition to "idle"
by returning to userspace with an exit code of VM_EXITCODE_REQIDLE.

MFC after:      2 weeks
2015-05-28 17:37:01 +00:00
Konstantin Belousov
1c5f423183 Enabled rewritten PCID support by default.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
2015-05-27 09:50:18 +00:00
Dmitry Chagin
d707582f83 When I merged the lemul branch I missied kib@'s r282708 commit.
This is not the final fix as I need properly cleanup thread resources
before other threads suicide.

Tested by:	Ruslan Makhmatkhanov
2015-05-25 20:44:46 +00:00
Dmitry Chagin
5c5aac2d45 Regen for r283492. 2015-05-24 18:09:01 +00:00
Dmitry Chagin
9802eb9ebc Implement Linux specific syncfs() system call. 2015-05-24 18:08:01 +00:00
Dmitry Chagin
c532a88cfc Regen for r283488. 2015-05-24 18:05:21 +00:00
Dmitry Chagin
e1ff74c0f7 Implement recvmmsg() and sendmmsg() system calls. 2015-05-24 18:04:04 +00:00
Dmitry Chagin
b7aaa9fdb0 Reduce duplication between MD Linux code by moving msg related
struct definitions out into the compat/linux/linux_socket.h
2015-05-24 18:03:14 +00:00
Dmitry Chagin
3ce05165b1 Regen for r283484. 2015-05-24 18:02:17 +00:00
Dmitry Chagin
6e4c8004dc Implement epoll_pwait() system call. 2015-05-24 18:00:14 +00:00
Dmitry Chagin
ca045164a7 Regen for r283480. 2015-05-24 17:58:24 +00:00
Dmitry Chagin
19d8b461f4 Add utimensat() system call.
The patch developed by Jilles Tjoelker and Andrew Wilcox and
adopted for lemul branch by me.
2015-05-24 17:57:07 +00:00
Dmitry Chagin
0c38abc250 The kernel sends signals to the processes via ABI specific sv_sendsig method.
Native ABI do not need signal conversion, only emulators may want this. Usually
emulators implements its own sv_sendsig method. For now only ibcs2 emulator does
not have own sv_sendsig implementation and depends on native sendsig() method.
So, remove any extra attempts to convert signal numbers from native sendsig()
methods except from i386 where ibsc2 is living.
2015-05-24 17:56:02 +00:00
Dmitry Chagin
4ab7403bbd Rework signal code to allow using it by other modules, like linprocfs:
1. Linux sigset always 64 bit on all platforms. In order to move Linux
sigset code to the linux_common module define it as 64 bit int. Move
Linux sigset manipulation routines to the MI path.

2. Move Linux signal number definitions to the MI path. In general, they
are the same on all platforms except for a few signals.

3. Map Linux RT signals to the FreeBSD RT signals and hide signal conversion
tables to avoid conversion errors.

4. Emulate Linux SIGPWR signal via FreeBSD SIGRTMIN signal which is outside
of allowed on Linux signal numbers.

PR:		197216
2015-05-24 17:47:20 +00:00
Dmitry Chagin
a7ac457613 According to Linux man sigaltstack(3) shall return EINVAL if the ss
argument is not a null pointer, and the ss_flags member pointed to by ss
contains flags other than SS_DISABLE. However, in fact, Linux also
allows SS_ONSTACK flag which is simply ignored.

For buggy apps (at least mono) ignore other than SS_DISABLE
flags as a Linux do.

While here move MI part of sigaltstack code to the appropriate place.

Reported by:	abi at abinet dot ru
2015-05-24 17:44:08 +00:00
Dmitry Chagin
657100de57 Regen for r283467. 2015-05-24 17:39:18 +00:00
Dmitry Chagin
fcdffc03f8 Call nosys in case when the incorrect syscall number is specified.
Reported by:	trinity
2015-05-24 17:38:02 +00:00
Dmitry Chagin
8d939ad405 Regen for r283465. 2015-05-24 17:35:42 +00:00
Dmitry Chagin
b6aeb7d5dd Add preliminary fallocate system call implementation
to emulate posix_fallocate() function.

Differential Revision:	https://reviews.freebsd.org/D1523
Reviewed by:	emaste
2015-05-24 17:33:21 +00:00
Dmitry Chagin
274d2df2e5 Regen for r283451. 2015-05-24 17:00:43 +00:00
Dmitry Chagin
a6b40812ec Implement ppoll() system call.
Differential Revision:	https://reviews.freebsd.org/D1105
Reviewed by:	trasz
2015-05-24 16:59:25 +00:00
Dmitry Chagin
e8b026b37e Include opt_compat.h, so that COMPAT_LINUX32 is defined, and we can
access to the semop structs and functions.

Submitted by:	cognet@

Differential Revision:	https://reviews.freebsd.org/D1095
Reviewed by:	trasz
2015-05-24 16:51:04 +00:00
Dmitry Chagin
22f3dfdc12 Regen for r283444. 2015-05-24 16:50:17 +00:00
Dmitry Chagin
a31d76867d Implement eventfd system call.
Differential Revision:	https://reviews.freebsd.org/D1094
In collaboration with:	Jilles Tjoelker
2015-05-24 16:49:14 +00:00
Dmitry Chagin
3e89b64168 Put the correct value for the abi_nfdbits parameter of kern_select() for
all supported Linuxulators.

Differential Revision:	https://reviews.freebsd.org/D1093
Reviewed by:	trasz
2015-05-24 16:47:13 +00:00
Dmitry Chagin
28fb55359b Regen for r283441. 2015-05-24 16:42:49 +00:00
Dmitry Chagin
e16fe1c730 Implement epoll family system calls. This is a tiny wrapper
around kqueue() to implement epoll subset of functionality.
The kqueue user data are 32bit on i386 which is not enough for
epoll user data, so we keep user data in the proc emuldata.

Initial patch developed by rdivacky@ in 2007, then extended
by Yuri Victorovich @ r255672 and finished by me
in collaboration with mjg@ and jillies@.

Differential Revision:	https://reviews.freebsd.org/D1092
2015-05-24 16:41:39 +00:00
Dmitry Chagin
4d0f380d87 To avoid code duplication move open/fcntl definitions to the MI
header file.

Differential Revision:	https://reviews.freebsd.org/D1087
Reviewed by:	trasz
2015-05-24 16:31:44 +00:00
Dmitry Chagin
26c68e1fe5 Use the BSD_TO_LINUX_SIGNAL() wherever there is no need
to check the ABI as it is known.

Differential Revision:	https://reviews.freebsd.org/D1086
2015-05-24 16:30:23 +00:00
Dmitry Chagin
437c43c1cb Being exported through vdso the note.Linux section used by glibc
to determine the kernel version (this saves one uname call).
Temporarily disable the export of a note.Linux section until I figured
out how to change the kernel version in the note.Linux on the fly.

Differential Revision:	https://reviews.freebsd.org/D1081
Reviewed by:	trasz
2015-05-24 16:25:44 +00:00
Dmitry Chagin
4048f59cd0 Add AT_RANDOM and AT_EXECFN auxiliary vector entries which are used by
glibc. At list since glibc version 2.16 using AT_RANDOM is mandatory.

Differential Revision:	https://reviews.freebsd.org/D1080
2015-05-24 16:24:24 +00:00
Dmitry Chagin
0a1884d768 Regen for r283428. 2015-05-24 16:19:57 +00:00
Dmitry Chagin
baa232bbfd Change linux faccessat syscall definition to match actual linux one.
The AT_EACCESS and AT_SYMLINK_NOFOLLOW flags are actually implemented
within the glibc wrapper function for faccessat().  If either of these
flags are specified, then the wrapper function employs fstatat() to
determine access permissions.

Differential Revision:	https://reviews.freebsd.org/D1078
Reviewed by:	trasz
2015-05-24 16:18:03 +00:00
Dmitry Chagin
523be40fe4 Regen for r283424. 2015-05-24 16:11:21 +00:00
Dmitry Chagin
b2f587918d Add preliminary support for x86-64 Linux binaries.
Differential Revision:	https://reviews.freebsd.org/D1076
2015-05-24 16:07:11 +00:00
Dmitry Chagin
bc27367760 Refund the proc emuldata struct for future use. For now move flags from
thread emuldata to proc emuldata as it was originally intended.

As we can have both 64 & 32 bit Linuxulator running any eventhandler
can be called twice for us. To prevent this move eventhandlers code
from linux_emul.c to the linux_common.ko module.

Differential Revision:	https://reviews.freebsd.org/D1073
2015-05-24 15:54:58 +00:00
Dmitry Chagin
67d3974849 Introduce a new module linux_common.ko which is intended for the
following primary purposes:

1. Remove the dependency of linsysfs and linprocfs modules from linux.ko,
which will be architecture specific on amd64.

2. Incorporate into linux_common.ko general code for platforms on which
we'll support two Linuxulator modules (for both instruction set - 32 & 64 bit).

3. Move malloc(9) declaration to linux_common.ko, to enable getting memory
usage statistics properly.

Currently linux_common.ko incorporates a code from linux_mib.c and linux_util.c
and linprocfs, linsysfs and linux kernel modules depend on linux_common.ko.

Temporarily remove dtrace garbage from linux_mib.c and linux_util.c

Differential Revision:	https://reviews.freebsd.org/D1072
In collaboration with:	Vassilis Laganakos.

Reviewed by:	trasz
2015-05-24 15:51:18 +00:00
Dmitry Chagin
31eb438886 x86_64 Linux do not use multiplexing on ipc system calls.
Move struct ipc_perm definition to the MD path as it differs for 64 and
32 bit platform.

Differential Revision:	https://reviews.freebsd.org/D1068
Reviewed by:	trasz
2015-05-24 15:44:41 +00:00
Dmitry Chagin
26cf41d6ca Remove stale comment about a signal trampoline which
is moved to the shared page at r219609.

Differential Revision:	https://reviews.freebsd.org/D1063
Reviewed by:	trasz
2015-05-24 15:32:52 +00:00
Dmitry Chagin
0020bdf13a Put linux_platform into the vdso to avoid copying it onto the stack at
every exec.

Differential Revision:	https://reviews.freebsd.org/D1062
Reviewed by:	trasz
2015-05-24 15:30:52 +00:00
Dmitry Chagin
32084836c0 Eliminate a now unused global declaration of elf_linux_sysvec.
Differential Revision:	https://reviews.freebsd.org/D1061
Reviewed by:	trasz
2015-05-24 15:29:20 +00:00
Dmitry Chagin
bdc379344a Implement vdso - virtual dynamic shared object. Through vdso Linux
exposes functions from kernel with proper DWARF CFI information so that
it becomes easier to unwind through them.
Using vdso is a mandatory for a thread cancelation && cleanup
on a modern glibc.

Differential Revision:	https://reviews.freebsd.org/D1060
2015-05-24 15:28:17 +00:00
Dmitry Chagin
b2e0aad9e5 Regen for r283403. 2015-05-24 15:22:33 +00:00
Dmitry Chagin
ae50b4d7b5 Implement pselect6() system call.
Differential Revision:	https://reviews.freebsd.org/D1051
Reviewed by:	trasz
2015-05-24 15:21:25 +00:00
Dmitry Chagin
e7fa9de6eb Regen for r283401. 2015-05-24 15:19:44 +00:00
Dmitry Chagin
c3978c7bb1 Implement prlimit64() system call.
Differential Revision:	https://reviews.freebsd.org/D1050
Reviewed by:	emaste, trasz
2015-05-24 15:18:19 +00:00
Dmitry Chagin
737325a46d Regen for r283399. 2015-05-24 15:15:46 +00:00
Dmitry Chagin
254a937ee5 Implement dup3() system call.
Differential Revision:	https://reviews.freebsd.org/D1049
Reviewed by:	emaste
2015-05-24 15:14:51 +00:00
Dmitry Chagin
f680d990e8 Regen for r283396. 2015-05-24 15:12:38 +00:00
Dmitry Chagin
7ac9766db4 Implement rt_sigqueueinfo() system call.
Differential Revision:	https://reviews.freebsd.org/D1047
Reviewed by:	trasz
2015-05-24 15:11:32 +00:00
Dmitry Chagin
e4454275a5 Regen for r283394. 2015-05-24 15:08:25 +00:00
Dmitry Chagin
e5fe4ccf59 Implement waitid() system call.
Differential Revision:	https://reviews.freebsd.org/D1046
2015-05-24 15:06:39 +00:00
Dmitry Chagin
001398c4c5 To reduce code duplication introduce linux_copyout_rusage() method.
Use it in linux_wait4() system call and move linux_wait4() to the MI path.
While here add a prototype for the static bsd_to_linux_rusage().

Differential Revision:	https://reviews.freebsd.org/D2138
Reviewed by:	trasz
2015-05-24 15:03:09 +00:00
Dmitry Chagin
af682d487b Some style(9) && whitespaces fixes. No functional changes.
Differential Revision:	https://reviews.freebsd.org/D1041
Reviewed by:	emaste
2015-05-24 14:55:12 +00:00
Dmitry Chagin
81338031c4 Switch linuxulator to use the native 1:1 threads.
The reasons:
1. Get rid of the stubs/quirks with process dethreading,
   process reparent when the process group leader exits and close
   to this problems on wait(), waitpid(), etc.
2. Reuse our kernel code instead of writing excessive thread
   managment routines in Linuxulator.

Implementation details:

1. The thread is created via kern_thr_new() in the clone() call with
   the CLONE_THREAD parameter. Thus, everything else is a process.
2. The test that the process has a threads is done via P_HADTHREADS
   bit p_flag of struct proc.
3. Per thread emulator state data structure is now located in the
   struct thread and freed in the thread_dtor() hook.
   Mandatory holdig of the p_mtx required when referencing emuldata
   from the other threads.
4. PID mangling has changed. Now Linux pid is the native tid
   and Linux tgid is the native pid, with the exception of the first
   thread in the process where tid and pid are one and the same.

Ugliness:

   In case when the Linux thread is the initial thread in the thread
   group thread id is equal to the process id. Glibc depends on this
   magic (assert in pthread_getattr_np.c). So for system calls that
   take thread id as a parameter we should use the special method
   to reference struct thread.

Differential Revision:	https://reviews.freebsd.org/D1039
2015-05-24 14:53:16 +00:00
Dmitry Chagin
91d1786f65 In preparation for switching linuxulator to the use the native 1:1
threads add a hook for cleaning thread resources before the thread die.

Differential Revision:	https://reviews.freebsd.org/D1038
2015-05-24 14:51:29 +00:00
Dmitry Chagin
64cfe4dc38 Regen for r283379. 2015-05-24 14:47:00 +00:00
Dmitry Chagin
2003907d45 Implement a Linux version of sched_getparam() && sched_setparam().
Temporarily use the first thread in proc.

Differential Revision:	https://reviews.freebsd.org/D1036
Reviewed by:	trasz
2015-05-24 14:45:57 +00:00