freebsd-skq

Author	SHA1	Message	Date
Konstantin Belousov	8954a9a4e6	Add the atomic_thread_fence() family of functions with intent to provide a semantic defined by the C11 fences with corresponding memory_order. atomic_thread_fence_acq() gives r \| r, w, where r and w are read and write accesses, and \| denotes the fence itself. atomic_thread_fence_rel() is r, w \| w. atomic_thread_fence_acq_rel() is the combination of the acquire and release in single operation. Note that reads after the acq+rel fence could be made visible before writes preceeding the fence. atomic_thread_fence_seq_cst() orders all accesses before/after the fence, and the fence itself is globally ordered against other sequentially consistent atomic operations. Reviewed by: alc Discussed with: bde Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2015-07-08 18:12:24 +00:00
Neel Natu	5e4f29c037	Move the 'devmem' device nodes from /dev/vmm to /dev/vmm.io Some external tools just do a 'ls /dev/vmm' to figure out the bhyve virtual machines on the host. These tools break if the devmem device nodes also appear in /dev/vmm. Requested by: grehan	2015-07-06 19:41:43 +00:00
George V. Neville-Neil	3839369c03	Enable IPSEC in all GENERIC kernels. Universe and kernel build tests passed 4 July 2015 PR: 128030 Sponsored by: Rubicon Communications (Netgate)	2015-07-04 17:37:00 +00:00
Konstantin Belousov	6fdfd88220	Use single instance of the identical INKERNEL() and PMC_IN_KERNEL() macros on amd64 and i386. Move the definition to machine/param.h. kgdb defines INKERNEL() too, the conflict is resolved by renaming kgdb version to PINKERNEL(). On i386, correct the lowest kernel address. After the shared page was introduced, USRSTACK no longer points to the last user address + 1 [] Submitted by: Oliver Pinter [] Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-07-02 14:37:21 +00:00
Konstantin Belousov	3ce8c94f29	Disallow a debugger on 64bit system to set fs/gs bases of the 32bit process beyond the end of the process address space. Such setting is not dangerous to the kernel integrity, but it causes confusing application misbehaviour. Sponsored by: The FreeBSD Foundation MFC after: 12 days	2015-07-01 16:37:03 +00:00
Konstantin Belousov	3ac3c0f269	Add a comment about too strong semantic of atomic_load_acq() on x86. Submitted by: bde MFC after: 2 weeks	2015-06-29 09:58:40 +00:00
Konstantin Belousov	d9008978c8	pcb_gs32sd is unused for long time, remove it. Keep the padding in pcb. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-06-29 07:53:44 +00:00
Konstantin Belousov	1817023775	Add x86 PT_GETFSBASE, PT_GETGSBASE machine-depended ptrace requests to obtain the thread %fs and %gs bases. Add x86 PT_SETFSBASE and PT_SETGSBASE requests to set the bases from debuggers. The set requests, similarly to the sysarch({I386,AMD64}_SET_FSBASE), override the corresponding segment registers. The main purpose of the operations is to retrieve and modify the tcb address for debuggee. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-06-29 07:07:24 +00:00
Konstantin Belousov	7626d062c3	Remove unneeded data dependency, currently imposed by atomic_load_acq(9), on it source, for x86. Right now, atomic_load_acq() on x86 is sequentially consistent with other atomics, code ensures this by doing store/load barrier by performing locked nop on the source. Provide separate primitive __storeload_barrier(), which is implemented as the locked nop done on a cpu-private variable, and put __storeload_barrier() before load, to keep seq_cst semantic but avoid introducing false dependency on the no-modification of the source for its later use. Note that seq_cst property of x86 atomic_load_acq() is not documented and not carried by atomics implementations on other architectures, although some kernel code relies on the behaviour. This commit does not intend to change this. Reviewed by: alc Discussed with: bde Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-06-28 05:04:08 +00:00
Tycho Nightingale	ea587cd825	verify_gla() needs to account for non-zero segment base addresses. Reviewed by: neel	2015-06-26 18:00:29 +00:00
Roger Pau Monné	7e748038cd	amd64: set the correct LMA values The current linker script generates program headers with VMA == LMA: Entry point 0xffffffff802e7000 There are 6 program headers, starting at offset 64 Program Headers: Type Offset VirtAddr PhysAddr FileSiz MemSiz Flags Align PHDR 0x0000000000000040 0xffffffff80200040 0xffffffff80200040 0x0000000000000150 0x0000000000000150 R E 8 INTERP 0x0000000000000190 0xffffffff80200190 0xffffffff80200190 0x000000000000000d 0x000000000000000d R 1 [Requesting program interpreter: /red/herring] LOAD 0x0000000000000000 0xffffffff80200000 0xffffffff80200000 0x00000000010559b0 0x00000000010559b0 R E 200000 LOAD 0x0000000001056000 0xffffffff81456000 0xffffffff81456000 0x0000000000132638 0x000000000052ecf8 RW 200000 DYNAMIC 0x0000000001056000 0xffffffff81456000 0xffffffff81456000 0x00000000000000d0 0x00000000000000d0 RW 8 GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 RWE 8 This is fine for the FreeBSD loader, because it completely ignores p_paddr and instead uses p_vaddr with a hardcoded offset. Other loaders however acknowledge p_paddr (like the Xen ELF loader), in which case they will try to load the kernel at the wrong place. Fix this by adding an AT keyword to the first section specifying the physical address, other sections will follow suit, so it ends up looking like: Entry point 0xffffffff802e7000 There are 6 program headers, starting at offset 64 Program Headers: Type Offset VirtAddr PhysAddr FileSiz MemSiz Flags Align PHDR 0x0000000000000040 0xffffffff80200040 0x0000000000200040 0x0000000000000150 0x0000000000000150 R E 8 INTERP 0x0000000000000190 0xffffffff80200190 0x0000000000200190 0x000000000000000d 0x000000000000000d R 1 [Requesting program interpreter: /red/herring] LOAD 0x0000000000000000 0xffffffff80200000 0x0000000000200000 0x00000000010559b0 0x00000000010559b0 R E 200000 LOAD 0x0000000001056000 0xffffffff81456000 0x0000000001456000 0x0000000000132638 0x000000000052ecf8 RW 200000 DYNAMIC 0x0000000001056000 0xffffffff81456000 0x0000000001456000 0x00000000000000d0 0x00000000000000d0 RW 8 GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 RWE 8 Tested on bare metal using the native FreeBSD loader and grub2 from TRUEOS. Sponsored by: Citrix Systems R&D Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D2783	2015-06-26 07:12:17 +00:00
Neel Natu	90e528f838	Restore the host's GS.base before returning from 'svm_launch()'. Previously this was done by the caller of 'svm_launch()' after it returned. This works fine as long as no code is executed in the interim that depends on pcpu data. The dtrace probe 'fbt:vmm:svm_launch:return' broke this assumption because it calls 'dtrace_probe()' which in turn relies on pcpu data. Reported by: avg MFC after: 1 week	2015-06-23 02:17:23 +00:00
Neel Natu	9b1aa8d622	Restructure memory allocation in bhyve to support "devmem". devmem is used to represent MMIO devices like the boot ROM or a VESA framebuffer where doing a trap-and-emulate for every access is impractical. devmem is a hybrid of system memory (sysmem) and emulated device models. devmem is mapped in the guest address space via nested page tables similar to sysmem. However the address range where devmem is mapped may be changed by the guest at runtime (e.g. by reprogramming a PCI BAR). Also devmem is usually mapped RO or RW as compared to RWX mappings for sysmem. Each devmem segment is named (e.g. "bootrom") and this name is used to create a device node for the devmem segment (e.g. /dev/vmm/testvm.bootrom). The device node supports mmap(2) and this decouples the host mapping of devmem from its mapping in the guest address space (which can change). Reviewed by: tychon Discussed with: grehan Differential Revision: https://reviews.freebsd.org/D2762 MFC after: 4 weeks	2015-06-18 06:00:17 +00:00
John Baldwin	5eb95e11ba	Report the values of x86 segment registers to remote debuggers. While here, also report %eflags from the i386 trapframe. Differential Revision: https://reviews.freebsd.org/D2743 Reviewed by: kib Obtained from: 1 month	2015-06-12 15:14:08 +00:00
Ruslan Bukin	4f4d15f0d0	Allow DTrace to be compiled-in to the kernel. This will require for AArch64 as we dont have modules yet. Sponsored by: HEIF5 Sponsored by: ARM Ltd. Differential Revision: https://reviews.freebsd.org/D1997	2015-06-10 15:53:39 +00:00
Mateusz Guzik	21de5aea6c	Fixup the build after r284215. Submitted by: Ivan Klymenko <fidaj ukr.net> [slighly modified]	2015-06-10 12:39:01 +00:00
Mateusz Guzik	f6f6d24062	Implement lockless resource limits. Use the same scheme implemented to manage credentials. Code needing to look at process's credentials (as opposed to thred's) is provided with *_proc variants of relevant functions. Places which possibly had to take the proc lock anyway still use the proc pointer to access limits.	2015-06-10 10:48:12 +00:00
Mateusz Guzik	4ea6a9a28f	Generalised support for copy-on-write structures shared by threads. Thread credentials are maintained as follows: each thread has a pointer to creds and a reference on them. The pointer is compared with proc's creds on userspace<->kernel boundary and updated if needed. This patch introduces a counter which can be compared instead, so that more structures can use this scheme without adding more comparisons on the boundary.	2015-06-10 10:43:59 +00:00
Alan Cox	65a9768f62	Account for superpage mappings that are created by pmap_copy().	2015-06-09 18:04:28 +00:00
Tycho Nightingale	277bdd9950	Support guest writes to the TSC by enabling the "use TSC offsetting" execution control and writing the difference between the host TSC and the guest TSC into the TSC offset in the VMCS upon encountering a write. Reviewed by: neel	2015-06-09 00:14:47 +00:00
Dmitry Chagin	c2bc5b15eb	Futex is an aligned 32-bit integer. Use the proper instruction and operand when dereferencing futex pointer.	2015-06-08 17:39:25 +00:00
Alan Cox	966272ca33	Retire VM_FREEPOOL_CACHE as the next step in eliminating PG_CACHE pages. Differential Revision: https://reviews.freebsd.org/D2712 Reviewed by: kib Sponsored by: EMC / Isilon Storage Division	2015-06-08 04:59:32 +00:00
Konstantin Belousov	32a1e9e4a5	Update print_INTEL_TLB() by the tag values from the Intel SDM rev. 55. The modern CPUs cache and TLB descriptions looked quite questionable without the update, e.g. Haswell i7 4770S reported: Data TLB: 4 KB pages, 4-way set associative, 64 entries L2 cache: 256 kbytes, 8-way associative, 64 bytes/line After the update, the report is: Data TLB: 1 GByte pages, 4-way set associative, 4 entries Data TLB: 4 KB pages, 4-way set associative, 64 entries Instruction TLB: 2M/4M pages, fully associative, 8 entries Instruction TLB: 4KByte pages, 8-way set associative, 64 entries 64-Byte prefetching Shared 2nd-Level TLB: 4 KByte/2MByte pages, 8-way associative, 1024 entries L2 cache: 256 kbytes, 8-way associative, 64 bytes/line Some tags were apparently removed from the table 3-21, Vol. 2A. Keep them around, but add a comment stating the removal. Update the format line for cpu_stdext_feature according to the bits from the SDM rev.55. It appears that Haswells do not store %cs and %ds values in the FPU save area. Store content of the %ecx register from the CPUID leaf 0x7 subleaf 0 as cpu_stdext_feature2 and print defined bits from it, again acording to SDM rev. 55. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-06-06 22:03:24 +00:00
Neel Natu	647c87825c	The 'verify_gla()' function is used to ensure that the effective address after decoding the instruction matches the one provided by hardware. Prior to r283293 'vie->num_valid' used to contain the actual length of the instruction whereas now it contains the maximum instruction length possible. This introduced a bug when calculating a RIP-relative base address. Fix this by using 'vie->num_processed' rather than 'vie->num_valid' as the length of the emulated instruction. Reported and tested by: tychon MFC after: 1 week	2015-06-05 21:22:26 +00:00
Neel Natu	b14bd6ac9d	Use tunable 'hw.vmm.svm.features' to disable specific SVM features even though they might be available in hardware. Use tunable 'hw.vmm.svm.num_asids' to limit the number of ASIDs used by the hypervisor. MFC after: 1 week	2015-06-04 02:12:23 +00:00
Dimitry Andric	38954a1d1c	Remove unneeded NULL checks in amd64's trap_fatal(). Since td_name is an array member of struct thread, it can never be NULL, so the check can be removed. In addition, curproc can never be NULL, so remove the if statement, and splice the two printfs() together. While here, remove the u_long cast, and use the correct printf format specifier curproc->p_pid. Reviewed by: kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D2695	2015-06-01 06:50:39 +00:00
Konstantin Belousov	69baeadc31	Remove several write-only variables, all reported by the gcc 4.9 buildkernel run. Some of them were write-only under some kernel options, e.g. variables keeping values only used by CTR() macros. It costs nothing to the code readability and correctness to eliminate the warnings in those cases too by removing the local cached values used only for single-access. Review: https://reviews.freebsd.org/D2665 Reviewed by: rodrigc Looked at by: bjk Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-05-29 13:24:17 +00:00
Neel Natu	248e6799e9	Fix non-deterministic delays when accessing a vcpu that was in "running" or "sleeping" state. This is done by forcing the vcpu to transition to "idle" by returning to userspace with an exit code of VM_EXITCODE_REQIDLE. MFC after: 2 weeks	2015-05-28 17:37:01 +00:00
Konstantin Belousov	1c5f423183	Enabled rewritten PCID support by default. Sponsored by: The FreeBSD Foundation MFC after: 1 month	2015-05-27 09:50:18 +00:00
Dmitry Chagin	d707582f83	When I merged the lemul branch I missied kib@'s r282708 commit. This is not the final fix as I need properly cleanup thread resources before other threads suicide. Tested by: Ruslan Makhmatkhanov	2015-05-25 20:44:46 +00:00
Dmitry Chagin	5c5aac2d45	Regen for r283492.	2015-05-24 18:09:01 +00:00
Dmitry Chagin	9802eb9ebc	Implement Linux specific syncfs() system call.	2015-05-24 18:08:01 +00:00
Dmitry Chagin	c532a88cfc	Regen for r283488.	2015-05-24 18:05:21 +00:00
Dmitry Chagin	e1ff74c0f7	Implement recvmmsg() and sendmmsg() system calls.	2015-05-24 18:04:04 +00:00
Dmitry Chagin	b7aaa9fdb0	Reduce duplication between MD Linux code by moving msg related struct definitions out into the compat/linux/linux_socket.h	2015-05-24 18:03:14 +00:00
Dmitry Chagin	3ce05165b1	Regen for r283484.	2015-05-24 18:02:17 +00:00
Dmitry Chagin	6e4c8004dc	Implement epoll_pwait() system call.	2015-05-24 18:00:14 +00:00
Dmitry Chagin	ca045164a7	Regen for r283480.	2015-05-24 17:58:24 +00:00
Dmitry Chagin	19d8b461f4	Add utimensat() system call. The patch developed by Jilles Tjoelker and Andrew Wilcox and adopted for lemul branch by me.	2015-05-24 17:57:07 +00:00
Dmitry Chagin	0c38abc250	The kernel sends signals to the processes via ABI specific sv_sendsig method. Native ABI do not need signal conversion, only emulators may want this. Usually emulators implements its own sv_sendsig method. For now only ibcs2 emulator does not have own sv_sendsig implementation and depends on native sendsig() method. So, remove any extra attempts to convert signal numbers from native sendsig() methods except from i386 where ibsc2 is living.	2015-05-24 17:56:02 +00:00
Dmitry Chagin	4ab7403bbd	Rework signal code to allow using it by other modules, like linprocfs: 1. Linux sigset always 64 bit on all platforms. In order to move Linux sigset code to the linux_common module define it as 64 bit int. Move Linux sigset manipulation routines to the MI path. 2. Move Linux signal number definitions to the MI path. In general, they are the same on all platforms except for a few signals. 3. Map Linux RT signals to the FreeBSD RT signals and hide signal conversion tables to avoid conversion errors. 4. Emulate Linux SIGPWR signal via FreeBSD SIGRTMIN signal which is outside of allowed on Linux signal numbers. PR: 197216	2015-05-24 17:47:20 +00:00
Dmitry Chagin	a7ac457613	According to Linux man sigaltstack(3) shall return EINVAL if the ss argument is not a null pointer, and the ss_flags member pointed to by ss contains flags other than SS_DISABLE. However, in fact, Linux also allows SS_ONSTACK flag which is simply ignored. For buggy apps (at least mono) ignore other than SS_DISABLE flags as a Linux do. While here move MI part of sigaltstack code to the appropriate place. Reported by: abi at abinet dot ru	2015-05-24 17:44:08 +00:00
Dmitry Chagin	657100de57	Regen for r283467.	2015-05-24 17:39:18 +00:00
Dmitry Chagin	fcdffc03f8	Call nosys in case when the incorrect syscall number is specified. Reported by: trinity	2015-05-24 17:38:02 +00:00
Dmitry Chagin	8d939ad405	Regen for r283465.	2015-05-24 17:35:42 +00:00
Dmitry Chagin	b6aeb7d5dd	Add preliminary fallocate system call implementation to emulate posix_fallocate() function. Differential Revision: https://reviews.freebsd.org/D1523 Reviewed by: emaste	2015-05-24 17:33:21 +00:00
Dmitry Chagin	274d2df2e5	Regen for r283451.	2015-05-24 17:00:43 +00:00
Dmitry Chagin	a6b40812ec	Implement ppoll() system call. Differential Revision: https://reviews.freebsd.org/D1105 Reviewed by: trasz	2015-05-24 16:59:25 +00:00
Dmitry Chagin	e8b026b37e	Include opt_compat.h, so that COMPAT_LINUX32 is defined, and we can access to the semop structs and functions. Submitted by: cognet@ Differential Revision: https://reviews.freebsd.org/D1095 Reviewed by: trasz	2015-05-24 16:51:04 +00:00
Dmitry Chagin	22f3dfdc12	Regen for r283444.	2015-05-24 16:50:17 +00:00

1 2 3 4 5 ...

7108 Commits