Commit Graph

7204 Commits

Author SHA1 Message Date
John Baldwin
7a2c1d8c60 Enable interrupts on the BSP once all PICs are initialized.
This moves the enabling of interrupts slightly earlier (the old location
was still before devices were enumerated and probed) and does it in the
interrupt code (rather than in the device configuration code).  This
also avoids tripping over an assertion on the first TLB shootdown with
earlier AP startup.

Reviewed by:	kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D5710
2016-03-24 00:24:07 +00:00
Dmitry Chagin
351cf753eb Regen for r297061 (fstatfs64 Linux syscall).
MFC after:	1 week
2016-03-20 13:23:01 +00:00
Dmitry Chagin
99546279d6 Implement fstatfs64 system call.
PR:		181012
Submitted by:	John Wehle
MFC after:	1 week
2016-03-20 13:21:20 +00:00
Gleb Smirnoff
e33c2e6b06 Due to invalid use of a signed intermediate value in the bounds checking
during argument validity verification, unbound zero'ing of the process LDT
and adjacent memory can be initiated from usermode.

Submitted by:	CORE Security
Patch by:	kib
Security:	SA-16:15
2016-03-16 22:33:12 +00:00
Konstantin Belousov
3ef966c4c0 The PKRU state size is 4 bytes, its support makes the XSAVE area size
non-multiple of 64 bytes.  Thereafter, the user state save area is
misaligned, which triggers assertion in the debugging kernels, or
segmentation violation on accesses for non-debugging configs.

Force the desired alignment of the user save area as the fix
(workaround is to disable bit 9 in the hw.xsave_mask loader tunable).
This correction is required for booting on the upcoming Intel' Purley
platform.

Reported and tested by:	"Pieper, Jeffrey E" <jeffrey.e.pieper@intel.com>,
	jimharris
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2016-03-15 15:42:53 +00:00
John Baldwin
6fc8053f1a Fix reporting of the CloudABI ABI in kdump.
- Advertise the word size for CloudABI ABIs via the SV_LP64 flag.  All of
  the other ABIs include either SV_ILP32 or SV_LP64.
- Fix kdump to not assume a 32-bit ABI if the ABI flags field is non-zero
  but SV_LP64 isn't set.  Instead, only assume a 32-bit ABI if SV_ILP32 is
  set and fallback to the unknown value of "00" if neither SV_LP64 nor
  SV_ILP32 is set.

Reviewed by:	kib, ed
Differential Revision:	https://reviews.freebsd.org/D5560
2016-03-09 18:38:30 +00:00
Marcel Moolenaar
6bcf245ebc Bump VM_MAX_MEMSEGS from 2 to 3 to match the number of VM segment
identifiers present in vmmapi.h. In particular, it's now possible
to create a VM_FRAMEBUFFER segment.
2016-02-26 16:18:47 +00:00
Konstantin Belousov
abb8f08388 Return dst as the result from memcpy(9) on amd64.
PR:	207422
MFC after:	1 week
2016-02-24 11:58:15 +00:00
Svatopluk Kraus
b352b10400 As <machine/vm.h> is included from <vm/vm.h>, there is no need to
include it explicitly when <vm/vm.h> is already included.

Reviewed by:	alc, kib
Differential Revision:	https://reviews.freebsd.org/D5380
2016-02-22 09:10:23 +00:00
Svatopluk Kraus
35a0bc1260 As <machine/vmparam.h> is included from <vm/vm_param.h>, there is no
need to include it explicitly when <vm/vm_param.h> is already included.

Suggested by:	alc
Reviewed by:	alc
Differential Revision:	https://reviews.freebsd.org/D5379
2016-02-22 09:08:04 +00:00
Svatopluk Kraus
a1e1814d76 As <machine/pmap.h> is included from <vm/pmap.h>, there is no need to
include it explicitly when <vm/pmap.h> is already included.

Reviewed by:	alc, kib
Differential Revision:	https://reviews.freebsd.org/D5373
2016-02-22 09:02:20 +00:00
Gleb Smirnoff
b28cc462ad Include sys/_task.h into uma_int.h, so that taskqueue.h isn't a
requirement for uma_int.h.

Suggested by:	jhb
2016-02-09 20:22:35 +00:00
Gleb Smirnoff
e60b2fcbeb Redo r292484. Embed task(9) into zone, so that uz_maxaction is called
in a context that can sleep, allowing consumers of the KPI to run their
drain routines without any extra measures.

Discussed with:	jtl
2016-02-03 23:30:17 +00:00
John Baldwin
aa949be551 Convert ss_sp in stack_t and sigstack to void *.
POSIX requires these members to be of type void * rather than the
char * inherited from 4BSD.  NetBSD and OpenBSD both changed their
fields to void * back in 1998.  No new build failures were reported
via an exp-run.

PR:		206503 (exp-run)
Reviewed by:	kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D5092
2016-01-27 17:55:01 +00:00
Xin LI
669414e4fb Implement AT_SECURE properly.
AT_SECURE auxv entry has been added to the Linux 2.5 kernel to pass a
boolean flag indicating whether secure mode should be enabled. 1 means
that the program has changes its credentials during the execution.
Being exported AT_SECURE used by glibc issetugid() call.

Submitted by:	imp, dchagin
Security:	FreeBSD-SA-16:10.linux
Security:	CVE-2016-1883
2016-01-27 07:20:55 +00:00
Dmitry Chagin
9dba79fb66 Remove obsolete comment.
MFC after:	3 days
2016-01-23 08:08:06 +00:00
Dmitry Chagin
f138999141 Fix a typo.
MFC after:	3 days
2016-01-23 08:04:29 +00:00
Hans Petter Selasky
c1ecb7e114 Add missing atomic wrapper macro.
Reviewed by:	alfred @
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2016-01-21 18:22:50 +00:00
Konstantin Belousov
f132cd0547 Use ANSI definitions. Wrap long line.
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-01-19 08:08:08 +00:00
Konstantin Belousov
b57e68141f Clear whole XMM register file instead of only XMM0. Also clear x87
registers.  This brings amd64 on par with i386, providing consistent
initial FPU state.

Note that we do not clear any extended state, at least because kernel
does not understand extended state structure and consequences of zero
overwrite after fninit()/fpusave().

Submitted by:	joss.upton@yahoo.com
PR:	206370
MFC after:	2 weeks
2016-01-19 08:04:02 +00:00
Gleb Smirnoff
de44d808ef Regen after r293907. 2016-01-14 10:15:21 +00:00
Gleb Smirnoff
037f750877 Change linux get_robust_list system call to match actual linux one.
The set_robust_list system call request the kernel to record the head
of the list of robust futexes owned by the calling thread. The head
argument is the list head to record.
The get_robust_list system call should return the head of the robust
list of the thread whose thread id is specified in pid argument.
The list head should be stored in the location pointed to by head
argument.

In contrast, our implemenattion of get_robust_list system call copies
the known portion of memory pointed by recorded in set_robust_list
system call pointer to the head of the robust list to the location
pointed by head argument.

So, it is possible for a local attacker to read portions of kernel
memory, which may result in a privilege escalation.

Submitted by:	mjg
Security:	SA-16:03.linux
2016-01-14 10:13:58 +00:00
Jung-uk Kim
4ec1c9bfac Remove dead code when the target processor has POPCNT instruction. 2016-01-13 19:19:50 +00:00
Dmitry Chagin
038c720553 Implement vsyscall hack. Prior to 2.13 glibc uses vsyscall
instead of vdso. An upcoming linux_base-c6 needs it.

Differential Revision:  https://reviews.freebsd.org/D1090

Reviewed by:	kib, trasz
MFC after:	1 week
2016-01-09 20:18:53 +00:00
Ed Maste
0e42ee5dd8 Move amd64 metadata.h to x86 and share with i386
MFC after:	1 week
2016-01-07 19:47:26 +00:00
Ian Lepore
69dcb7e771 Make the 'env' directive described in config(5) work on all architectures,
providing compiled-in static environment data that is used instead of any
data passed in from a boot loader.

Previously 'env' worked only on i386 and arm xscale systems, because it
required the MD startup code to examine the global envmode variable and
decide whether to use static_env or an environment obtained from the boot
loader, and set the global kern_envp accordingly.  Most startup code wasn't
doing so.  Making things even more complex, some mips startup code uses an
alternate scheme that involves calling init_static_kenv() to pass an empty
buffer and its size, then uses a series of kern_setenv() calls to populate
that buffer.

Now all MD startup code calls init_static_kenv(), and that routine provides
a single point where envmode is checked and the decision is made whether to
use the compiled-in static_kenv or the values provided by the MD code.

The routine also continues to serve its original purpose for mips; if a
non-zero buffer size is passed the routine installs the empty buffer ready
to accept kern_setenv() values.  Now if the size is zero, the provided buffer
full of existing env data is installed.  A NULL pointer can be passed if the
boot loader provides no env data; this allows the static env to be installed
if envmode is set to do so.

Most of the work here is a near-mechanical change to call the init function
instead of directly setting kern_envp.  A notable exception is in xen/pv.c;
that code was originally installing a buffer full of preformatted env data
along with its non-zero size (like mips code does), which would have allowed
kern_setenv() calls to wipe out the preformatted data.  Now it passes a zero
for the size so that the buffer of data it installs is treated as
non-writeable.
2016-01-02 02:53:48 +00:00
John Baldwin
9e8d8b4b0c Move shared variables from {amd64,i386}/initcpu.c to x86/identcpu.c.
While here, move the common bits of <machine/cputypes.h> to
<x86/cputypes.h> as well.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D4670
2015-12-23 21:41:42 +00:00
Enji Cooper
418629d81d Remove redundant ctx_switch_xsave declaration in sys/amd64/include/md_var.h
This variable was added to sys/x86/include/x86_var.h recently.

This unbreaks building kernel source that #includes both md_var.h and x86_var.h
with gcc 4.2.1 on amd64

Differential Revision: https://reviews.freebsd.org/D4686
Reviewed by: kib
X-MFC with: r291949
Sponsored by: EMC / Isilon Storage Division
2015-12-22 20:08:32 +00:00
Warner Losh
2fca0f2dd4 Save the physical address passed into the kernel of the UEFI system
table.
2015-12-19 19:01:43 +00:00
Konstantin Belousov
7c958a41fe Merge common parts of i386 and amd64 md_var.h and smp.h into
new headers x86/include x86_var.h and x86_smp.h.

Reviewed by:	emaste, jhb
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D4358
2015-12-07 17:41:20 +00:00
Konstantin Belousov
49e806677c Use ANSI C definition.
MFC after:	1 week
2015-12-07 17:24:55 +00:00
Conrad Meyer
10386b56ad pmap_invalidate_range: For very large ranges, flush the whole TLB
Typical TLBs have 40-512 entries available.  At some point, iterating
every single page in a requested invalidation range and issuing invlpg
on it is more expensive than flushing the TLB and allowing it to reload
on demand.

Broadwell CPUs have 1536 L2 TLB entries, so I've picked the arbitrary
number 4096 entries as a hueristic at which point we flush TLB rather
than invalidating every single potential page.

Reviewed by:	alc
Feedback from:	jhb, kib
MFC notes:	Depends on r291688
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D4280
2015-12-06 17:39:13 +00:00
Konstantin Belousov
27691a24ab For amd64 non-PCID machines, and for i386 machines with support for
the PG_G global pte flag, pmap_invalidate_all() fails to flush global
TLB entries [*].  This is because TLB shootdown handler for such
configs reloads CR3, and on i386 pmap_invalidate_all() does the same
for the initiating CPU.  Note that current code does not issue total
invalidation requests for the kernel_pmap.

Rename amd64 function invltlb_globpcid() to invltlb_glob(), it is not
specific for PCID for quite some time, and implement the same
functionality for i386.  Use the function instead of invltlb() in
shootdown handlers and in i386 pmap_invalidate_all(), but only for the
kernel pmap (which maps pages with the PG_G attribute set), which
takes care of PG_G TLB entries on flush.

To detect the affected pmap in i386 TLB shootdown handler, pmap should
be passed to the smp_masked_invltlb() function, which makes amd64 and
i386 TLB shootdown code almost identical.  Merge the code under x86/.

Noted by:	jhb [*]
Reviewed by:	cem, jhb, pho
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D4346
2015-12-03 11:14:14 +00:00
Konstantin Belousov
724f4b62b0 Remove sv_prepsyscall, sv_sigsize and sv_sigtbl members of the struct
sysent.

sv_prepsyscall is unused.

sv_sigsize and sv_sigtbl translate signal number from the FreeBSD
namespace into the ABI domain.  It is only utilized on i386 for iBCS2
binaries.  The issue with this approach is that signals for iBCS2 were
delivered with the FreeBSD signal frame layout, which does not follow
iBCS2.  The same note is true for any other potential user if
sv_sigtbl.  In other words, if ABI needs signal number translation, it
really needs custom sv_sendsig method instead.

Sponsored by:	The FreeBSD Foundation
2015-11-28 08:49:07 +00:00
Ed Maste
2e0002c18e Fix whitespace on addition of IPSEC option 2015-11-26 21:35:50 +00:00
Konstantin Belousov
5e27d79314 Split kerne timekeep ABI structure vdso_sv_tk out of the struct
sysentvec.  This allows the timekeep data to be shared between similar
ABIs which cannot share sysentvec.

Make the timekeep_push_vdso() tick callback to the timekeep structures
instead of sysentvecs.  If several sysentvec share the vdso_sv_tk
structure, we would update the userspace data several times on each
tick, without the change.

Only allocate vdso_sv_tk in the exec_sysvec_init() sysinit when
sysentvec is marked with the new SV_TIMEKEEP flag.  This saves
allocation and update of unneeded vdso_sv_tk for ABIs which do not
provide userspace gettimeofday yet, which are PowerPCs arches right
now.

Make vdso_sv_tk allocator public, namely split out and export
alloc_sv_tk() and alloc_sv_tk_compat32().  ABIs which share timekeep
data now can allocate it manually and share as appropriate.

Requested by:	nwhitehorn
Tested by:	nwhitehorn, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-11-23 07:09:35 +00:00
Mark Johnston
7672ca059a Remove unneeded includes of opt_kdtrace.h.
As of r258541, KDTRACE_HOOKS is defined in opt_global.h, so opt_kdtrace.h
is not needed when defining SDT(9) probes.
2015-11-22 02:01:01 +00:00
John Baldwin
645743ea99 Export various helper variables describing the layout and size of
certain kernel structures for use by debuggers. This mostly aids
in examining cores from a kernel without debug symbols as a debugger
can infer these values if debug symbols are available.

One set of variables describes the layout of 'struct linker_file' to
walk the list of loaded kernel modules.

A second set of variables describes the layout of 'struct proc' and
'struct thread' to walk the list of processes in the kernel and the
threads in each process.

The 'pcb_size' variable is used to index into the stoppcbs[] array.

The 'vm_maxuser_address' is used to distinguish kernel virtual addresses
from user addresses. This doesn't have to be perfect, and
'vm_maxuser_address' is a cheap and simple way to differentiate kernel
pointers from simple values like TIDs and PIDs.

While here, annotate the fields in struct pcb used by kgdb on amd64
and i386 to note that their ABI should be preserved.  Annotations for
other platforms will be added in the future.

Reviewed by:	kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D3773
2015-11-12 22:00:59 +00:00
Conrad Meyer
0e5d2011ae pmap_change_attr: Only fixup DMAP for DMAPed ranges
pmap_change_attr must change the memory type of both the requested KVA
and the corresponding DMAP mappings (if such mappings exist), to satisfy
an Intel requirement that two or more mappings to the same physical
pages must have the same memory type.

However, not all kernel mapped pages have corresponding DMAP mappings --
for example, 64-bit BARs.  Skip fixing up the DMAP for out-of-bounds
addresses.

Submitted by:	Steve Wahl <steve_wahl@dell.com>
Reviewed by:	alc, jhb
Sponsored by:	Dell Compellent
Differential Revision:	https://reviews.freebsd.org/D4030
2015-10-29 19:07:00 +00:00
John Baldwin
2219c44a1f Update for LINUX32 rename. The assembler didn't complain about undefined
symbols but just used 0 after the rename.
2015-10-29 15:20:47 +00:00
John Baldwin
6cea44a704 Fix build with DEBUG defined.
Reported by:	hselasky
2015-10-29 15:16:47 +00:00
Kirk McKusick
a57418a761 Bring the tags and links entries for amd64 up to date.
Based on how out of date it is, I doubt that anyone
other than me and my code-reading students still use it.
2015-10-27 22:59:24 +00:00
Konstantin Belousov
af95bbf5bf Intel SDM before revision 56 described the CLFLUSH instruction as only
ordered with the MFENCE instruction.  Similar weak guarantees are also
specified by the AMD APM vol. 3 rev. 3.22.  x86 pmap methods
pmap_invalidate_cache_range() and pmap_invalidate_cache_pages() braced
CLFLUSH loop with MFENCE both before and after the loop.

In the revision 56 of SDM, Intel stated that all existing
implementations of CLFLUSH are strict, CLFLUSH instructions execution
is ordered WRT other CLFLUSH and writes.  Also, the strict behaviour
is made architectural.

A new instruction CLFLUSHOPT (which was documented for some time in
the Instruction Set Extensions Programming Reference) provides the
weak behaviour which was previously attributed to CLFLUSH.

Use CLFLUSHOPT when available.  When CLFLUSH is used on Intel CPUs, do
not execute MFENCE before and after the flushing loop.

Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
2015-10-24 21:37:47 +00:00
Konstantin Belousov
3f8e071052 Add CLFLUSHOPT instruction wrappers.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-10-23 11:45:38 +00:00
John Baldwin
f2264f5048 Regen for linux32 rename and linux64 systrace. 2015-10-22 21:33:37 +00:00
John Baldwin
2f99bcce1e Rename remaining linux32 symbols such as linux_sysent[] and
linux_syscallnames[] from linux_* to linux32_* to avoid conflicts with
linux64.ko.  While here, add support for linux64 binaries to systrace.
- Update NOPROTO entries in amd64/linux/syscalls.master to match the
  main table to fix systrace build.
- Add a special case for union l_semun arguments to the systrace
  generation.
- The systrace_linux32 module now only builds the systrace_linux32.ko.
  module on amd64.
- Add a new systrace_linux module that builds on both i386 and amd64.
  For i386 it builds the existing systrace_linux.ko.  For amd64 it
  builds a systrace_linux.ko for 64-bit binaries.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D3954
2015-10-22 21:28:20 +00:00
John Baldwin
5047105b71 Merge r289055 to amd64/linux32:
linux: fix handling of out-of-bounds syscall attempts

Due to an off by one the code would read an entry past the table, as
opposed to the last entry which contains the nosys handler.
2015-10-22 21:23:58 +00:00
Ed Schouten
b78ef4bd86 Refactoring: move out generic bits from cloudabi64_sysvec.c.
In order to make it easier to support CloudABI on ARM64, move out all of
the bits from the AMD64 cloudabi_sysvec.c into a new file
cloudabi_module.c that would otherwise remain identical. This reduces
the AMD64 specific code to just ~160 lines.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D3974
2015-10-22 09:07:53 +00:00
Roger Pau Monné
6a306bff7f x86/xen: Consolidate xen-os.h in a single place
amd64 and i386 platform code contain very similar xen/xen-os.h

The only differences are:
 - Functions/variables/types which were unused in i386/xen/xen-os.h:
    * xen_xchg
    * __xchg_dummy
    * __xg
    * __xchg
    * atomic_t
    * atomic_inc
    * rdtscll

The functions/variables/types unused in xen-os.h can be dropped and there
is no more differences betwen amd64 and i386.

The new header is placed in x86/include/xen and each platform will have
dummy headers include x86/xen/*.h. This is to be able to include
machine/xen/*.h in the PV drivers.

Submitted by:		Julien Grall <julien.grall@citrix.com>
Reviewed by:		royger
Differential Revision:	https://reviews.freebsd.org/D3880
Sponsored by:		Citrix Systems R&D
2015-10-21 10:04:35 +00:00
Alexander Motin
4a3760bae6 Remove compatibility shims for legacy ATA device names.
We got new ATA stack in FreeBSD 8.x, switched to it at 9.x, completely
removed old stack at 10.x, so at 11.x it is time to remove compat shims.
2015-10-11 13:01:51 +00:00