Commit Graph

2720 Commits

Author SHA1 Message Date
Hans Petter Selasky
8600ba1aa9 Make sure the selrecord() function is only called from within system
polling contexts in the LinuxKPI.

After the kqueue() support was added to the LinuxKPI in r319409 the
Linux poll file operation will be used outside the system file polling
callback function, which can cause a NULL-pointer panic inside
selrecord() because curthread->td_sel is set to NULL. This patch moves
the selrecord() call away from poll_wait() and to the system file poll
callback function in the LinuxKPI, which essentially wraps the Linux
one. This is similar to what the cuse(3) module is currently doing.
Refer to sys/fs/cuse/*.[ch] for more details.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-06-01 16:49:48 +00:00
Hans Petter Selasky
328c75d621 Translate the ERESTARTSYS error code into ERESTART in the LinuxKPI
ioctl(), read() and write() system call handlers. This error code is
internal to the kernel and should not be seen by user-space programs
according to Linux.

Submitted by:		Yanko Yankulov <yanko.yankulov@gmail.com>
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-06-01 09:53:55 +00:00
Hans Petter Selasky
a6b28ee02a Add generic kqueue() and kevent() support to the LinuxKPI character
devices. The implementation allows read and write filters to be
created and piggybacks on the poll() file operation to determine when
a filter should trigger. The piggyback mechanism is simply to check
for the EWOULDBLOCK or EAGAIN return code from read(), write() or
ioctl() system calls and then update the kqueue() polling state bits.
The implementation is similar to the one found in the cuse(3) module.
Refer to sys/fs/cuse/*.[ch] for more details.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-06-01 09:34:51 +00:00
Hans Petter Selasky
c2676069cb Implement print_hex_dump(), print_hex_dump_bytes() and
printk_ratelimited() in the LinuxKPI.

While at it fix the inclusion guard of printk.h to be similar to the
rest of the LinuxKPI header files.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-05-31 16:24:02 +00:00
Hans Petter Selasky
427cefde27 Properly implement idr_preload() and idr_preload_end() in the
LinuxKPI.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-05-31 16:08:30 +00:00
Hans Petter Selasky
dff36e69a1 Implement in_atomic() function in the LinuxKPI.
Obtained from:		kmacy @
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-05-31 15:05:44 +00:00
Hans Petter Selasky
90b30e6560 Properly set the .d_name field in the cdevsw structure for the
LinuxKPI.

Obtained from:		kmacy @
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-05-31 13:11:06 +00:00
Hans Petter Selasky
d56f1ed887 Make sure the VMAP's "vm_file" field is referenced in a Linux
compatible way by the linux_dev_mmap_single() function in the
LinuxKPI.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-05-31 13:07:05 +00:00
Hans Petter Selasky
cca15f28c5 Remove the VMA handle from its list before calling the LinuxKPI VMA
close operation to prevent other threads from reusing the VM object
handle pointer.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-05-31 13:05:54 +00:00
Hans Petter Selasky
68b9f2f00c Don't acquire a reference on the VM-space when allocating the LinuxKPI
task structure to avoid deadlock when tearing down the VM object
during a process exit.

Found by:		markj @
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-05-31 13:01:27 +00:00
Hans Petter Selasky
ea67550be0 Fix a reference count leak in the LinuxKPI due to calling VM open when
it shouldn't be called.

Background:
The Linux VM open operation is called when a new VMA is
created on top of the current VMA. This is done through either mremap
flow or split_vma, usually due to mlock, madvise, munmap and so
on. This is currently not supported by the LinuxKPI.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-05-31 12:08:25 +00:00
Hans Petter Selasky
f5a9867b7d Fixes for refcounting "struct linux_file" in the LinuxKPI.
- Allow "struct linux_file" to be refcounted when its "_file" member
  is NULL by using its "f_count" field. The reference counts are
  transferred to the file structure when the file descriptor is
  installed.

- Add missing vdrop() calls for error cases during open().

- Set the "_file" member of "struct linux_file" during open. This
allows use of refcounting through get_file() and fput() with LinuxKPI
character devices.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-05-31 12:02:59 +00:00
Hans Petter Selasky
3f743d782a Make sure the thread's priority is restored for all three cases inside
linux_synchronize_rcu_cb() in the LinuxKPI.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-05-31 10:01:15 +00:00
Mark Johnston
cb564d2436 Add some miscellaneous definitions to support DRM drivers.
Reviewed by:	hselasky
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D10985
2017-05-30 17:16:08 +00:00
Dmitry Chagin
9ecc1abca3 On success, getrandom() Linux system call returns the number of bytes that
were copied to the buffer supplied by the user.

Also fix getrandom() if Linuxulator modules are built without the kernel.

PR:		219464
Submitted by:	Maciej Pasternacki
Reported by:	Maciej Pasternacki
MFC after:	1 week
2017-05-28 07:40:09 +00:00
Allan Jude
c20feae640 Followup to r318765 (capsicumize cpuset_*affinity)
Update *sysent files
2017-05-24 01:01:57 +00:00
Allan Jude
f299c47b52 Allow cpuset_{get,set}affinity in capabilities mode
bhyve was recently sandboxed with capsicum, and needs to be able to
control the CPU sets of its vcpu threads

Reviewed by:	emaste, oshogbo, rwatson
MFC after:	2 weeks
Sponsored by:	ScaleEngine Inc.
Differential Revision:	https://reviews.freebsd.org/D10170
2017-05-24 00:58:30 +00:00
Konstantin Belousov
ec95c622ff Regen. 2017-05-23 09:30:42 +00:00
Konstantin Belousov
6992112349 Commit the 64-bit inode project.
Extend the ino_t, dev_t, nlink_t types to 64-bit ints.  Modify
struct dirent layout to add d_off, increase the size of d_fileno
to 64-bits, increase the size of d_namlen to 16-bits, and change
the required alignment.  Increase struct statfs f_mntfromname[] and
f_mntonname[] array length MNAMELEN to 1024.

ABI breakage is mitigated by providing compatibility using versioned
symbols, ingenious use of the existing padding in structures, and
by employing other tricks.  Unfortunately, not everything can be
fixed, especially outside the base system.  For instance, third-party
APIs which pass struct stat around are broken in backward and
forward incompatible ways.

Kinfo sysctl MIBs ABI is changed in backward-compatible way, but
there is no general mechanism to handle other sysctl MIBS which
return structures where the layout has changed. It was considered
that the breakage is either in the management interfaces, where we
usually allow ABI slip, or is not important.

Struct xvnode changed layout, no compat shims are provided.

For struct xtty, dev_t tty device member was reduced to uint32_t.
It was decided that keeping ABI compat in this case is more useful
than reporting 64-bit dev_t, for the sake of pstat.

Update note: strictly follow the instructions in UPDATING.  Build
and install the new kernel with COMPAT_FREEBSD11 option enabled,
then reboot, and only then install new world.

Credits: The 64-bit inode project, also known as ino64, started life
many years ago as a project by Gleb Kurtsou (gleb).  Kirk McKusick
(mckusick) then picked up and updated the patch, and acted as a
flag-waver.  Feedback, suggestions, and discussions were carried
by Ed Maste (emaste), John Baldwin (jhb), Jilles Tjoelker (jilles),
and Rick Macklem (rmacklem).  Kris Moore (kris) performed an initial
ports investigation followed by an exp-run by Antoine Brodin (antoine).
Essential and all-embracing testing was done by Peter Holm (pho).
The heavy lifting of coordinating all these efforts and bringing the
project to completion were done by Konstantin Belousov (kib).

Sponsored by:	The FreeBSD Foundation (emaste, kib)
Differential revision:	https://reviews.freebsd.org/D10439
2017-05-23 09:29:05 +00:00
Gleb Smirnoff
33c6ba0c65 Fix regression in ndis(4) after r286410. This adds a bunch of checks for
whether this is a Ethernet or 802.11 device and does proper dereferencing.

PR:		213237
Submitted by:	<ota j.email.ne.jp>
MFC after:	2 weeks
2017-05-22 20:00:01 +00:00
Ed Maste
bd309b323a Regen sysent after r318634, no open(2) in capability mode
Sponsored by:	The FreeBSD Foundation
2017-05-22 11:45:45 +00:00
Ed Maste
68fc8f3934 disallow open(2) in capability mode
Previously open(2) was allowed in capability mode, with a comment that
suggested this was likely the case to facilitate debugging. The system
call would still fail later on, but it's better to disallow the syscall
altogether.

We now have the kern.trap_enotcap sysctl or PROC_TRAPCAP_CTL proccontrol
to aid in debugging.

In any case libc has translated open() to the openat syscall since
r277032.

Reviewed by:	kib, rwatson
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D10850
2017-05-22 11:43:19 +00:00
Mark Johnston
d6c8335623 Add get_cpu() and put_cpu().
MFC after:	1 week
2017-05-21 00:06:36 +00:00
Mark Johnston
02fb845bbf Fix a few uses of kern_yield() in the TTM and the LinuxKPI.
kern_yield(0) effectively causes the calling thread to be rescheduled
immediately since it resets the thread's priority to the highest possible
value. This can cause livelocks when the pattern
"while (!trylock()) kern_yield(0);" is used since the thread holding the
lock may linger on the runqueue for the CPU on which the looping thread is
running.

MFC after:	1 week
2017-05-18 18:35:14 +00:00
Hans Petter Selasky
d8e073a985 Fix init order in the LinuxKPI for RCU support.
CPU_FOREACH() is not available until SI_SUB_CPU at SI_ORDER_ANY
when the LinuxKPI is loaded as part of the kernel.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-05-09 12:51:42 +00:00
Mahdi Mokhtari
906ba87284 Fix linprocfs_docpuinfo() output regarding to what newer Linux apps expect
Reviewed by:	trasz
Approved by:	trasz
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D10274
2017-05-06 17:37:01 +00:00
Brooks Davis
e9f32d1dc4 Regent post r317845.
MFC after:	1 week
MFC with:	r317845
Sponsored by:	DARPA, AFRL
2017-05-05 18:50:22 +00:00
Brooks Davis
f19351aad8 Provide a freebsd32 implementation of sigqueue()
The previous misuse of sys_sigqueue() was sending random register or
stack garbage to 64-bit targets.  The freebsd32 implementation preserves
the sival_int member of value when signaling a 64-bit process.

Document the mixed ABI implementation of union sigval and the
incompability of sival_ptr with pointer integrity schemes.

Reviewed by:	kib, wblock
MFC after:	1 week
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D10605
2017-05-05 18:49:39 +00:00
Mark Johnston
6c10623340 Use pmap_invalidate_cache() to implement wbinvd_on_all_cpus().
Suggested by:	jhb
X-MFC with:	r317651
2017-05-05 17:22:00 +00:00
Hans Petter Selasky
6796081682 Fix for use after free in the LinuxKPI.
Background:
The same VM object might be shared by multiple processes and the
mm_struct is usually freed when a process exits.

Grab a reference on the mm_struct while the vmap is in the
linux_vma_head list in case the first process which inserted a VM
object has exited.

Tested by:		kwm @
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-05-05 14:09:44 +00:00
Mark Johnston
c12488bbe0 Add on_each_cpu() and wbinvd_on_all_cpus().
Reviewed by:	hselasky
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D10550
2017-05-01 16:32:28 +00:00
Dmitry Chagin
2ca5d34d20 Fix NULL pointer dereference in futex_wake_op() in case when the same
address specified for arguments uaddr and uaddr2.

PR:		218987
Reported by:	luke.tw gmail
MFC after:	1 week
2017-05-01 12:25:37 +00:00
Dmitry Chagin
c4f2941e0b Fix symlinkat() which use the newdfd argument to look up the old path,
while it should use it for the new path instead.

Reported by:	trasz@
MFC after:	1 month
2017-04-30 05:56:57 +00:00
Hans Petter Selasky
a8c348db51 Prefer to use real virtual address over direct map address in the
linux_page_address() function in the LinuxKPI. This solves an issue
where the return value from linux_page_address() is passed to
kmem_free().

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-04-27 14:29:21 +00:00
Dmitry Chagin
25ada63736 Map Linux CLOCK_BOOTTIME to native CLOCK_UPTIME.
MFC after:	1 week
2017-04-23 07:57:30 +00:00
Dmitry Chagin
6e1d05bbd7 Add Evdev ioctl handler to the Linuxulator.
PR:		218627
Submitted by:	Jan Kokemüller
Reported by:	Jan Kokemüller
MFC after:	1 week
2017-04-23 07:43:50 +00:00
Mark Johnston
b602c283b3 Drop Giant before sleeping in linux_wait_for_{timeout_,}common().
Reported and tested by:	Pete Wright <pete@nomadlogic.org>
Reviewed by:	hselasky (previous version)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D10414
2017-04-19 16:12:02 +00:00
Hans Petter Selasky
a1be2ead3a Use __typeof() instead of typeof() in some RCU related macros in the LinuxKPI.
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-04-19 13:04:34 +00:00
Hans Petter Selasky
f3de9af633 Fix problem regarding priority inversion when using the concurrency
kit, CK, in the LinuxKPI.

When threads are pinned to a CPU core or when there is only one CPU,
it can happen that a higher priority thread can call the CK
synchronize function while a lower priority thread holds the read
lock. Because the CK's synchronize is a simple wait loop this can lead
to a deadlock situation. To solve this problem use the recently
introduced CK's wait callback function.

When detecting a CK blocking condition figure out the lowest priority
among the blockers and update the calling thread's priority and
yield. If another CPU core is holding the read lock, pin the thread to
the blocked CPU core and update the priority. The calling threads
priority and CPU bindings are restored before return.

If a thread holding a CK read lock is detected to be sleeping, pause()
will be used instead of yield().

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-04-19 13:03:29 +00:00
Hans Petter Selasky
7a742c41cf Zero number of CPUs should be translated into the default number of
CPUs when allocating a LinuxKPI workqueue. This also ensures that the
created taskqueue always have a non-zero number of worker threads.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-04-19 11:38:07 +00:00
Ed Maste
f0e56c1f62 Remove trailing whitespace from r317061 2017-04-17 18:57:26 +00:00
Gleb Smirnoff
83c9dea1ba - Remove 'struct vmmeter' from 'struct pcpu', leaving only global vmmeter
in place.  To do per-cpu stats, convert all fields that previously were
  maintained in the vmmeters that sit in pcpus to counter(9).
- Since some vmmeter stats may be touched at very early stages of boot,
  before we have set up UMA and we can do counter_u64_alloc(), provide an
  early counter mechanism:
  o Leave one spare uint64_t in struct pcpu, named pc_early_dummy_counter.
  o Point counter(9) fields of vmmeter to pcpu[0].pc_early_dummy_counter,
    so that at early stages of boot, before counters are allocated we already
    point to a counter that can be safely written to.
  o For sparc64 that required a whole dummy pcpu[MAXCPU] array.

Further related changes:
- Don't include vmmeter.h into pcpu.h.
- vm.stats.vm.v_swappgsout and vm.stats.vm.v_swappgsin changed to 64-bit,
  to match kernel representation.
- struct vmmeter hidden under _KERNEL, and only vmstat(1) is an exclusion.

This is based on benno@'s 4-year old patch:
https://lists.freebsd.org/pipermail/freebsd-arch/2013-July/014471.html

Reviewed by:	kib, gallatin, marius, lidl
Differential Revision:	https://reviews.freebsd.org/D10156
2017-04-17 17:34:47 +00:00
Gleb Smirnoff
9ed01c32e0 All these files need sys/vmmeter.h, but now they got it implicitly
included via sys/pcpu.h.
2017-04-17 17:07:00 +00:00
Gleb Smirnoff
6286dc78d4 Remove unneeded include of vm_phys.h. 2017-04-17 16:51:04 +00:00
Conrad Meyer
c6943f3abc linux_ioctl: Refactor some v4l2 struct converters
According to the C standard, it is invalid to copy beyond the end of an
object, even if that object is obviously a member of a larger object (a
struct, in this case).

Appease the standard and Coverity by refactoring the copy in a
straightforward way.  No functional change.

Reported by:	Coverity (CWE-120)
CIDs:		1007819, 1007820, 1007821, 1007822, 1009668, 1009669
Security:	no (false positive detection)
Sponsored by:	Dell EMC Isilon
2017-04-13 17:34:51 +00:00
Olivier Houchard
7e8cd4e1af Import CK as of commit 6b141c0bdd21ce8b3e14147af8f87f22b20ecf32
This brings us changes we needed in ck_epoch.
2017-04-09 21:02:05 +00:00
Tai-hwa Liang
113bb55f71 Adding SIOCGIFNAME support in Linuxulator. This should silence the console warning associated
with linux-opera:
	linux: pid 23492 (opera): ioctl fd=5, cmd=0x8910 ('\M^I',16) is not implemented
	linux: pid 23492 (opera): ioctl fd=28, cmd=0x8910 ('\M^I',16) is not implemented
	...

Reviewed by:	kib, marcel, dchagin
Tested with:	linux-opera-12.16_3
MFC after:	1 month
2017-04-09 15:27:04 +00:00
Hans Petter Selasky
76fe8c9330 Fix compilation of LinuxKPI for PowerPC.
Found by:		emaste @
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-04-09 14:31:41 +00:00
Hans Petter Selasky
22cbd6ef2e Create the LinuxKPI current task structure on the fly if it doesn't
exist when the current macro is used.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-04-07 14:43:28 +00:00
Hans Petter Selasky
99e690772a The __stringify() macro in the LinuxKPI should expand any macros
before stringifying.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-04-07 12:27:49 +00:00