freebsd-skq

Author	SHA1	Message	Date
Matt Macy	3f6cab079c	import linux debugfs support Reviewed by: hps@ MFC after: 1 week Sponsored by: iX Systems Differential Revision: https://reviews.freebsd.org/D19258	2019-02-23 20:56:41 +00:00
Matt Macy	2ce1771c12	linux/fs: simplify interop and correct definition of loff_t - offsets can be negative, loff_t needs to be signed, it also simplifies interop with the rest of the code base to use off_t rather than the actual linux definition "long long" - don't rely on defining "file" to "linux_file" in interface definitions as that causes heartache with includes Reviewed by: hps@ MFC after: 1 week Sponsored by: iX Systems Differential Revision: https://reviews.freebsd.org/D19274	2019-02-23 20:45:45 +00:00
Matt Macy	983ed4f9f1	lkpi: allow late binding of linux_alloc_current Some consumers may be loosely coupled with the lkpi. This allows them to call linux_alloc_current without having a static dependency. Reviewed by: hps@ MFC after: 1 week Sponsored by: iX Systems Differential Revision: https://reviews.freebsd.org/D19257	2019-02-22 23:15:32 +00:00
Marius Strobl	f855ec814d	Make taskqgroup_attach{,_cpu}(9) work across architectures So far, intr_{g,s}etaffinity(9) take a single int for identifying a device interrupt. This approach doesn't work on all architectures supported, as a single int isn't sufficient to globally specify a device interrupt. In particular, with multiple interrupt controllers in one system as found on e. g. arm and arm64 machines, an interrupt number as returned by rman_get_start(9) may be only unique relative to the bus and, thus, interrupt controller, a certain device hangs off from. In turn, this makes taskqgroup_attach{,_cpu}(9) and - internal to the gtaskqueue implementation - taskqgroup_attach_deferred{,_cpu}() not work across architectures. Yet in turn, iflib(4) as gtaskqueue consumer so far doesn't fit architectures where interrupt numbers aren't globally unique. However, at least for intr_setaffinity(..., CPU_WHICH_IRQ, ...) as employed by the gtaskqueue implementation to bind an interrupt to a particular CPU, using bus_bind_intr(9) instead is equivalent from a functional point of view, with bus_bind_intr(9) taking the device and interrupt resource arguments required for uniquely specifying a device interrupt. Thus, change the gtaskqueue implementation to employ bus_bind_intr(9) instead and intr_{g,s}etaffinity(9) to take the device and interrupt resource arguments required respectively. This change also moves struct grouptask from <sys/_task.h> to <sys/gtaskqueue.h> and wraps struct gtask along with the gtask_fn_t typedef into #ifdef _KERNEL as userland likes to include <sys/_task.h> or indirectly drags it in - for better or worse also with _KERNEL defined -, which with device_t and struct resource dependencies otherwise is no longer as easily possible now. 
The userland inclusion problem probably can be improved a bit by introducing a _WANT_TASK (as well as a _WANT_MOUNT) akin to the existing _WANT_PRISON etc., which is orthogonal to this change, though, and likely needs an exp-run. While at it: - Change the gt_cpu member in the grouptask structure to be of type int as used elsewhere for specifying CPUs (an int16_t may be too narrow sooner or later), - move the gtaskqueue_enqueue_fn typedef from <sys/gtaskqueue.h> to the gtaskqueue implementation as it's only used and needed there, - change the GTASK_INIT macro to use "gtask" rather than "task" as argument given that it actually operates on a struct gtask rather than a struct task, and - let subr_gtaskqueue.c consistently use __func__ to print function names. Reported by: mmel Reviewed by: mmel Differential Revision: https://reviews.freebsd.org/D19139	2019-02-12 21:23:59 +00:00
Konstantin Belousov	fa50a3552d	Implement Address Space Layout Randomization (ASLR) With this change, randomization can be enabled for all non-fixed mappings. It means that the base address for the mapping is selected with a guaranteed amount of entropy (bits). If the mapping was requested to be superpage aligned, the randomization honours the superpage attributes. Although the value of ASLR is diminishing over time as exploit authors work out simple ASLR bypass techniques, it eliminates the trivial exploitation of certain vulnerabilities, at least in theory. This implementation is relatively small and happens at the correct architectural level. Also, it is not expected to introduce regressions in existing cases when turned off (default for now), or cause any significant maintenance burden. The randomization is done on a best-effort basis - that is, the allocator falls back to a first fit strategy if fragmentation prevents entropy injection. It is trivial to implement a strong mode where failure to guarantee the requested amount of entropy results in mapping request failure, but I do not consider that to be usable. I have not fine-tuned the amount of entropy injected right now. It is only a quantitative change that will not change the implementation. The current amount is controlled by aslr_pages_rnd. To not spoil coalescing optimizations, to reduce the page table fragmentation inherent to ASLR, and to keep the transient superpage promotion for the malloced memory, locality clustering is implemented for anonymous private mappings, which are automatically grouped until fragmentation kicks in. The initial location for the anon group range is, of course, randomized. This is controlled by vm.cluster_anon, enabled by default. The default mode keeps the sbrk area unpopulated by other mappings, but this can be turned off, which gives much more breathing bits on architectures with small address space, such as i386. 
This is tied with the question of following an application's hint about the mmap(2) base address. Testing shows that ignoring the hint does not affect the function of common applications, but I would expect more demanding code could break. By default sbrk is preserved and mmap hints are satisfied, which can be changed by using the kern.elf{32,64}.aslr.honor_sbrk sysctl. ASLR is enabled on per-ABI basis, and currently it is only allowed on FreeBSD native i386 and amd64 (including compat 32bit) ABIs. Support for additional architectures will be added after further testing. Both per-process and per-image controls are implemented: - procctl(2) adds PROC_ASLR_CTL/PROC_ASLR_STATUS; - NT_FREEBSD_FCTL_ASLR_DISABLE feature control note bit makes it possible to force ASLR off for the given binary. (A tool to edit the feature control note is in development.) Global controls are: - kern.elf{32,64}.aslr.enable - for non-fixed mappings done by mmap(2); - kern.elf{32,64}.aslr.pie_enable - for PIE image activation mappings; - kern.elf{32,64}.aslr.honor_sbrk - allow to use sbrk area for mmap(2); - vm.cluster_anon - enables anon mapping clustering. PR: 208580 (exp runs) Exp-runs done by: antoine Reviewed by: markj (previous version) Discussed with: emaste Tested by: pho MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D5603	2019-02-10 17:19:45 +00:00
Konstantin Belousov	a7f67facdf	Normalize the declaration of i386_read_exec variable. It is currently re-declared in sys/sysent.h, which is the wrong place for an MD variable and causes a redeclaration error with gcc when both sys/sysent.h and machine/md_var.h are included. Remove it from sys/sysent.h and instead include machine/md_var.h when needed, under #ifdef for both i386 and amd64. Reported and tested by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-02-09 03:51:51 +00:00
Andriy Voskoboinyk	a99bdc110b	Fix compilation with 'option NDISAPI + device ndis' and without 'device pccard' in the kernel config file. PR: 171532 Reported by: Robert Bonomi <bonomi@host128.r-bonomi.com> MFC after: 1 week	2019-01-30 11:40:12 +00:00
Hans Petter Selasky	232028b34e	Add full support for PCI_ANY_ID when matching PCI IDs in the LinuxKPI. MFC after: 1 week Sponsored by: Mellanox Technologies	2019-01-25 20:13:28 +00:00
Oleksandr Tymoshenko	1715256316	[ndis] Fix unregistered use of FPU by NDIS in kernel on amd64 amd64 miniport drivers are allowed to use FPU which triggers "Unregistered use of FPU in kernel" panic. Wrap all variants of MSCALL with fpu_kern_enter/fpu_kern_leave. To reduce amount of allocations/deallocations done via fpu_kern_alloc_ctx/fpu_kern_free_ctx maintain cache of fpu_kern_ctx elements. Based on the patch by Paul B Mahol PR: 165622 Submitted by: Vlad Movchan <vladislav.movchan@gmail.com> MFC after: 1 month	2019-01-22 03:53:42 +00:00
Ed Maste	347a8ed1bf	linuxulator: fix stack memory disclosure in linux_sigaltstack Most siginfo_to_lsiginfo callers already zeroed the l_siginfo_t before calling it, but linux_waitid did not. Instead of zeroing in the called function to address linux_waitid (as in commit 2e6ebe70), just do it in linux_waitid. admbugs: 765 Reported by: Vlad Tsyrklevich <vlad@tsyrklevich.net> Reviewed by: Andrew MFC after: 1 day Security: Kernel stack memory disclosure Sponsored by: The FreeBSD Foundation	2019-01-21 17:12:16 +00:00
Ed Maste	9866e7bbae	linuxulator: fix stack memory disclosure in linux_ioctl_termio admbugs: 765 Reported by: Vlad Tsyrklevich <vlad@tsyrklevich.net> Reviewed by: andrew MFC after: 1 day Security: Kernel stack memory disclosure Sponsored by: The FreeBSD Foundation	2019-01-21 16:21:03 +00:00
Ed Maste	4308a37410	linuxulator: fix stack memory disclosure in linux_ioctl_v4l admbugs: 765 Reported by: Vlad Tsyrklevich <vlad@tsyrklevich.net> Reviewed by: andrew MFC after: 1 day Security: Kernel stack memory disclosure Sponsored by: The FreeBSD Foundation	2019-01-21 16:19:02 +00:00
Kirk McKusick	88640c0e8b	Create new EINTEGRITY error with message "Integrity check failed". An integrity check such as a check-hash or a cross-correlation failed. The integrity error falls between EINVAL that identifies errors in parameters to a system call and EIO that identifies errors with the underlying storage media. EINTEGRITY is typically raised by intermediate kernel layers such as a filesystem or an in-kernel GEOM subsystem when they detect inconsistencies. Uses include allowing the mount(8) command to return a different exit value to automate the running of fsck(8) during a system boot. These changes make no use of the new error, they just add it. Later commits will be made for the use of the new error number and it will be added to additional manual pages as appropriate. Reviewed by: gnn, dim, brueffer, imp Discussed with: kib, cem, emaste, ed, jilles Differential Revision: https://reviews.freebsd.org/D18765	2019-01-17 06:35:45 +00:00
Gleb Smirnoff	396694153f	Fix compilation failures on different arches that have vm_machdep.c not aware of counter_u64_t by including counter.h into uma_int.h. I'm not happy about this inclusion, but it fixes compilation ASAP.	2019-01-15 19:33:47 +00:00
Gleb Smirnoff	2efcc8cbca	Make uz_allocs, uz_frees and uz_fails counter(9). This removes some atomic updates and reduces amount of data protected by zone lock. During startup point these fields to EARLY_COUNTER. After startup allocate them for all early zones. Tested by: pho	2019-01-15 18:24:34 +00:00
Olivier Houchard	21fb66241a	Regenerate sysent files after having modified syscalls.master.	2019-01-13 00:38:55 +00:00
Olivier Houchard	2ca357528f	amd64 is the only arch that doesn't require padding for 32bits syscalls, so instead of listing every arch that requires it, just exclude amd64.	2019-01-13 00:37:31 +00:00
Gleb Smirnoff	a68cc38879	Mechanical cleanup of epoch(9) usage in network stack. - Remove macros that covertly create epoch_tracker on thread stack. Such macros are quite unsafe, e.g. will produce buggy code if the same macro is used in embedded scopes. Explicitly declare epoch_tracker always. - Unmask interface list IFNET_RLOCK_NOSLEEP(), interface address list IF_ADDR_RLOCK() and interface AF specific data IF_AFDATA_RLOCK() read locking macros to what they actually are - the net_epoch. Keeping them as is is very misleading. They all are named FOO_RLOCK(), while they no longer have lock semantics. Now they allow recursion and what's more important they now no longer guarantee protection against their companion WLOCK macros. Note: INP_HASH_RLOCK() has same problems, but not touched by this commit. This is non functional mechanical change. The only functionally changed functions are ni6_addrs() and ni6_store_addrs(), where we no longer enter epoch recursively. Discussed with: jtl, gallatin	2019-01-09 01:11:19 +00:00
Mark Johnston	bb376a990c	Specify the correct option level when emulating SO_PEERCRED. Our equivalent to SO_PEERCRED, LOCAL_PEERCRED, is implemented at socket option level 0, not SOL_SOCKET. PR: 234722 Submitted by: Dániel Bakai <bakaidl@gmail.com> MFC after: 2 weeks	2019-01-08 17:21:59 +00:00
Conrad Meyer	85f2a00b34	linuxkpi: Remove extraneous NULL check on M_WAITOK allocation The check was not introduced in r342628, but the subsequent unchecked access to refs was added then, prompting a Coverity warning about "Null pointer dereferences (FORWARD_NULL)." The warning is bogus due to M_WAITOK, but so is the NULL check that hints it, so just remove it. CID: 1398588 Reported by: Coverity	2019-01-01 19:56:49 +00:00
Konstantin Belousov	9362b6a394	Fix 32bit gcc builds after r342625. MFC after: 1 week Sponsored by: Mellanox Technologies	2018-12-30 16:39:26 +00:00
Konstantin Belousov	f823a36e83	Fix linux_destroy_dev() behaviour when there are still files open from the destroying cdev. Currently linux_destroy_dev() waits for the reference count on the linux cdev to drain, and each open file holds a reference. Practically it means that linux_destroy_dev() is blocked until all userspace processes that have the cdev open, exit. FreeBSD devfs does not have such problem, because device refcount only prevents freeing of the cdev memory, and separate 'active methods' counter blocks destroy_dev() until all threads leave the cdevsw methods. After that, attempts to enter cdevsw methods are refused with an error. Implement somewhat similar mechanism for LinuxKPI cdevs. Demote cdev refcount to only mean a hold on the linux cdev memory. Add sirefs count to track both number of threads inside the cdev methods, and for single-bit indicator that cdev is being destroyed. In the latter case, the call is redirected to the dummy cdev. Reviewed by: markj Discussed with: hselasky Tested by: zeising MFC after: 1 week Sponsored by: Mellanox Technologies Differential revision: https://reviews.freebsd.org/D18606	2018-12-30 15:46:45 +00:00
Konstantin Belousov	e5a3393a15	Implement zap_vma_ptes() for managed device objects. Reviewed by: markj Discussed with: hselasky Tested by: zeising MFC after: 1 week Sponsored by: Mellanox Technologies Differential revision: https://reviews.freebsd.org/D18606	2018-12-30 15:38:07 +00:00
Konstantin Belousov	069598b941	Use IDX_TO_OFF(). Reviewed by: markj Discussed with: hselasky Tested by: zeising MFC after: 1 week Sponsored by: Mellanox Technologies Differential revision: https://reviews.freebsd.org/D18606	2018-12-30 15:28:31 +00:00
Mateusz Guzik	628888f0e0	Remove iBCS2, part2: general kernel Reviewed by: kib (previous version) Sponsored by: The FreeBSD Foundation	2018-12-19 21:57:58 +00:00
Brooks Davis	10f7b12c13	const poison the `new` pointer of __sysctl. Reviewed by: kib Obtained from: CheriBSD Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D18444	2018-12-18 12:44:38 +00:00
Mateusz Guzik	cc426dd319	Remove unused argument to priv_check_cred. Patch mostly generated with coccinelle: @@ expression E1,E2; @@ - priv_check_cred(E1,E2,0) + priv_check_cred(E1,E2) Sponsored by: The FreeBSD Foundation	2018-12-11 19:32:16 +00:00
Hans Petter Selasky	ca487c1888	Remove no longer needed ifdefs in the LinuxKPI, after r341787. Differential Revision: https://reviews.freebsd.org/D18450 Reviewed by: kib@ MFC after: 3 days Sponsored by: Mellanox Technologies	2018-12-10 13:41:33 +00:00
Konstantin Belousov	fd52edaf70	Regen.	2018-12-07 15:19:00 +00:00
Konstantin Belousov	d1fd400a80	Add new file handle system calls. Namely, getfhat(2), fhlink(2), fhlinkat(2), fhreadlink(2). The syscalls are provided for a NFS userspace server (nfs-ganesha). Submitted by: Jack Halford <jack@gandi.net> Sponsored by: Gandi.net Tested by: pho Feedback from: brooks, markj MFC after: 1 week Differential revision: https://reviews.freebsd.org/D18359	2018-12-07 15:17:29 +00:00
Hans Petter Selasky	52da588961	Remove redundant declaration after r341517. MFC after: 1 week Sponsored by: Mellanox Technologies	2018-12-05 15:56:44 +00:00
Hans Petter Selasky	6da0d28e6a	Fix build of LinuxKPI on some platforms after r341518. MFC after: 1 week Sponsored by: Mellanox Technologies	2018-12-05 15:53:34 +00:00
Slava Shwartsman	31c3f64819	mlx5: Fix driver version location Driver description should be set by core and not by the Ethernet driver. Approved by: hselasky (mentor) MFC after: 1 week Sponsored by: Mellanox Technologies	2018-12-05 13:47:10 +00:00
Slava Shwartsman	a9c20af23d	ibcore: ip6_dev_find() needs to know the scope ID. Else the wrong network device can be returned for link-local addresses. Submitted by: hselasky@ Approved by: hselasky (mentor) MFC after: 1 week Sponsored by: Mellanox Technologies	2018-12-05 13:24:43 +00:00
Slava Shwartsman	452d59e130	linuxkpi: Really check if PCI is offline Currently we always return false for the PCI offline query. Try to read the PCI config; if the return value is 0xffff, the PCI device is probably offline. Approved by: hselasky (mentor) MFC after: 1 week Sponsored by: Mellanox Technologies	2018-12-05 13:17:45 +00:00
Slava Shwartsman	92cbd83001	linuxkpi: properly implement netif_carrier_ok(). Submitted by: kib@ Approved by: hselasky (mentor) MFC after: 1 week Sponsored by: Mellanox Technologies	2018-12-05 13:17:15 +00:00
Slava Shwartsman	9c7b53cc65	linuxkpi: Fix for use-after-free when tearing down character devices. Make sure we hold a reference on the character device for every opened file to prevent the character device from being freed prematurely. Submitted by: hselasky@ Approved by: hselasky (mentor) MFC after: 1 week Sponsored by: Mellanox Technologies	2018-12-05 13:16:39 +00:00
Slava Shwartsman	be34cfc587	linuxkpi: implement idr_is_empty() and ida_is_empty(). Submitted by: kib@ Approved by: hselasky (mentor) MFC after: 1 week Sponsored by: Mellanox Technologies	2018-12-05 13:15:57 +00:00
Konstantin Belousov	f186340011	Improve procstat reporting for the linux cdev file descriptors. If there is a vnode attached to the linux file, use it to fill kinfo_file. Otherwise, report a new KF_TYPE_DEV file type, without supplying any type-specific information. KF_TYPE_DEV is supposed to be used by most devfs-specific file types. Sponsored by: Mellanox Technologies MFC after: 1 week	2018-12-03 23:39:45 +00:00
Brooks Davis	f373437a01	Add helper functions to copy strings into struct image_args. Given a zeroed struct image_args with an allocated buf member, exec_args_add_fname() must be called to install a file name (or NULL). Then zero or more calls to exec_args_add_arg() followed by zero or more calls to exec_args_add_env(). exec_args_adjust_args() may be called after args and/or env to allow an interpreter to be prepended to the argument list. To allow code reuse when adding arg and env variables, begin_envv should be accessed with the accessor exec_args_get_begin_envv() which handles the case when no environment entries have been added. Use these functions to simplify exec_copyin_args() and freebsd32_exec_copyin_args(). Reviewed by: kib Obtained from: CheriBSD Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D15468	2018-11-29 21:00:56 +00:00
Mark Johnston	792843c38f	Pass malloc flags directly through kevent(2) subroutines. Some kevent functions have a boolean "waitok" parameter for use when calling malloc(9). Replace them with the corresponding malloc() flags: the desired behaviour is known at compile-time, so this eliminates a couple of conditional branches, and makes the code easier to read. No functional change intended. Reviewed by: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D18318	2018-11-24 17:06:01 +00:00
Ben Widawsky	f82dd310bb	linuxkpi: Use pageproc instead of vmproc According to markj@: pageproc contains the page daemon and laundry threads, which are responsible for managing the LRU page queues and writing back dirty pages. vmproc's main task is to swap out kernel stacks when the system is under memory pressure, and swap them back in when necessary. It's a somewhat legacy component of the system and isn't required. You can build a kernel without it by specifying "options NO_SWAPPING" (which is a somewhat misleading name), in which vm_swapout_dummy.c is compiled instead of vm_swapout.c. Based on this, we want pageproc to emulate kswapd, not vmproc. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D18061	2018-11-21 04:34:18 +00:00
Ben Widawsky	5a46107832	linuxkpi: Remove duplicated text Somehow this got botched while moving from git -> svn	2018-11-20 23:05:09 +00:00
Ben Widawsky	c3f4f28c63	linuxkpi: Add some basic swap functions These are used by kms-drm to determine various heuristics related to memory conditions. The number of free swap pages is just a variable, and it can be much cheaper by either adding a new getter, or simply extern'ing swap_total. However, this patch opts to use the more expensive, existing interface - since this isn't an operation in a high perf path. This allows us to remove some more gpl linuxkpi and do the following in kms-drm: git rm linuxkpi/gplv2/include/linux/swap.h Reviewed by: mmacy, Johannes Lundberg <johalun0@gmail.com> Approved by: emaste (mentor) Differential Revision: https://reviews.freebsd.org/D18052	2018-11-20 22:49:19 +00:00
Tijl Coosemans	7df0e7beb7	Fix another user address dereference in linux_sendmsg syscall. This was hidden behind the LINUX_CMSG_NXTHDR macro which dereferences its second argument. Stop using the macro as well as LINUX_CMSG_FIRSTHDR. Use the size field of the kernel copy of the control message header to obtain the next control message. PR: 217901 MFC after: 2 days X-MFC-With: r340631	2018-11-20 14:18:57 +00:00
Tijl Coosemans	e3b385fc95	Do proper copyin of control message data in the Linux sendmsg syscall. Instead of calling m_append with a user address, allocate an mbuf cluster and copy data into it using copyin. For the SCM_CREDS case, instead of zeroing a stack variable and appending that to the mbuf, zero part of the mbuf cluster directly. One mbuf cluster is also the size limit used by the FreeBSD sendmsg syscall (uipc_syscalls.c:sockargs()). PR: 217901 Reviewed by: kib MFC after: 3 days	2018-11-19 15:31:54 +00:00
Mateusz Guzik	2c054ce924	proc: always store parent pid in p_oppid Doing so removes the dependency on proctree lock from sysctl process list export which further reduces contention during poudriere -j 128 runs. Reviewed by: kib (previous version) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17825	2018-11-16 17:07:54 +00:00
Hans Petter Selasky	0df8bab666	Define asm macro in the LinuxKPI. Submitted by: Johannes Lundberg <johalun0@gmail.com> MFC after: 1 week Sponsored by: Mellanox Technologies	2018-11-16 16:23:45 +00:00
Hans Petter Selasky	1799873e3a	Implement ktime_get_ts64() function macro in the LinuxKPI. Submitted by: Johannes Lundberg <johalun0@gmail.com> MFC after: 1 week Sponsored by: Mellanox Technologies	2018-11-16 16:19:16 +00:00
Brooks Davis	5b1df30051	Use the main capabilities.conf for freebsd32. Allow the location of capabilities.conf to be configured. Also allow a per-abi syscall prefix to be configured with the abi_func_prefix syscalls.conf variable and check syscalls against entries in capabilities.conf with and without the prefix amended. Take advantage of these two features to allow use of a shared capabilities.conf between the default syscall vector and the freebsd32 compatibility layer. We've been inconsistent about keeping the two in sync as evidenced by the bugs fixed in r340294. This eliminates that problem going forward. Reviewed by: kib Obtained from: CheriBSD Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D17932	2018-11-14 00:46:02 +00:00
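
Commit 452d59e130 in the list above describes its offline check only in words: read the PCI config and treat a value of 0xffff as "probably offline". A minimal user-space sketch of that idea follows. It is not the LinuxKPI code: the function name sketch_pci_offline(), the callback type cfg_read16_fn, the two stub readers, and the choice of the vendor ID register are all assumptions made for illustration (the commit message says only "PCI config").

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical callback standing in for a 16-bit config-space read.
 * Assumption: the register read is the vendor ID.
 */
typedef uint16_t (*cfg_read16_fn)(void *dev);

/*
 * A read from a device that no longer responds typically comes back
 * as all ones, so 0xffff is taken to mean "probably offline".
 */
static bool
sketch_pci_offline(cfg_read16_fn read_vendor, void *dev)
{
	assert(read_vendor != NULL);
	return (read_vendor(dev) == 0xffff);
}

/* Stub reader: a device that answers with an arbitrary valid-looking ID. */
static uint16_t
stub_present(void *dev)
{
	(void)dev;
	return (0x15b3);	/* example value only */
}

/* Stub reader: a device that has dropped off the bus. */
static uint16_t
stub_gone(void *dev)
{
	(void)dev;
	return (0xffff);
}
```

With these stubs, sketch_pci_offline(stub_gone, NULL) reports the device as offline and sketch_pci_offline(stub_present, NULL) does not. The check is a heuristic, which matches the "probably" in the commit message.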

3158 Commits