freebsd-skq

Author	SHA1	Message	Date
mjg	e3c2d3a906	Eliminate false sharing in malloc due to statistic collection Currently stats are collected in a MAXCPU-sized array which is not aligned and suffers enormous false-sharing. Fix the problem by utilizing per-cpu allocation. The counter(9) API is not used here as it is too incomplete and does not provide a win over per-cpu zone sized for malloc stats struct. In particular stats are being reported for each cpu separately by just copying what is supposed to be an array element for given cpu. This eliminates significant false-sharing during malloc-heavy tests e.g. on Skylake. See the review for details. Reviewed by: markj Approved by: re (kib) Differential Revision: https://reviews.freebsd.org/D17289	2018-09-23 19:00:06 +00:00
mjg	00ab82889c	select: stop doing zero-sized memsets Approved by: re (kib)	2018-09-21 13:20:41 +00:00
markj	c030a808b9	Ensure that imports into per-domain kmem arenas are KVA_QUANTUM-aligned. The old code appears to assume that vmem_alloc() would import size-aligned KVA chunks from the parent kernel_arena, but vmem doesn't provide this guarantee. Also remove the unused global RWX arena and add comments explaining why we have per-domain arenas. Reported by: alc Reviewed by: alc, kib (previous version) Approved by: re (gjb) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17249	2018-09-20 18:29:55 +00:00
mjg	846a8dd029	vfs: remove lookup_shared tunable Reviewed by: kib, jhb Approved by: re (gjb) Differential Revision: https://reviews.freebsd.org/D17253	2018-09-20 18:25:26 +00:00
mjg	4279599452	fd: prevent inlining of _fdrop thorough kern_descrip.c fdrop is used in several places in the file and almost never has to call _fdrop. Thus inlining it is a pure waste of space. Approved by: re (kib)	2018-09-20 13:32:40 +00:00
kib	610ca65f57	Fix state of dquot-less vnodes after failed quotaoff. UFS quotaoff iterates over all mp vnodes, and derefences and clears the pointers to corresponding dquots. If SU work items transiently reference some of dquots,quotaoff() would eventually fail, but all processed vnodes are already stripped from dquots. The state is problematic, since quotas are left enabled, but there is no dquots where blocks and inodes can be accounted. The result is assertion failures and NULL pointer dereferences. Fix it by suspending writes around quotaoff() call. Since the filesystem is synced, no dandling references to dquots from SU workitems can left behind, which means that quotaoff succeeds. The complication there is that quotaoff VFS op is performed with the mount point busied, while to suspend, we need to start write on the mp. If vn_start_write() is called on busied mp, system might deadlock against parallel unmount request. Handle this by unbusy-ing mp before starting write, which in turn requires changing the quotaoff() interface to return with the mount point not busied, same as was done for quotaon(). Reviewed by: mckusick Reported and tested by: pho Sponsored by: The FreeBSD Foundation Approved by: re (gjb) MFC after: 1 week Differential revision: https://reviews.freebsd.org/D17208	2018-09-19 14:36:57 +00:00
gordon	8f59931436	Correct ELF header parsing code to prevent invalid ELF sections from disclosing memory. Submitted by: markj Reported by: Thomas Barabosch, Fraunhofer FKIE Approved by: re (implicit) Approved by: so Security: FreeBSD-SA-18:12.elf Security: CVE-2018-6924 Sponsored by: The FreeBSD Foundation	2018-09-12 04:57:34 +00:00
markj	02b7bd84a2	Rename hardclock_cnt() to hardclock() and remove the old implementation. Also remove some related and unused subroutines. They have long been replaced by variants that handle multiple coalesced events with a single call. No functional change intended. Reviewed by: cem, kib Approved by: re (gjb) Differential Revision: https://reviews.freebsd.org/D17029	2018-09-06 02:10:59 +00:00
markj	b1bab99c84	Correct the condition under which we allocate a terminator node. We will have last_block < blocks if the block count is divisible by BLIST_BMAP_RADIX, but a terminator node is still needed if the tree isn't balanced. In this case we were overruning the blist array by 16 bytes during initialization. While here, add a check for the invalid blocks == 0 case. PR: 231116 Reviewed by: alc, kib (previous version), Doug Moore <dougm@rice.edu> Approved by: re (gjb) MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17020	2018-09-05 19:05:30 +00:00
kib	0a635fe513	Add amd64 mdthread fields needed for the upcoming EFI RT exception handling. This is split into a separate commit from the main change to make it easier to handle possible revert after upcoming KBI freeze. Reviewed by: kevans Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (rgrimes) Differential revision: https://reviews.freebsd.org/D16972	2018-09-02 21:16:43 +00:00
kib	c277253350	Improve error messages from clock_if.m method failures. Print error message in verbose mode when CLOCK_SETTIME() clock_if.m method failed. For EFIRT RTC clock, add error code for the failure of CLOCK_GETTIME() report. Reviewed by: kevans Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (rgrimes) Differential revision: https://reviews.freebsd.org/D16972	2018-09-02 20:17:51 +00:00
markm	d8723e8b03	Remove the Yarrow PRNG algorithm option in accordance with due notice given in random(4). This includes updating of the relevant man pages, and no-longer-used harvesting parameters. Ensure that the pseudo-unit-test still does something useful, now also with the "other" algorithm instead of Yarrow. PR: 230870 Reviewed by: cem Approved by: so(delphij,gtetlow) Approved by: re(marius) Differential Revision: https://reviews.freebsd.org/D16898	2018-08-26 12:51:46 +00:00
alc	3799d78beb	Eliminate the arena parameter to kmem_free(). Implicitly this corrects an error in the function hypercall_memfree(), where the wrong arena was being passed to kmem_free(). Introduce a per-page flag, VPO_KMEM_EXEC, to mark physical pages that are mapped in kmem with execute permissions. Use this flag to determine which arena the kmem virtual addresses are returned to. Eliminate UMA_SLAB_KRWX. The introduction of VPO_KMEM_EXEC makes it redundant. Update the nearby comment for UMA_SLAB_KERNEL. Reviewed by: kib, markj Discussed with: jeff Approved by: re (marius) Differential Revision: https://reviews.freebsd.org/D16845	2018-08-25 19:38:08 +00:00
imp	2608f6dbcf	Add a new device flag: DF_ATTACHED_ONCE This flag is set once the device has been successfully attached. When set, it inhibits devmatch from trying to match the device. This in turn allows kldunload to work as expected. Prior to the change, the driver would immediately reload because devmatch had no notion that the driver had once been attached, and therefore shouldn't participate in further matching. Differential Revision: https://reviews.freebsd.org/D16735	2018-08-23 05:06:16 +00:00
imp	485cde8df8	Create devctl freeze/thaw. This adds it to devctl, libdevctl, defines the two IOCTLs and implements the kernel bits. causes any new drivers that are added via kldload to be deferred until a 'thaw' comes in. These do not stack: it is an error to freeze while frozen, or thaw while thawed. Differential Revision: https://reviews.freebsd.org/D16735	2018-08-23 05:05:47 +00:00
cem	c843990202	devstat(9): Constify function parameters that can be const No functional change. When attempting to document the changed argument types in devstat.9, I discovered the 20 year old manual page severely mismatched reality even prior to my simple change. So I took a first cut pass cleaning that up to match reality. I'm sure I've missed some things; the goal was just to leave it better than when I started. Sponsored by: Dell EMC Isilon	2018-08-23 01:42:45 +00:00
cem	ffa2c7d287	KASSERT: Make runtime optionality optional Add an option, KASSERT_PANIC_OPTIONAL, that allows runtime KASSERT() behavior changes. When this option is not enabled, code that allows KASSERTs to become optional is not enabled, and all violated assertions cause termination. The runtime KASSERT behavior was added in r243980. One important distinction here is that panic has __dead2 ("attribute((noreturn))"), while kassert_panic does not. Static analyzers like Coverity understand __dead2. Without it, KASSERTs go misunderstood, resulting in many false positives that result from violation of program invariants. Reviewed by: jhb, jtl, np, vangyzen Relnotes: yes Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D16835	2018-08-22 22:19:42 +00:00
markj	1d715a5168	Prepare the kernel linker to handle PC-relative ifunc relocations. The boot-time ifunc resolver assumes that it only needs to apply IRELATIVE relocations to PLT entries. With an upcoming optimization, this assumption no longer holds, so add the support required to handle PC-relative relocations targeting GNU_IFUNC symbols. - Provide a custom symbol lookup routine that can be used in early boot. The default lookup routine uses kobj, which is not functional at that point. - Apply all existing relocations during boot rather than filtering IRELATIVE relocations. - Ensure that we continue to apply ifunc relocations in a second pass when loading a kernel module. Reviewed by: kib MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D16749	2018-08-22 20:44:30 +00:00
tuexen	3e0ad7d794	Add SOL_SOCKET level socket option with name SO_DOMAIN to get the domain of a socket. This is helpful when testing and Solaris and Linux have the same socket option using the same name. Reviewed by: bcr@, rrs@ Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D16791	2018-08-21 14:04:30 +00:00
alc	71b5b012c4	Eliminate kmem_alloc_contig()'s unused arena parameter. Reviewed by: hselasky, kib, markj Discussed with: jeff Differential Revision: https://reviews.freebsd.org/D16799	2018-08-20 15:57:27 +00:00
kevans	ad8cc64e2b	res_find: Fix fallback logic The fallback logic was broken if hints were found in multiple environments. If we found a hint in either the loader environment or the static environment, fallback would be incremented excessively when we returned to the environment-selection bits. These checks should have also been guarded by the fbacklvl checks. As a result, fbacklvl could quickly get to a point where we skip either the static environment and/or the static hints depending on which environments contained valid hints. The impact of this bug is minimal, mostly affecting mips boards that use static hints and may have hints in either the loader environment or the static environment. There may be better ways to express the searchable environments and describing their characteristics (immutable, already searched, etc.) but this may be revisited after 12 branches. Reported by: Dan Nelson <dnelson_1901@yahoo.com> Triaged by: Dan Nelson <dnelson_1901@yahoo.com> MFC after: 3 days	2018-08-18 19:45:56 +00:00
delphij	2acd1f2a25	Regen after r337998.	2018-08-18 06:33:51 +00:00
delphij	4f62d03ca0	getrandom(2) should not be restricted in capability mode.	2018-08-18 06:31:49 +00:00
markj	d1a00acf4d	Typo. X-MFC with: r337974	2018-08-17 16:07:06 +00:00
markj	4e68a99c04	Add INVARIANTS-only fences around lockless vnode refcount updates. Some internal KASSERTs access the v_iflag field without the vnode interlock held after such a refcount update. The fences are needed for the assertions to be correct in the face of store reordering. Reported and tested by: jhibbits Reviewed by: kib, mjg MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D16756	2018-08-17 15:41:01 +00:00
oshogbo	1aa9b1400a	capsicum: allow the setproctitle(3) function in capability mode Capsicum in past allowed to change the process title. This was broken with r335939. PR: 230584 Submitted by: Yuichiro NAITO <naito.yuichiro@gmail.com> Reported by: ian@niw.com.au MFC after: 1 week	2018-08-17 14:35:10 +00:00
kevans	645507f98f	subr_prf: Don't write kern.boot_tag if it's empty This change allows one to set kern.boot_tag="" and not get a blank line preceding other boot messages. While this isn't super critical- blank lines are easy to filter out both mentally and in processing dmesg later- it allows for a mode of operation that matches previous behavior. I intend to MFC this whole series to stable/11 by the end of the month with boot_tag empty by default to make this effectively a nop in the stable branch.	2018-08-17 03:42:57 +00:00
jamie	6b9aac38ce	Revert r337922, except for some documention-only bits. This needs to wait until user is changed to stop using jail(2). Differential Revision: D14791	2018-08-16 19:09:43 +00:00
jamie	94a36bb7c1	Put jail(2) under COMPAT_FREEBSD11. It has been the "old" way of creating jails since FreeBSD 7. Along with the system call, put the various security.jail.allow_foo and security.jail.foo_allowed sysctls partly under COMPAT_FREEBSD11 (or BURN_BRIDGES). These sysctls had two disparate uses: on the system side, they were global permissions for jails created via jail(2) which lacked fine-grained permission controls; inside a jail, they're read-only descriptions of what the current jail is allowed to do. The first use is obsolete along with jail(2), but keep them for the second-read-only use. Differential Revision: D14791	2018-08-16 18:40:16 +00:00
trasz	5d015ffd7a	In the help message at the mountroot prompt, suggest something that actually works and matches the bsdinstall(8) default. MFC after: 2 weeks Sponsored by: DARPA, AFRL	2018-08-15 12:12:21 +00:00
alc	d20c62f3cb	Eliminate a redundant assignment. MFC after: 1 week	2018-08-11 19:21:53 +00:00
kevans	d0635b16cc	subr_prf: remove think-o that had returned to local patch Reported by: cognet	2018-08-10 15:35:02 +00:00
kevans	46cff726c1	boot tagging: minor fixes msgbufinit may be called multiple times as we initialize the msgbuf into a progressively larger buffer. This doesn't happen as of now on head, but it may happen in the future and we generally support this. As such, only print the boot tag if we've just initialized the buffer for the first time. The boot tag also now has a newline appended to it for better visibility, and has been switched to a normal printf, by requesto f bde, after we've denoted that the msgbuf is mapped.	2018-08-10 15:29:06 +00:00
kevans	135e1f225d	subr_prf: style(9) the sizeof Reported by: jkim, ian	2018-08-09 19:09:06 +00:00
kevans	a4d7516115	subr_prf: Use "sizeof current_boot_tag" instead	2018-08-09 17:53:18 +00:00
kevans	d2718a67f3	BOOT_TAG: Make a config(5) option, expose as sysctl and loader tunable BOOT_TAG lived shortly in sys/msgbuf.h, but this wasn't necessarily great for changing it or removing it. Move it into subr_prf.c and add options for it to opt_printf.h. One can specify both the BOOT_TAG and BOOT_TAG_SZ (really, size of the buffer that holds the BOOT_TAG). We expose it as kern.boot_tag and also add a loader tunable by the same name that we'll fetch upon initialization of the msgbuf. This allows for flexibility and also ensures that there's a consistent way to figure out the boot tag of the running kernel, rather than relying on headers to be in-sync. Prodded super-super-lightly by: imp	2018-08-09 17:47:47 +00:00
kevans	3aecd7a21e	msgbuf: Light detailing (const'ify and bool'itize)	2018-08-09 17:42:27 +00:00
luporl	2f30606f2f	[ppc] Fix kernel panic when using BOOTP_NFSROOT On PowerPC (and possibly other architectures), that doesn't use EARLY_AP_STARTUP, the config task queue may be used initialized. This was observed while trying to mount the root fs from NFS, as reported here: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=230168. This patch has 2 main changes: 1- Perform a basic initialization of qgroup_config, similar to what is done in taskqgroup_adjust, but simpler. This makes qgroup_config ready to be used during NFS root mount. 2- When EARLY_AP_STARTUP is not used, call inm_init() and in6m_init() right before SI_SUB_ROOT_CONF, because bootp needs to send multicast packages to request an IP. PR: Bug 230168 Reported by: sbruno Reviewed by: jhibbits, mmacy, sbruno Approved by: jhibbits Differential Revision: D16633	2018-08-09 14:04:51 +00:00
mmacy	33829b496c	epoch_block_wait: don't check TD_RUNNING struct epoch_thread is not type safe (stack allocated) and thus cannot be dereferenced from another CPU Reported by: novel@	2018-08-09 05:18:27 +00:00
kevans	47b7c5f8ae	kern: Add a BOOT_TAG marker at the beginning of boot dmesg From the "newly licensed to drive" PR department, add a BOOT_TAG marker (by default, --<<BOOT>>--, to the beginning of each boot's dmesg. This makes it easier to do textproc magic to locate the start of each boot and, of particular interest to some, the dmesg of the current boot. The PR has a dmesg(8) component as well that I've opted not to include for the moment- it was the more contentious part of this PR. bde@ also made the statement that this boot tag should be written with an ordinary printf, which I've- for the moment- declined to change about this patch to keep it more transparent to observer of the boot process. PR: 43434 Submitted by: dak <aurelien.nephtali@wanadoo.fr> (basically rewritten) MFC after: maybe never	2018-08-09 01:32:09 +00:00
kib	4ef74e60b2	Followup to r337430: only call elf_reloc_ifunc on x86. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2018-08-07 20:43:50 +00:00
kib	df3033467d	Add missed handling of local relocs against ifunc target in the obj modules. Reported and tested by: wulf Sponsored by: The FreeBSD Foundation MFC after: 1 week	2018-08-07 18:26:46 +00:00
markj	7a979485ab	Improve handling of control message truncation. If a recvmsg(2) or recvmmsg(2) caller doesn't provide sufficient space for all control messages, the kernel sets MSG_CTRUNC in the message flags to indicate truncation of the control messages. In the case of SCM_RIGHTS messages, however, we were failing to dispose of the rights that had already been externalized into the recipient's file descriptor table. Add a new function and mbuf type to handle this cleanup task, and use it any time we fail to copy control messages out to the recipient. To simplify cleanup, control message truncation is now only performed at control message boundaries. The change also fixes a few related bugs: - Rights could be leaked to the recipient process if an error occurred while copying out a message's contents. - We failed to set MSG_CTRUNC if the truncation occurred on a control message boundary, e.g., if the caller received two control messages and provided only the exact amount of buffer space needed for the first. PR: 131876 Reviewed by: ed (previous version) MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D16561	2018-08-07 16:36:48 +00:00
kib	4d78ed404a	Swap in WKILLED processes. Swapped-out process that is WKILLED must be swapped in as soon as possible. The reason is that such process can be killed by OOM and its pages can be only freed if the process exits. To exit, the kernel stack of the process must be mapped. When allocating pages for the stack of the WKILLED process on swap in, use VM_ALLOC_SYSTEM requests to increase the chance of the allocation to succeed. Add counter of the swapped out processes to avoid unneeded iteration over the allprocs list when there is no work to do, reducing the allproc_lock ownership. Reviewed by: alc, markj (previous version) Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D16489	2018-08-04 20:45:43 +00:00
markj	75b64fe9d9	Don't check rcv sockbuf limits when sending on a unix stream socket. sosend_generic() performs an initial comparison of the amount of data (including control messages) to be transmitted with the send buffer size. When transmitting on a unix socket, we then compare the amount of data being sent with the amount of space in the receive buffer size; if insufficient space is available, sbappendcontrol() returns an error and the data is lost. This is easily triggered by sending control messages together with an amount of data roughly equal to the send buffer size, since the control message size may change in uipc_send() as file descriptors are internalized. Fix the problem by removing the space check in sbappendcontrol(), whose only consumer is the unix sockets code. The stream sockets code uses the SB_STOP mechanism to ensure that senders will block if the receive buffer fills up. PR: 181741 MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D16515	2018-08-04 20:26:54 +00:00
markj	c0f926949d	Style.	2018-08-04 20:16:36 +00:00
avg	8d0848ba39	safer wait-free iteration of shared interrupt handlers The code that iterates a list of interrupt handlers for a (shared) interrupt, whether in the ISR context or in the context of an interrupt thread, does so in a lock-free fashion. Thus, the routines that modify the list need to take special steps to ensure that the iterating code has a consistent view of the list. Previously, those routines tried to play nice only with the code running in the ithread context. The iteration in the ISR context was left to a chance. After commit r336635 atomic operations and memory fences are used to ensure that ie_handlers list is always safe to navigate with respect to inserting and removal of list elements. There is still a question of when it is safe to actually free a removed element. The idea of this change is somewhat similar to the idea of the epoch based reclamation. There are some simplifications comparing to the general epoch based reclamation. All writers are serialized using a mutex, so we do not need to worry about concurrent modifications. Also, all read accesses from the open context are serialized too. So, we can get away just two epochs / phases. When a thread removes an element it switches the global phase from the current phase to the other and then drains the previous phase. Only after the draining the removed element gets actually freed. The code that iterates the list in the ISR context takes a snapshot of the global phase and then increments the use count of that phase before iterating the list. The use count (in the same phase) is decremented after the iteration. This should ensure that there should be no iteration over the removed element when its gets freed. This commit also simplifies the coordination with the interrupt thread context. Now we always schedule the interrupt thread when removing one of handlers for its interrupt. This makes the code both simpler and safer as the interrupt thread masks the interrupt thus ensuring that there is no interaction with the ISR context. P.S. This change matters only for shared interrupts and I realize that those are becoming a thing of the past (and quickly). I also understand that the problem that I am trying to solve is extremely rare. PR: 229106 Reviewed by: cem Discussed with: Samy Al Bahra MFC after: 5 weeks Differential Revision: https://reviews.freebsd.org/D15905	2018-08-03 14:27:28 +00:00
asomers	fabe732b5e	Fix LOCAL_PEERCRED with socketpair(2) Enable the LOCAL_PEERCRED socket option for unix domain stream sockets created with socketpair(2). Previously, it only worked with unix domain stream sockets created with socket(2)/listen(2)/connect(2)/accept(2). PR: 176419 Reported by: Nicholas Wilson <nicholas@nicholaswilson.me.uk> MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D16350	2018-08-03 01:37:00 +00:00
avg	d9bc6597cf	fix a typo resulting in a wrong variable in kern_syscall_deregister The difference is between sysent, a global, and sysents, a function parameter.	2018-08-02 09:41:55 +00:00
markj	3a415f27d2	Remove a redundant check. MFC after: 3 days Sponsored by: The FreeBSD Foundation	2018-07-30 17:58:41 +00:00

1 2 3 4 5 ...

16279 Commits