freebsd-skq

Author	SHA1	Message	Date
trasz	aea1562f16	Add new unmount(2) flag, MNT_NONBUSY, to check whether there are any open vnodes before proceeding. Make autounmound(8) use this flag. Without it, even an unsuccessfull unmount causes filesystem flush, which interferes with normal operation. Reviewed by: kib@ Approved by: re (gjb@) MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D7047	2016-07-07 09:03:57 +00:00
nwhitehorn	89d01c24d1	Replace a number of conflations of mp_ncpus and mp_maxid with either mp_maxid or CPU_FOREACH() as appropriate. This fixes a number of places in the kernel that assumed CPU IDs are dense in [0, mp_ncpus) and would try, for example, to run tasks on CPUs that did not exist or to allocate too few buffers on systems with sparse CPU IDs in which there are holes in the range and mp_maxid > mp_ncpus. Such circumstances generally occur on systems with SMT, but on which SMT is disabled. This patch restores system operation at least on POWER8 systems configured in this way. There are a number of other places in the kernel with potential problems in these situations, but where sparse CPU IDs are not currently known to occur, mostly in the ARM machine-dependent code. These will be fixed in a follow-up commit after the stable/11 branch. PR: kern/210106 Reviewed by: jhb Approved by: re (glebius)	2016-07-06 14:09:49 +00:00
glebius	2b2f782761	The paradigm of a callout is that it has three consequent states: not scheduled -> scheduled -> running -> not scheduled. The API and the manual page assume that, some comments in the code assume that, and looks like some contributors to the code also did. The problem is that this paradigm isn't true. A callout can be scheduled and running at the same time, which makes API description ambigouous. In such case callout_stop() family of functions/macros should return 1 and 0 at the same time, since it successfully unscheduled future callout but the current one is running. Before this change we returned 1 in such a case, with an exception that if running callout was migrating we returned 0, unless CS_MIGRBLOCK was specified. With this change, we now return 0 in case if future callout was unscheduled, but another one is still in action, indicating to API users that resources are not yet safe to be freed. However, the sleepqueue code relies on getting 1 return code in that case, and there already was CS_MIGRBLOCK flag, that covered one of the edge cases. In the new return path we will also use this flag, to keep sleepqueue safe. Since the flag CS_MIGRBLOCK doesn't block migration and now isn't limited to migration edge case, rename it to CS_EXECUTING. This change fixes panics on a high loaded TCP server. Reviewed by: jch, hselasky, rrs, kib Approved by: re (gjb) Differential Revision: https://reviews.freebsd.org/D7042	2016-07-05 18:47:17 +00:00
glebius	0417c9be8f	Compile in the kassert_panic() function with INVARIANT_SUPPORT option, not INVARIANTS. The function is required if we want to load in a module that is compiled with INVARIANTS. Reviewed by: jhb Approved by: re (gjb)	2016-07-05 18:34:34 +00:00
markj	d6ae30f932	Ensure that spinlock sections are balanced even after a panic. vpanic() uses spinlock_enter() to disable interrupts before dumping core. However, when the scheduler is stopped and INVARIANTS is not configured, thread_lock() does not acquire a spinlock section, while thread_unlock() releases one. This can result in interrupts staying enabled while the kernel dumps core, complicating post-mortem analysis of the crash. Approved by: re (gjb) MFC after: 1 week Sponsored by: EMC / Isilon Storage Division	2016-07-05 17:59:04 +00:00
rwatson	0696f7cf29	Call audit hooks to capture vnode attributes for three file-descriptor method implementations: fstat(2), close(2), and poll(2). This change synchronises auditing here with similar auditing for VFS-specific system calls such as stat(2) that audit more complete vnode information. Sponsored by: DARPA, AFRL Approved by: re (kib) MFC after: 1 week	2016-07-05 16:37:01 +00:00
emaste	c769866ed1	add description for debug.elf{32,64}_legacy_coredump sysctl Approved by: re (kib) MFC after: 1 week Sponsored by: The FreeBSD Foundation	2016-07-05 14:46:06 +00:00
kib	2ff8e9c636	Provide helper macros to detect 'non-silent SBDRY' state and to calculate appropriate return value for stops. Simplify the code by using them. Fix typo in sig_suspend_threads(). The thread which sleep must be aborted is td2. () In issignal(), when handling stopping signal for thread in TD_SBDRY_INTR state, do not stop, this is wrong and fires assert. This is yet another place where execution should be forced out of SBDRY-protected region. For such case, return -1 from issignal() and translate it to corresponding error code in sleepq_catch_signals(). Assert that other consumers of cursig() are not affected by the new return value. () Micro-optimize, mostly VFS and VOP methods, by avoiding calling the functions when SIGDEFERSTOP_NOP non-change is requested. (*) Reported and tested by: pho () Requested by: bde (**) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Approved by: re (gjb)	2016-07-03 18:19:48 +00:00
kib	c31bb3499e	Remove racy assert. The thread which changes vnode usecount from 0 to 1 does it under the vnode interlock, but the interlock is not owned by the asserting thread. As result, we might read increased use counter but also still see VI_OWEINACT. In collaboration with: nwhitehorn Hardware donated by: IBM LTC Sponsored by: The FreeBSD Foundation (kib) Approved by: re (gjb)	2016-07-03 01:56:48 +00:00
kib	38d067a317	When a process knote was attached to the process which is already exiting, the knote is activated immediately. If the exit1() later activates knotes, such knote is attempted to be activated second time. Detect the condition by zeroed kn_ptr.p_proc pointer, and avoid excessive activation. Before r302235, such knotes were removed from the knlist immediately upon activation. Reported by: truckman Sponsored by: The FreeBSD Foundation Approved by: re (gjb)	2016-07-01 20:11:28 +00:00
kib	42d92058c3	Currently the ntptime code and resettodr() are Giant-locked. In particular, the Giant is supposed to protect against parallel ntp_adjtime(2) invocations. But, for instance, sys_ntp_adjtime() does copyout(9) under Giant and then examines time_status to return syscall result. Since copyout(9) could sleep, the syscall result might be inconsistent. Another and more important issue is that if PPS is configured, hardpps(9) is executed without any protection against the parallel top-level code invocation. Potentially, this may result in the inconsistent state of the ntptime state variables, but I cannot say how serious such distortion is. The non-functional splclock() call in sys_ntp_adjtime() protected against clock interrupts calling hardpps() in the pre-SMP era. Modernize the locking. A mutex protects ntptime data. Due to the hardpps() KPI legitimately serving from the interrupt filters (and e.g. uart(4) does call it from filter), the lock cannot be sleepable mutex if PPS_SYNC is defined. Otherwise, use normal sleepable mutex to reduce interrupt latency. Reviewed by: imp, jhb Sponsored by: The FreeBSD Foundation Approved by: re (gjb) Differential revision: https://reviews.freebsd.org/D6825	2016-06-28 16:43:23 +00:00
kib	8543a46c50	Do not use Giant to prevent parallel calls to CLOCK_SETTIME(). Use private mtx in resettodr(), no implementation of CLOCK_SETTIME() is allowed to sleep. Reviewed by: imp, jhb Sponsored by: The FreeBSD Foundation Approved by: re (gjb) X-Differential revision: https://reviews.freebsd.org/D6825	2016-06-28 16:42:40 +00:00
kib	fb3ea99e64	Complete r302215. TDF_SBDRY \| TDF_SERESTART and TDF_SBDRY \| TDF_SEINTR flags values, unlike TDF_SBDRY, must be treated almost as if TDF_SBDRY is not set for STOP signal delivery. The only difference is that sig_suspend_threads() should abort the sleep instead of doing immediate suspension. Reported by: ngie Sponsored by: The FreeBSD Foundation MFC after: 12 days Approved by: re (gjb)	2016-06-28 16:41:50 +00:00
kib	8dad8cb1ad	Fix userspace build after r302235: do not expose bool field of the structure, change it to int. The real fix is to sanitize user-visible definitions in sys/event.h, e.g. the affected struct knlist is of no use for userspace programs. Reported and tested by: jkim Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Approved by: re (gjb)	2016-06-27 23:34:53 +00:00
kib	15841a7afe	When filt_proc() removes event from the knlist due to the process exiting (NOTE_EXIT->knlist_remove_inevent()), two things happen: - knote kn_knlist pointer is reset - INFLUX knote is removed from the process knlist. And, there are two consequences: - KN_LIST_UNLOCK() on such knote is nop - there is nothing which would block exit1() from processing past the knlist_destroy() (and knlist_destroy() resets knlist lock pointers). Both consequences result either in leaked process lock, or dereferencing NULL function pointers for locking. Handle this by stopping embedding the process knlist into struct proc. Instead, the knlist is allocated together with struct proc, but marked as autodestroy on the zombie reap, by knlist_detach() function. The knlist is freed when last kevent is removed from the list, in particular, at the zombie reap time if the list is empty. As result, the knlist_remove_inevent() is no longer needed and removed. Other changes: In filt_procattach(), clear NOTE_EXEC and NOTE_FORK desired events from kn_sfflags for knote registered by kernel to only get NOTE_CHILD notifications. The flags leak resulted in excessive NOTE_EXEC/NOTE_FORK reports. Fix immediate note activation in filt_procattach(). Condition should be either the immediate CHILD_NOTE activation, or immediate NOTE_EXIT report for the exiting process. In knote_fork(), do not perform racy check for KN_INFLUX before kq lock is taken. Besides being racy, it did not accounted for notes just added by scan (KN_SCAN). Some minor and incomplete style fixes. Analyzed and tested by: Eric Badger <eric@badgerio.us> Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Approved by: re (gjb) Differential revision: https://reviews.freebsd.org/D6859	2016-06-27 21:52:17 +00:00
kib	13c308de0c	When sleeping waiting for either local or remote advisory lock, interrupt sleeps with the ERESTART on the suspension attempts. Otherwise, single-threading requests are deferred until the locks are granted for NFS files, which causes hangs. When retrying local registration of the remotely-granted adv lock, allow full suspension and check for suspension, for usual reasons. Reported by: markj, pho Reviewed by: jilles Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Approved by: re (gjb)	2016-06-26 20:08:42 +00:00
kib	2b281bf08f	Rewrite sigdeferstop(9) and sigallowstop(9) into more flexible framework allowing to set the suspension policy for the dynamic block. Extend the currently possible policies of stopping on interruptible sleeps and ignoring such sleeps by two more: do not suspend at interruptible sleeps, but interrupt them with either EINTR or ERESTART. Reviewed by: jilles Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Approved by: re (gjb)	2016-06-26 20:07:24 +00:00
kib	72cb649e6e	Do not clear robust lists pointers on fork. The forked child thread lists must be functional. Reported by: Daniel Engberg <daniel.engberg.lists@pyret.net>, Guy Yur <guyyur@gmail.com> Tested by: Guy Yur <guyyur@gmail.com> Sponsored by: The FreeBSD Foundation Approved by: re (gjb), including the KBI change	2016-06-25 11:31:25 +00:00
jilles	0c4cfbe0d1	posixshm: Fix lock leak when mac_posixshm_check_read rejects read. While reading the code, I noticed that shm_read() returns without unlocking foffset and rangelock if mac_posixshm_check_read() rejects the read. Reviewed by: kib, jhb, rwatson Approved by: re (gjb) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D6927	2016-06-23 20:59:13 +00:00
brooks	251cde9f5a	Generate syscall tables and update pipe() implementation after r302094. Mark the pipe() system call as COMPAT10. As of r302092 libc uses pipe2() with a zero flags value instead of pipe(). Approved by: re (gjb) Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D6816	2016-06-22 21:18:19 +00:00
brooks	e4ef2142c8	Mark the pipe() system call as COMPAT10. As of r302092 libc uses pipe2() with a zero flags value instead of pipe(). Commit with regenerated files and implementation to follow. Approved by: re (gjb) Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D6816	2016-06-22 21:15:59 +00:00
brooks	a11d117db1	Add support for COMPAT10 keywords in syscalls.master. Approved by: re (gjb) Sponsored by: DARPA, AFRL	2016-06-22 21:12:53 +00:00
jhb	a6270457f8	Account for AIO socket operations in thread/process resource usage. File and disk-backed I/O requests store counts of read/written disk blocks in each AIO job so that they can be charged to the thread that completes an AIO request via aio_return() or aio_waitcomplete(). This change extends AIO jobs to store counts of received/sent messages and updates socket backends to set these counts accordingly. Note that the socket backends are careful to only charge a single messages for each AIO request even though a single request on a blocking socket might invoke sosend or soreceive multiple times. This is to mimic the resource accounting of synchronous read/write. Adjust the UNIX socketpair AIO test to verify that the message resource usage counts update accordingly for aio_read and aio_write. Approved by: re (hrs) Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D6911	2016-06-21 22:19:06 +00:00
bz	7a1c0b1ad1	Get closer to a VIMAGE network stack teardown from top to bottom rather than removing the network interfaces first. This change is rather larger and convoluted as the ordering requirements cannot be separated. Move the pfil(9) framework to SI_SUB_PROTO_PFIL, move Firewalls and related modules to their own SI_SUB_PROTO_FIREWALL. Move initialization of "physical" interfaces to SI_SUB_DRIVERS, move virtual (cloned) interfaces to SI_SUB_PSEUDO. Move Multicast to SI_SUB_PROTO_MC. Re-work parts of multicast initialisation and teardown, not taking the huge amount of memory into account if used as a module yet. For interface teardown we try to do as many of them as we can on SI_SUB_INIT_IF, but for some this makes no sense, e.g., when tunnelling over a higher layer protocol such as IP. In that case the interface has to go along (or before) the higher layer protocol is shutdown. Kernel hhooks need to go last on teardown as they may be used at various higher layers and we cannot remove them before we cleaned up the higher layers. For interface teardown there are multiple paths: (a) a cloned interface is destroyed (inside a VIMAGE or in the base system), (b) any interface is moved from a virtual network stack to a different network stack ("vmove"), or (c) a virtual network stack is being shut down. All code paths go through if_detach_internal() where we, depending on the vmove flag or the vnet state, make a decision on how much to shut down; in case we are destroying a VNET the individual protocol layers will cleanup their own parts thus we cannot do so again for each interface as we end up with, e.g., double-frees, destroying locks twice or acquiring already destroyed locks. When calling into protocol cleanups we equally have to tell them whether they need to detach upper layer protocols ("ulp") or not (e.g., in6_ifdetach()). Provide or enahnce helper functions to do proper cleanup at a protocol rather than at an interface level. Approved by: re (hrs) Obtained from: projects/vnet Reviewed by: gnn, jhb Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D6747	2016-06-21 13:48:49 +00:00
kib	06125ebef5	Fix typo. Note that atomic is still required even for interlocked case. Sponsored by: The FreeBSD Foundation Approved by: re (marius)	2016-06-20 15:45:50 +00:00
mjg	ed393257f0	vfs: ifdef out noop vop_* primitives on !DEBUG_VFS_LOCKS kernels This removes calls to empty functions like vop_lock_{pre/post} from common vfs routines. Approved by: re (gjb)	2016-06-17 19:41:30 +00:00
kib	40488ecf84	Add VFS interface to flush specified amount of free vnodes belonging to mount points with the given filesystem type, specified by mount vfs_ops pointer. Based on patch by: mckusick Reviewed by: avg, mckusick Tested by: allanjude, madpilot Sponsored by: The FreeBSD Foundation Approved by: re (gjb)	2016-06-17 17:33:25 +00:00
kib	496a3b1f65	Update comments for the MD functions managing contexts for new threads, to make it less confusing and using modern kernel terms. Rename the functions to reflect current use of the functions, instead of the historic KSE conventions: cpu_set_fork_handler -> cpu_fork_kthread_handler (for kthreads) cpu_set_upcall -> cpu_copy_thread (for forks) cpu_set_upcall_kse -> cpu_set_upcall (for new threads creation) Reviewed by: jhb (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (hrs) Differential revision: https://reviews.freebsd.org/D6731	2016-06-16 12:05:44 +00:00
kib	2c44599f9c	Remove XXX comments from kern_thread.c. In one case, there is no reason for it in modern times. In the other case, expand the comment stating instead of doubting. Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (hrs) X-Differential revision: https://reviews.freebsd.org/D6731	2016-06-16 12:01:11 +00:00
kib	aea5f52dbc	Remove code duplication. Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (hrs) X-Differential revision: https://reviews.freebsd.org/D6731	2016-06-16 11:58:46 +00:00
jhb	593d3fb218	Move backend-specific fields of kaiocb into a union. This reduces the size of kaiocb slightly. I've also added some generic fields that other backends can use in place of the BIO-specific fields. Change the socket and Chelsio DDP backends to use 'backend3' instead of abusing _aiocb_private.status directly. This confines the use of _aiocb_private to the AIO internals in vfs_aio.c. Reviewed by: kib (earlier version) Approved by: re (gjb) Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D6547	2016-06-15 20:56:45 +00:00
kib	22f8dd6d9b	Do not assume that we own the use reference on the covered vnode until we set MNTK_UNMOUNT flag on the mp. Otherwise parallel unmount which wins race with us could dereference the covered vnode, and we are left with the locked freed memory. Reported and tested by: pho Sponsored by: The FreeBSD Foundation Approved by: re (gjb) MFC after: 1 week	2016-06-15 15:56:03 +00:00
jamie	aad828cb49	Fix a vnode leak when giving a child jail a too-long path when debug.disablefullpath=1.	2016-06-09 21:59:11 +00:00
jamie	786e18e298	Re-order some jail parameter reading to prevent a vnode leak.	2016-06-09 20:43:14 +00:00
jamie	91dc125e2d	Clean up some logic in jail error messages, replacing a missing test and a redundant test with a single correct test.	2016-06-09 20:39:57 +00:00
oshogbo	57dc7171d0	Define tunable instead of using CTLFLAG_RWTUN flag with kern.corefile. The allproc_lock lock used in the sysctl_kern_corefile function is initialized in the procinit function which is called after setting sysctl values at boot. That means if we set kern.corefile at boot we will be trying to use lock with is uninitialized and machine will crash. If we define kern.corefile as tunable instead of using CTFLAG_RWTUN we will not call the sysctl_kern_corefile function and we will not use an uninitialized lock. When machine will boot then we will start using function depending on the lock. Reviewed by: pjd	2016-06-09 20:23:30 +00:00
cem	a9972a75af	Add DDB command "kldstat" It prints much the same information as kldstat(8) without any arguments. Suggested by: jhibbits Sponsored by: EMC / Isilon Storage Division	2016-06-09 18:27:41 +00:00
cem	5ebc37ce13	kvprintf: Pad %c to width, like %s Sponsored by: EMC / Isilon Storage Division	2016-06-09 18:24:51 +00:00
jamie	debe558799	Make sure the OSD methods for jail set and remove can't run concurrently, by holding allprison_lock exclusively (even if only for a moment before downgrading) on all paths that call PR_METHOD_REMOVE. Since they may run on a downgraded lock, it's still possible for them to run concurrently with PR_METHOD_GET, which will need to use the prison lock.	2016-06-09 16:41:41 +00:00
jamie	88eec3fa5b	Remove a comment that was part of copied code, and is misleading in the new location.	2016-06-09 15:34:33 +00:00
markj	cdf94f56b0	Fix some cosmetic issues in kern_fail.c omitted from r296927. Obtained from: Matthew Bryan <matthew.bryan@isilon.com>	2016-06-09 13:17:08 +00:00
kib	ef4ada7b76	Old process credentials for setuid execve must not be dereferenced when the process credentials were not changed. This can happen if an error occured trying to activate the setuid binary. And on error, if new credentials were not yet assigned, they must be freed to not create the leak. Use oldcred == NULL as the predicate to detect credential reassignment. Reported and tested by: pho Sponsored by: The FreeBSD Foundation	2016-06-08 04:37:03 +00:00
oshogbo	17ff1e5b26	Introduce the PD_CLOEXEC for pdfork(2). Reviewed by: mjg	2016-06-08 02:09:14 +00:00
skra	707fd866b9	Remove temporary solution for storing interrupt mapping data as it's not needed after r301451 and follow-ups r301453, r301539. This makes INTRNG clean of all additions related to various buses.	2016-06-07 09:03:27 +00:00
mmel	f66cc0d4f8	INTRNG: As follow up of r301451, implement mapping and configuration of gpio pin interrupts by new way. Note: This removes last consumer of intr_ddata machinery and we remove it in separate commit.	2016-06-07 05:08:24 +00:00
bz	2919ef7927	Implement a `show panic` command to DDB which will helpfully print the panic string again if set, in case it scrolled out of the active window. This avoids having to remember the symbol name. Also add a show callout <addr> command to DDB in order to inspect some struct callout fields in case of panics in the callout code. This may help to see if there was memory corruption or to further ease debugging problems. Obtained from: projects/vnet MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Reviewed by: jhb (comment only on the show panic initally) Differential Revision: https://reviews.freebsd.org/D4527	2016-06-06 20:57:24 +00:00
kib	ef5f88c357	Get rid of struct proc p_sched and struct thread td_sched pointers. p_sched is unused. The struct td_sched is always co-allocated with the struct thread, except for the thread0. Avoid useless indirection, instead calculate td_sched location using simple pointer arithmetic in td_get_sched(9). For thread0, which is statically allocated, create a structure to emulate layout of the dynamic allocation. Reviewed by: jhb (previous version) Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D6711	2016-06-05 17:04:03 +00:00
kib	78ad2d6695	Use ANSI function definition. Sponsored by: The FreeBSD Foundation	2016-06-05 16:55:55 +00:00
skra	612e6958e4	INTRNG - change the way how an interrupt mapping data are provided to the framework in OFW (FDT) case. This is a follow-up to r301451. Differential Revision: https://reviews.freebsd.org/D6634	2016-06-05 16:20:12 +00:00
skra	833dcace9c	(1) Add a new bus method to get a mapping data for an interrupt. BUS_MAP_INTR() is used to get an interrupt mapping data according to provided hints. The hints could be modified afterwards, but only if mapping data was allocated. This method is intended to be called before BUS_ALLOC_RESOURCE(). An interrupt mapping data describes an interrupt - hardware number, type, configuration, cpu binding, and whatever is needed to setup it. (2) Introduce a method which allows storing of an additional data in struct resource to be available for bus drivers. This method is convenient in two ways: - there is no need to rework existing bus drivers as they can simply be extended to provide an additional data, - there is no need to modify any existing bus methods as struct resource is already passed to them as argument and thus stored data is simply accessible by other bus drivers. For now, implement this method only for INTRNG. This is motivated by needs of modern SOCs where hardware initialization is not straightforward and resources descriptions are complex, opaque for everyone but provider, and may vary from SOC to SOC. Typical situation is that one bus driver can fetch a resource description for its child device, but it's opaque for this driver. Another bus driver knows a provider for this kind of resource and can pass this resource description to it. In fact, something like device IVARS would be perfect for that if implemented generally enough. Unfortunatelly, IVARS are usable only by their owners now. Only owner knows its IVARS layout, thus other bus drivers are not able to use them. Differential Revision: https://reviews.freebsd.org/D6632	2016-06-05 16:07:57 +00:00

1 2 3 4 5 ...

14946 Commits