freebsd-skq

Author	SHA1	Message	Date
mjg	c5b9ed7f33	vfs: implement usecount implying holdcnt vnodes have 2 reference counts - holdcnt to keep the vnode itself from getting freed and usecount to denote it is actively used. Previously all operations bumping usecount would also bump holdcnt, which is not necessary. We can detect if usecount is already > 1 (in which case holdcnt is also > 1) and utilize it to avoid bumping holdcnt on our own. This saves on atomic ops. Reviewed by: kib Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21471	2019-09-03 15:42:11 +00:00
imp	854a74e65c	Implement nvme suspend / resume for pci attachment When we suspend, we need to properly shut down the NVMe controller. The controller may go into D3 state (or may have the power removed), and to properly flush the metadata to non-volatile RAM, we must complete a normal shutdown. This consists of deleting the I/O queues and setting the shutdown bit. We have to do some extra stuff to make sure we reset the software state of the queues as well. On resume, we have to reset the card twice, for reasons described in the attach function. Once we've done that, we can restart the card. If any of this fails, we'll fail the NVMe card, just like we do when a reset fails. Set is_resetting for the duration of the suspend / resume. This keeps the reset taskqueue from running a concurrent reset, and also is needed to prevent any hw completions from queueing more I/O to the card. Pass resetting flag to nvme_ctrlr_start. It doesn't need to get that from the global state of the ctrlr. Wait for any pending reset to finish. All queued I/O will get sent to the hardware as part of nvme_ctrlr_start(), though the upper layers shouldn't send any down. Disabling the qpairs is the other failsafe to ensure all I/O is queued. Rename nvme_ctrlr_destory_qpairs to nvme_ctrlr_delete_qpairs to avoid confusion with all the other destroy functions. It just removes the queues in hardware, while the other _destroy_ functions tear down driver data structures. Split parts of the hardware reset function up so that I can do part of the reset in suspend. Split out the software disabling of the qpairs into nvme_ctrlr_disable_qpairs. Finally, fix a couple of spelling errors in comments related to this. Relnotes: Yes MFC After: 1 week Reviewed by: scottl@ (prior version) Differential Revision: https://reviews.freebsd.org/D21493	2019-09-03 15:26:11 +00:00
markj	0c6a19ddb1	Add preliminary support for atomic updates of per-page queue state. Queue operations on a page use the page lock when updating the page to reflect the desired queue state, and the page queue lock when physically enqueuing or dequeuing a page. Multiple pages share a given page lock, but queue state is per-page; this false sharing results in heavy lock contention. Take a small step towards the use of atomic_cmpset to synchronize updates to per-page queue state by introducing vm_page_pqstate_cmpset() and using it in the page daemon. In the longer term the plan is to stop using the page lock to protect page identity and rely only on the object and page busy locks. However, since the page daemon avoids acquiring the object lock except when necessary, some synchronization with a concurrent free of the page is required. vm_page_pqstate_cmpset() can be used to ensure that queue state updates are successful only if the page is not scheduled for a dequeue, which is sufficient for the page daemon. Add vm_page_swapqueue(), which moves a page from one queue to another using vm_page_pqstate_cmpset(). Use it in the active queue scan, which does not use the object lock. Modify vm_page_dequeue_deferred() to use vm_page_pqstate_cmpset() as well. Reviewed by: kib Discussed with: jeff Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21257	2019-09-03 14:29:58 +00:00
markj	e4ce801724	Map the vm_page array into KVA on amd64. r351198 allows the kernel to use domain-local memory to back the vm_page array (up to 2MB boundaries) and reserves a separate PML4 entry for that purpose. One consequence of that change is that the vm_page array is no longer present in minidumps, which only adds pages mapped above VM_MIN_KERNEL_ADDRESS. To avoid the friction caused by having kernel data structures mapped below VM_MIN_KERNEL_ADDRESS, map the vm_page array starting at VM_MIN_KERNEL_ADDRESS instead of using a dedicated PML4 entry. Reviewed by: kib Discussed with: jeff Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21491	2019-09-03 13:18:51 +00:00
mjg	3782852571	pseudofs: fix a LOR pfs_node vs pidhash (sleepable after non-sleepable) Sponsored by: The FreeBSD Foundation	2019-09-03 12:54:51 +00:00
avg	d352d0911c	superio: fix the copyright block and update the year MFC after: 2 weeks	2019-09-03 12:40:58 +00:00
mjg	286ae5bd6b	Add sysctlbyname system call Previously userspace would issue one syscall to resolve the sysctl and then another one to actually use it. Do it all in one trip. Fallback is provided in case newer libc happens to be running on an older kernel. Submitted by: Pawel Biernacki Reported by: kib, brooks Differential Revision: https://reviews.freebsd.org/D17282	2019-09-03 04:16:30 +00:00
markj	116c38c27d	Add a sysctl to dump kernel mappings and their properties on amd64. The sysctl is called vm.pmap.kernel_maps. It dumps address ranges and their corresponding protection and mapping mode, as well as counts of 2MB and 1GB pages in the range. Reviewed by: kib MFC after: 2 weeks Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21380	2019-09-02 21:57:57 +00:00
markj	628e9ea4a8	Replace PMAP_LARGEMAP_MAX_ADDRESS() with a more general predicate. No functional change intended. Reviewed by: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation	2019-09-02 21:54:08 +00:00
tuexen	040eca9a1d	This patch improves the DSACK handling to conform with RFC 2883. The lowest SACK block is used when multiple blocks would be eligible as DSACK blocks. ACK blocks get reordered, while the ordering of SACK blocks not relevant in the DSACK context is maintained. Reviewed by: rrs@, tuexen@ Obtained from: Richard Scheffenegger MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D21038	2019-09-02 19:04:02 +00:00
trasz	bce0301312	Bump Linux version to 3.2.0. Without it, binaries linked against glibc 2.24 and up (eg Ubuntu 19.04) fail with "FATAL: kernel too old". This alone is not enough to make newer binaries actually work; fix/hack/workaround is pending review at https://reviews.freebsd.org/D20687. Reviewed by: emaste MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20757	2019-09-02 18:10:35 +00:00
imp	2d33613528	In nvme_completion_poll, add a sanity check to make sure that we complete the polling within a second. Panic if we don't. All the commands that use this interface should typically complete within a few tens to hundreds of microseconds. Panic rather than return ETIMEDOUT because if the command somehow does later complete, it will randomly corrupt memory. Also, it helps to get a traceback from where the unexpected failure happens, rather than an infinite loop.	2019-09-02 17:11:32 +00:00
imp	2672b80261	In all the places that we use the polled for completion interface, except crash dump support code, move the while loop into an inline function. These aren't done in the fast path, so if the compiler chooses not to inline, any performance hit is tiny.	2019-09-02 17:11:27 +00:00
imp	f720d421de	Add a brief comment explaining why we can return ETIMEDOUT from the call to the polled interface. Normally this would have the potential to corrupt stack memory because the completion routines would run after we return. In this case, however, we're doing a dump so it's safe for reasons explained in the comment.	2019-09-02 17:10:46 +00:00
trasz	efac9cd1ec	Relax compat.linux.osrelease checks. This way one can do eg 'compat.linux.osrelease=3.10.0-957.12.1.el7.x86_64', which corresponds to CentOS 7. MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20685	2019-09-02 16:57:42 +00:00
mjg	3613b12638	vfs: restore mp null check in vop_stdgetwritemount The initially read mount point can already be NULL. Reported by: markj Fixes: r351656 ("vfs: stop refing freed mount points in vop_stdgetwritemount") Sponsored by: The FreeBSD Foundation	2019-09-02 15:24:25 +00:00
johalun	dc4227bd60	LinuxKPI: Add sysfs create/remove functions that handle multiple files in one call. Reviewed by: hps Approved by: imp (mentor), hps MFC after: 1 week Differential Revision: D21475	2019-09-02 14:51:59 +00:00
emaste	ab82639f44	Belatedly bump __FreeBSD_version for r351659, gets(3) removal Reported by: linimon	2019-09-02 12:48:18 +00:00
mjg	ad39028102	proc: clear pid bitmap entry after dropping proctree lock There is no correctness change here, but the procid lock is contended in the fork path and taking it while holding proctree avoidably extends its hold time. Note that there are other ids which can end up getting cleared with the lock. Sponsored by: The FreeBSD Foundation	2019-09-02 12:46:43 +00:00
hselasky	f2d1759c50	Use DEVICE memory instead of UNCACHEABLE on aarch64 in ioremap() in the LinuxKPI. This fixes system hangs on reading device registers on aarch64. Tested with: Marvell MACCHIATObin (Armada8k) + mlx4en, amdgpu Submitted by: Greg V <greg@unrelenting.technology> Differential Revision: https://reviews.freebsd.org/D20789 MFC after: 1 week Sponsored by: Mellanox Technologies	2019-09-02 08:34:45 +00:00
hselasky	9e15cb9d49	Fix regression issue after r351616. Make sure the mbuf queue gets initialized. Found by: gonzo@ MFC after: 1 week Sponsored by: Mellanox Technologies	2019-09-02 08:31:18 +00:00
kevans	6bddb149b5	mips: fix some mcount nits The symbol version for _mcount was removed 12 years ago in r169525 from gmon/Symbol.map, to be added to the per-arch Symbol.map. mips was overlooked in this, so _mcount has no symver. Add it back to where it should have been, rather than where it would go if it were added today, since we're correcting a historical mistake. Additionally, _mcount is getting thrown into .mdebug.abi32 in the llvm80/90 world as it's not getting explicitly thrown into .text, so do this now. This fixes the libc build that was previously failing due to relocations in .mdebug.abi32. This is specifically due to the way clang's integrated AS works and that they emit the .mdebug.abiNN section early in the process. An LLVM bug has been submitted[0] and an agreement has been made that the mips backend should switch to .text following .mdebug.abiNN for compatibility. [0] https://bugs.llvm.org/show_bug.cgi?id=43119 Reviewed by: imp, arichardson MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D21435	2019-09-02 01:55:55 +00:00
markj	5451b35f06	Extend uma_reclaim() to permit different reclamation targets. The page daemon periodically invokes uma_reclaim() to reclaim cached items from each zone when the system is under memory pressure. This is important since the size of these caches is unbounded by default. However it also results in bursts of high latency when allocating from heavily used zones as threads miss in the per-CPU caches and must access the keg in order to allocate new items. With r340405 we maintain an estimate of each zone's usage of its (per-NUMA domain) cache of full buckets. Start making use of this estimate to avoid reclaiming the entire cache when under memory pressure. In particular, introduce TRIM, DRAIN and DRAIN_CPU verbs for uma_reclaim() and uma_zone_reclaim(). When trimming, only items in excess of the estimate are reclaimed. Draining a zone reclaims all of the cached full buckets (the previous behaviour of uma_reclaim()), and may further drain the per-CPU caches in extreme cases. Now, when under memory pressure, the page daemon will trim zones rather than draining them. As a result, heavily used zones do not incur bursts of bucket cache misses following reclamation, but large, unused caches will be reclaimed as before. Reviewed by: jeff Tested by: pho (an earlier version) MFC after: 2 months Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D16667	2019-09-01 22:22:43 +00:00
markj	b1573a081f	Restrict the input domain set in cpuset_setdomain(2) to all_domains. To permit larger values of MAXMEMDOM, which is currently 8 on amd64, cpuset_setdomain(2) accepts a mask of size 256. In the kernel, domain set masks are 64 bits wide, but can only represent a set of MAXMEMDOM domains due to the use of the ds_order table. Domain sets passed to cpuset_setdomain(2) are restricted to a subset of their parent set, which is typically the root set, but before this happens we modify the input set to exclude empty domains. domainset_empty_vm() and other code which manipulates domain sets expect the mask to be a subset of all_domains, so enforce that when performing validation of cpuset_setdomain(2) parameters. Reported and tested by: pho Reviewed by: kib MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21477	2019-09-01 21:38:08 +00:00
emaste	f7d0c0bfca	makefs: share msdosfsmount.h between kernel msdosfs and makefs Sponsored by: The FreeBSD Foundation	2019-09-01 16:55:33 +00:00
emaste	9d119c57d1	vnic: correct and simplify SIOCSIFFLAGS PR: 223573, 223575 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D13028	2019-09-01 16:53:17 +00:00
emaste	1b8613f62d	Remove CLANG_NO_IAS definition CLANG_NO_IAS is not used anywhere in the tree. Sponsored by: The FreeBSD Foundation	2019-09-01 16:47:48 +00:00
vmaffione	b67d437ec8	netmap: import changes from upstream (SHA 137f537eae513) - Rework option processing. - Use larger integers for memory size values in the memory management code. MFC after: 2 weeks	2019-09-01 14:47:41 +00:00
mjg	11bc2f123b	vfs: stop refing freed mount points in vop_stdgetwritemount The code used to blindly ref based on an unsafely read address and then would backpedal if necessary. This was safe in terms of memory access since mounts are type-stable, but made for a potential bug where the mount was reused and had the count reset to 0 before this code decreased it. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21411	2019-09-01 14:01:09 +00:00
tuexen	a1d2b51873	Fix initialization of top_fsn. MFC after: 3 days	2019-09-01 10:39:16 +00:00
tuexen	209e505321	Improve the handling of state cookie parameters in INIT-ACK chunks. This fixes problems with parameters indicating a zero length or partial parameters after an unknown parameter indicating to stop processing. It also fixes a problem with state cookie parameters after unknown parameters indicating to stop processing. Thanks to Mark Wodrich from Google for finding two of these issues by fuzz testing the userland stack and reporting them in https://github.com/sctplab/usrsctp/issues/355 and https://github.com/sctplab/usrsctp/issues/352 MFC after: 3 days	2019-09-01 10:09:53 +00:00
jkim	6c62a51480	Add support for TP-Link Archer T2U Nano. MFC after: 2 weeks	2019-09-01 06:40:58 +00:00
mjg	5d15f9ad75	nullfs: reduce areas protected by vnode interlock in null_lock Similarly to the other routine stop taking the interlock for the lower vnode. The interlock for nullfs vnode is still taken to ensure stability of ->v_data. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21480	2019-09-01 02:52:00 +00:00
kevans	c72c92d361	posixshm: switch to OBJT_SWAP in advance of other changes Future changes to posixshm will start tracking writeable mappings in order to support file sealing. Tracking writeable mappings for an OBJT_DEFAULT object is complicated as it may be swapped out and converted to an OBJT_SWAP. One may generically add this tracking for vm_object, but this is difficult to do without increasing memory footprint of vm_object and blowing up memory usage by a significant amount. On the other hand, the swap pager can be expanded to track writeable mappings without increasing vm_object size. This change is currently in D21456. Switch over to OBJT_SWAP in advance of the other changes to the swap pager and posixshm.	2019-09-01 00:33:16 +00:00
ray	8a4d833254	ARM kernel can get RAM regions in three ways: o from FDT; o from EFI; o from Linux Boot API (ATAG). U-Boot may pass RAM info in all three ways simultaneously. We do select between FDT and EFI, but not for ATAG. So this is not a problem fix, but a correctness check. MFC after: 2 weeks	2019-08-31 21:28:06 +00:00
mjg	2a0e793620	zfs: fix snapshot dir destruction after introduction of VOP_NEED_INACTIVE Reported by: lwhsu PR: 240221 Sponsored by: The FreeBSD Foundation	2019-08-31 13:24:22 +00:00
tuexen	f17d8ccdcd	Improve function definition. MFC after: 3 days	2019-08-31 13:13:40 +00:00
tuexen	d0f0e21769	Improve the handling of illegal sequence number combinations in received data chunks. Abort the association if there are data chunks with larger fragment sequence numbers than the fragment sequence number of the last fragment. Thanks to Mark Wodrich from Google who found this issue by fuzz testing the userland stack and reporting this issue in https://github.com/sctplab/usrsctp/issues/355 MFC after: 3 days	2019-08-31 08:18:49 +00:00
mjg	eb4d1499a2	vfs: add a missing VNODE_REFCOUNT_FENCE_REL to v_incr_usecount_locked Sponsored by: The FreeBSD Foundation	2019-08-30 21:54:45 +00:00
mjoras	c6f2373c08	Wrap a vlan's parent's if_output in a separate function. When a vlan interface is created, its if_output is set directly to the parent interface's if_output. This is fine in the normal case but has an unfortunate consequence if you end up with a certain combination of vlan and lagg interfaces. Consider you have a lagg interface with a single laggport member. When an interface is added to a lagg its if_output is set to lagg_port_output, which blackholes traffic from the normal networking stack but not certain frames from BPF (pseudo_AF_HDRCMPLT). If you now create a vlan with the laggport member (not the lagg interface) as its parent, its if_output is set to lagg_port_output as well. While this is confusing conceptually and likely represents a misconfigured system, it is not itself a problem. The problem arises when you then remove the lagg interface. Doing this resets the if_output of the laggport member back to its original state, but the vlan's if_output is left pointing to lagg_port_output. This gives rise to the possibility that the system will panic when e.g. bpf is used to send any frames on the vlan interface. Fix this by creating a new function, vlan_output, which simply wraps the parent's current if_output. That way when the parent's if_output is restored there is no stale usage of lagg_port_output. Reviewed by: rstone Differential Revision: D21209	2019-08-30 20:19:43 +00:00
emax	3fc0b380c6	avoid holding PCB mutex during copyin/copyout() Reported by: imp, mms dot vanbreukelingen at gmail dot com Reviewed by: imp	2019-08-30 16:35:31 +00:00
markj	1dafc6f4d5	Properly check for an interrupted cv_wait_sig(). The returned error number may be EINTR or ERESTART depending on whether or not the signal is supposed to interrupt the system call. Reported and tested by: pho MFC after: 1 week Sponsored by: The FreeBSD Foundation	2019-08-30 15:40:31 +00:00
mjg	4f45cc9c1d	vfs: tidy up assertions in vfs_subr - assert unlocked vnode interlock in vref - assert right counts in vputx - print debug info for panic in vdrop Sponsored by: The FreeBSD Foundation	2019-08-30 00:45:53 +00:00
emaste	8c95558f9f	xdma: avoid NULL deref in error case Reported by: Dr Silvio Cesare of InfoSect MFC after: 3 days Sponsored by: The FreeBSD Foundation	2019-08-30 00:40:08 +00:00
emaste	7b7bac4249	qlxgbe: avoid NULL deref in error case Reported by: Dr Silvio Cesare of InfoSect MFC after: 3 days Sponsored by: The FreeBSD Foundation	2019-08-30 00:38:16 +00:00
emaste	77353c9cf8	exynos5: avoid NULL deref in error case Reported by: Dr Silvio Cesare of InfoSect MFC after: 3 days MFC with: r351618 Sponsored by: The FreeBSD Foundation	2019-08-30 00:36:17 +00:00
emaste	98a4ec13e0	exynos5: avoid NULL deref in error case Reported by: Dr Silvio Cesare of InfoSect MFC after: 3 days Sponsored by: The FreeBSD Foundation	2019-08-30 00:34:27 +00:00
mjg	ee71762742	nullfs: use VOP_NEED_INACTIVE Reviewed by: kib Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation	2019-08-30 00:30:03 +00:00
glebius	423f98ea90	Use mbuf queue instead of ifqueue in USB network drivers. Reviewed by: stevek	2019-08-30 00:05:04 +00:00
glebius	1c4a16a282	Allow mbuf queues to be unlimited. There is a lot of legacy code that uses ifqueue without setting a limit on it first. It is easier to allow for that than to improve the legacy drivers.	2019-08-30 00:03:41 +00:00

137889 Commits