membar_producer is supposed to be a store-store barrier.
Also, in the code that FreeBSD has ported from illumos membar_producer
is used only with regular stores to regular memory (with respect to
caching).
We do not have an MI primitive for the store-store barrier, so
atomic_thread_fence_rel is the closest we have as it provides
(load | store) -> store barrier.
Previously, membar_producer was an empty function call on all 32-bit
ARM variants, 32-bit powerpc, riscv and all mips variants. I think
that was inadequate.
On other platforms, such as amd64, arm64, i386, powerpc64, sparc64,
membar_producer was implemented using stronger primitives than required
for a store-store barrier with respect to regular memory access.
For example, it used sfence on amd64 and a locked nop on i386 (despite TSO).
On powerpc64 we now use recommended lwsync instead of eieio.
On sparc64 FreeBSD uses TSO mode.
On arm64/aarch64 we now use dmb sy instead of dmb ish. I am not sure
whether this is actually an improvement.
After this change we can drop opensolaris_atomic.S for aarch64, amd64,
powerpc64 and sparc64 as all required atomic operations have either
direct or light-weight mapping to FreeBSD native atomic operations.
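As a hedged illustration, the resulting MI mapping amounts to roughly
the following (the exact form in the compat headers may differ):

static __inline void
membar_producer(void)
{
	/*
	 * (load | store) -> store fence; sufficient, if slightly
	 * stronger than required, for a store-store barrier.
	 */
	atomic_thread_fence_rel();
}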
Discussed with: kib
MFC after: 4 weeks
starting at the max. domain, and then work down. Then existing FreeBSD
drivers will attach. Interrupt routing from the VMD MSI-X to the NVME
drive is not well known, so any interrupt is sent to all children that
register.
VROC uses Intel metadata, so graid(8) works with it. However, graid(8)
supports RAID 0, 1 and 10 for read and write. I have some early code
to support writes with RAID 5. Note that RAID 5 can have lifetime
issues with SSDs since it can cause write amplification from updating
the parity data.
For hot plug support to work, a change is needed to skip the following
check:
if (pcib_request_feature(dev, PCI_FEATURE_HP) != 0) {
in sys/dev/pci/pci_pci.c.
Looked at by: imp, rpokala, bcr
Differential Revision: https://reviews.freebsd.org/D21383
The cursor boundary tag is statically allocated in the vmem instead of
from the vmem_bt_zone. Explicitly remove it from the vmem's segment
list in vmem_destroy before freeing all the segments from the vmem.
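A hedged sketch of the vmem_destroy() change; the field and link names
follow sys/kern/subr_vmem.c as I recall them and may differ slightly:

	/*
	 * Unlink the statically allocated cursor tag before freeing
	 * the dynamically allocated boundary tags.
	 */
	TAILQ_REMOVE(&vm->vm_seglist, &vm->vm_cursor, bt_seglist);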
Reviewed by: markj
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D21953
This ensures the clip task won't race with t4_destroy_clip_table.
While here, make some mutex destroys unconditional since attach always
initializes them.
Reviewed by: np
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D21952
Cast the pointers to (uintptr_t) before assigning to type
uint64_t. This eliminates an error from gcc when we cast the pointer
to a larger integer type.
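For illustration, the pattern is the usual two-step cast (the function
and variable names here are hypothetical):

static uint64_t
ptr_to_u64(void *ptr)
{
	/*
	 * Going through uintptr_t avoids gcc's error about casting a
	 * pointer to a larger integer type.
	 */
	return ((uint64_t)(uintptr_t)ptr);
}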
Ensure the epoch_call() function is not called more than once before
the callback has been executed, by always checking the RS_FUNERAL_SCHD
flag before invoking epoch_call().
The "rs_number_dead" is balanced again after r353353.
Discussed with: rrs@
Sponsored by: Mellanox Technologies
the epoch section towards the return statement. Since entering an
epoch is cheap, it is easier to cover the whole function with the
epoch rather than try to properly maintain its state.
If the bootloader enabled DMA, we need to fully reset the DMA
controller; otherwise we might have some stale data in it that
provokes weird behavior.
MFC after: 1 week
atomic_cas_32 is implemented using atomic_fcmpset_32 on all platforms.
Ditto for atomic_cas_64 and atomic_fcmpset_64 on platforms that have it.
The only exception is sparc64 that provides MD atomic_cas_32 and
atomic_cas_64.
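A hedged sketch of the mapping (the actual compat header may differ in
detail):

static __inline uint32_t
atomic_cas_32(volatile uint32_t *target, uint32_t cmp, uint32_t newval)
{
	/*
	 * On success 'cmp' still holds the old value; on failure
	 * fcmpset stores the current value into 'cmp'.  Returning it
	 * therefore gives cas semantics either way, and the
	 * success/failure result can be discarded.
	 */
	(void)atomic_fcmpset_32(target, &cmp, newval);
	return (cmp);
}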
This is slightly inefficient as fcmpset reports whether the operation
updated the target and that information is not needed for cas.
Nevertheless, there is less code to maintain and to add for new platforms.
Also, the operations are now done inline, as opposed to the function
calls used before.
atomic_add_64_nv is implemented using atomic_fetchadd_64 on platforms
that provide it.
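Similarly, a hedged sketch of that mapping:

static __inline uint64_t
atomic_add_64_nv(volatile uint64_t *target, int64_t delta)
{
	/* fetchadd returns the previous value; add delta for the new one. */
	return (atomic_fetchadd_64(target, delta) + delta);
}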
casptr, cas32, atomic_or_8, atomic_or_8_nv are completely removed as they
have no users.
atomic_mtx that is used to emulate 64-bit atomics on platforms that lack
them is defined only on those platforms.
As a result, platform specific opensolaris_atomic.S files have lost most of
their code. The only exception is i386 where the compat+contrib code
provides 64-bit atomics for userland use. That code assumes
availability of the cmpxchg8b instruction. FreeBSD does not make that
assumption for i386 userland and does not provide 64-bit atomics.
Hopefully, this can and will
be fixed.
MFC after: 3 weeks
called in syscall context, so it must enter epoch itself. This
changeset originates from an early version of the patch and somehow
slipped into the final version.
Reported by: pho
As unp_internalize() processes the input control messages, it builds
an output mbuf chain containing the internalized representations of
those messages. In one special case, that of an empty SCM_RIGHTS
message, the message is simply discarded. However, the loop which
appends mbufs to the output chain assumed that each iteration would
produce an mbuf, resulting in a null pointer dereference if an empty
SCM_RIGHTS message was followed by a non-empty message.
Fix this by advancing the output mbuf chain tail pointer only if an
internalized control message was produced.
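A hedged sketch of the change; the real loop in sys/kern/uipc_usrreq.c
differs in detail, with 'controlp' being the tail pointer of the
output chain:

	/*
	 * Advance the tail pointer only when this iteration actually
	 * produced an internalized control mbuf.  An empty SCM_RIGHTS
	 * message produces none, and blindly advancing would leave a
	 * NULL pointer to be dereferenced by the next iteration.
	 */
	if (*controlp != NULL)
		controlp = &(*controlp)->m_next;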
Reported by: syzbot+1b5cced0f7fad26ae382@syzkaller.appspotmail.com
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
This adds a TOE hook to allocate a KTLS session. It also recognizes
TLS mbufs in the socket buffer and sends those to the NIC using a TLS
work request to encrypt the record before segmenting it.
TOE TLS support must be enabled via the dev.t6nex.<N>.tls sysctl in
addition to enabling KTLS.
Reviewed by: np, gallatin
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D21891
This adds the glue to allocate TLS sessions and invokes it from
the TLS enable socket option handler. This also adds some counters
for active TOE sessions.
The TOE KTLS mode is returned by getsockopt(TLSTX_TLS_MODE) when
TOE KTLS is in use on a socket, but cannot be set via setsockopt().
To simplify various checks, a TLS session now includes an explicit
'mode' member set to the value returned by TLSTX_TLS_MODE. Various
places that used to check 'sw_encrypt' against NULL to determine
software vs ifnet (NIC) TLS now check 'mode' instead.
Reviewed by: np, gallatin
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D21891
The ABI already guarantees the direction is forward. Note this does
not take care of i386-specific cld instructions.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21906
The PCI block in the adapter requires this field to be set to a valid
queue ID. It is not clear why it did not fail on all machines, but
the effect was that crypto operations reading input data via DMA
failed with an internal PCI read error on machines with 128G or more
of RAM.
Reported by: gallatin
Reviewed by: np
MFC after: 3 days
Sponsored by: Chelsio Communications
As noted by the commit message, callouts are now persistent
and should not be in the auto-zero section of the RQs and SQs.
This fixes an assert when using the TX completion event
factor feature with mlx5en(4).
Found by: gallatin@
MFC after: 3 days
Sponsored by: Mellanox Technologies
protected by IF_ADDR_LOCK(), which was a mutex, so that two
simultaneous if_setlladdr() calls couldn't execute. Later it was
switched to IF_ADDR_RLOCK(), likely by mistake. Later it was switched
to NET_EPOCH_ENTER(). Then I incorrectly added NET_EPOCH_ASSERT()
here.
In reality ifp->if_addr never goes away and never changes its length.
So, doing bcopy() on it is always "safe", meaning it won't dereference
a wrong pointer or write into someone else's memory. Of course, doing
two bcopy() calls in parallel would result in a mess of two addresses,
but the net epoch doesn't protect against that, and neither did
IF_ADDR_RLOCK().
So for now, just remove the assertion and leave for later a proper fix.
Reported by: markj
During a CoW fault, we must check for both 4KB and 2MB mappings before
clearing PGA_WRITEABLE on the old mapping's page. Previously we were
only checking for 4KB mappings. This was missed in r344106.
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
This matches the state prior to r353149 and fixes crashes with DRM
modules.
Reported and tested by: cy, garga, Krasznai Andras
Fixes: r353149 ("amd64 pmap: implement per-superpage locks")
Sponsored by: The FreeBSD Foundation
pmap_remove_l3() may remove the last mapping of a page, in which case
it must clear PGA_WRITEABLE.
Reported by: Jenkins, via lwhsu
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
As long as we support ZFS on 32-bit platforms we should do this for
all 64-bit variables that are modified in a lockless fashion using
atomic operations. Otherwise, there is a risk of reading a torn value.
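For illustration, one common idiom (not necessarily the one used in
dmu_object_alloc_impl, and the function name below is hypothetical)
reads such a variable atomically with a no-op compare-and-swap:

static uint64_t
read_64bit_atomically(volatile uint64_t *p)
{
	/*
	 * A plain load of a 64-bit variable may be split into two
	 * 32-bit loads on a 32-bit platform, so a concurrent writer
	 * can expose a torn value.  atomic_cas_64(p, 0, 0) returns the
	 * current value in a single atomic operation and stores 0 only
	 * if the value already was 0.
	 */
	return (atomic_cas_64(p, 0, 0));
}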
Here is a rationale for why I am doing this in dmu_object_alloc_impl:
- it's very recent code
- the code deals with object IDs, and the number of objects in a file
system can overflow 32 bits
- incorrect allocation of an object ID may result in hard to debug
problems
- fixing all plain reads of 64-bit atomic variables is not a trivial
undertaking to do in one shot, so I chose to do it incrementally
MFC after: 3 weeks
X-MFC after: r353301, r353176
Make sure the vnet_shutdown field is not set until after all
VNET_SYSUNINIT()'s in the SI_SUB_VNET_DONE subsystem have been
executed. In particular, the vnet_if_return() function requires that
if_move() is still operational.
Reported by: lwhsu@
MFC after: 1 week
Sponsored by: Mellanox Technologies
At the moment i386 does not provide 64-bit atomic operations in
userland. Exposing some atomic_*_64 defines can cause unnecessary
confusion.
Discussed with: kib
MFC after: 2 weeks
If the implementation ever changes from using a chain of next
pointers, changing the macro definition will be necessary, but
changing all the files that iterate over vm_map entries will not be.
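For reference, a hedged sketch of what such a macro looks like with
the current linked-list implementation (the real definition lives in
sys/vm/vm_map.h and may differ):

#define	VM_MAP_ENTRY_FOREACH(it, map)		\
	for ((it) = (map)->header.next;		\
	    (it) != &(map)->header;		\
	    (it) = (it)->next)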
Drop a counter in vm_object.c that would have an effect only if the
vm_map entry count was wrong.
Discussed with: alc
Reviewed by: markj
Tested by: pho (earlier version)
Differential Revision: https://reviews.freebsd.org/D21882
This adds two implementations each of the atomic_fcmpset_ and
atomic_cmpset_ short and char functions, selectable at compile time
for the target architecture. By default, a generic shift-and-mask
approach from <sys/_atomic_subword.h> is used to perform atomic
updates to sub-components of 32-bit words. However, if ISA_206_ATOMICS
is defined, the ll/sc instructions for halfwords and bytes introduced
in PowerISA 2.06 are used instead. These instructions are supported by
all IBM processors from POWER7 on, as well as the Freescale/NXP e6500
core. Although the e5500 and e500mc both implement PowerISA 2.06, they
do not implement these instructions.
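A hedged sketch of the generic fallback for the 8-bit case; the real
code in <sys/_atomic_subword.h> differs in detail, and the shift shown
here assumes little-endian byte order, while the actual header handles
either endianness:

static __inline int
subword_cmpset_8(volatile uint8_t *p, uint8_t expect, uint8_t newval)
{
	volatile uint32_t *wp;
	uint32_t old, shift, mask;

	/*
	 * The aligned 32-bit word containing the byte, and the byte's
	 * position within it.
	 */
	wp = (volatile uint32_t *)((uintptr_t)p & ~(uintptr_t)3);
	shift = ((uintptr_t)p & 3) * 8;
	mask = 0xffu << shift;

	old = *wp;
	do {
		/* Fail if the byte no longer holds the expected value. */
		if (((old & mask) >> shift) != expect)
			return (0);
	} while (!atomic_fcmpset_32(wp, &old,
	    (old & ~mask) | ((uint32_t)newval << shift)));
	return (1);
}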
As part of this, clean up the atomic_(f)cmpset_acq and _rel wrappers, by
using macros to reduce code duplication.
ISA_206_ATOMICS requires clang or newer binutils (2.20 or later).
Differential Revision: https://reviews.freebsd.org/D21682
Acquire the inp lock before checking whether the socket is already bound,
and around updates to the inp_vflag field.
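A hedged sketch of the pattern; the INP_* macros and inp fields are
from the inpcb API, but the surrounding function and the exact check
are illustrative:

static int
example_bind_check(struct inpcb *inp)
{
	INP_WLOCK(inp);
	if (inp->inp_lport != 0) {
		/* Socket is already bound. */
		INP_WUNLOCK(inp);
		return (EINVAL);
	}
	/* Example inp_vflag update performed under the same lock. */
	inp->inp_vflag |= INP_IPV4;
	INP_WUNLOCK(inp);
	return (0);
}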
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21867
About 11 minutes of poudriere -s -j 104 and probing the return value
of trylocks reveals that over 10% of attempts fail, which in turn
means more atomics are performed than necessary.
Trylocking was there to try to prevent migration, but that's not very
likely to happen if the lock is uncontested.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21925
When epoch(9) was introduced to the network stack, it was basically
dropped in place of existing locking, which was mutexes and rwlocks.
For the sake of performance, mutex-covered areas were kept as small as
possible, and so the epoch-covered areas became small as well.
However, epoch doesn't introduce any contention; it just delays memory
reclaim. So, there is no point in minimising epoch-covered areas for
the sake of performance. Meanwhile, entering/exiting an epoch also has
non-zero CPU cost, so doing it less often is a win.
Not least is code maintainability. In the new paradigm we can assume
that, at any stage of processing a packet, we are inside the network
epoch. This makes coding both the input and output paths much easier.
On the output path we already enter the epoch quite early: in
ip_output() and ip6_output().
This patch does the same for the input path. All ISR processing,
network-related callouts, and other ways of injecting packets into the
network stack shall be performed in the net epoch. Any leaf function
that walks the network configuration now asserts the epoch, as
sketched below.
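A hedged sketch of the convention; the function names other than the
epoch macros are hypothetical:

static void
example_leaf(void)
{
	/* Leaf functions only assert; the caller entered the epoch. */
	NET_EPOCH_ASSERT();
	/* ... walk interface/address lists here ... */
}

static void
example_input_path(void)
{
	struct epoch_tracker et;

	/* Packet-injection paths enter the network epoch once, up front. */
	NET_EPOCH_ENTER(et);
	example_leaf();
	NET_EPOCH_EXIT(et);
}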
The tricky part is the configuration code paths: ioctls and sysctls.
They also call into leaf functions, so some needed to be changed.
This patch introduces more epoch recursions (see EPOCH_TRACE) than we
had before. They will be cleaned up separately, as several of them
aren't trivial. Note that, unlike lock recursion, epoch recursion is
safe and just wastes a bit of resources.
Reviewed by: gallatin, hselasky, cy, adrian, kristof
Differential Revision: https://reviews.freebsd.org/D19111
The new sysctl was not added to the siftr.4 man page at the time.
This updates the man page and removes one leftover trailing whitespace.
Submitted by: Richard Scheffenegger
Reviewed by: bcr@
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D21619
This provides a framework to define a template describing
a set of "variables of interest" and the intended way for
the framework to maintain them (for example the maximum, sum,
t-digest, or a combination thereof). Afterwards the user
code feeds in the raw data, and the framework maintains
these variables inside user-provided, opaque stats blobs.
The framework also provides a way to selectively extract the
stats from the blobs. The stats(3) framework can be used in
both userspace and the kernel.
See the stats(3) manual page for details.
This will be used by the upcoming TCP statistics gathering code,
https://reviews.freebsd.org/D20655.
The stats(3) framework is disabled by default for now, except
in the NOTES kernel (for QA); it is expected to be enabled
in amd64 GENERIC after a cool down period.
Reviewed by: sef (earlier version)
Obtained from: Netflix
Relnotes: yes
Sponsored by: Klara Inc, Netflix
Differential Revision: https://reviews.freebsd.org/D20477
Remove the now obsolete vnet_state field. This greatly simplifies the
detection of VNET shutdown and avoids code duplication.
Discussed with: bz@
MFC after: 1 week
Sponsored by: Mellanox Technologies