freebsd-skq

Author	SHA1	Message	Date
John Baldwin	e42b096439	Document EINTEGRITY errors for many system calls. EINTEGRITY was previously documented as a UFS-specific error for mount(2). This documents EINTEGRITY as a filesystem-independent error that may be reported by the backing store of a filesystem. While here, document EIO as a filesystem-independent error for both mount(2) and posix_fadvise(2). EIO was previously only documented for UFS for mount(2). Reviewed by: mckusick Suggested by: mckusick MFC after: 2 weeks Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D24168	2020-03-30 21:44:00 +00:00
John Baldwin	c034143269	Refactor driver and consumer interfaces for OCF (in-kernel crypto). - The linked list of cryptoini structures used in session initialization is replaced with a new flat structure: struct crypto_session_params. This session includes a new mode to define how the other fields should be interpreted. Available modes include: - COMPRESS (for compression/decompression) - CIPHER (for simply encryption/decryption) - DIGEST (computing and verifying digests) - AEAD (combined auth and encryption such as AES-GCM and AES-CCM) - ETA (combined auth and encryption using encrypt-then-authenticate) Additional modes could be added in the future (e.g. if we wanted to support TLS MtE for AES-CBC in the kernel we could add a new mode for that. TLS modes might also affect how AAD is interpreted, etc.) The flat structure also includes the key lengths and algorithms as before. However, code doesn't have to walk the linked list and switch on the algorithm to determine which key is the auth key vs encryption key. The 'csp_auth_' fields are always used for auth keys and settings and 'csp_cipher_' for cipher. (Compression algorithms are stored in csp_cipher_alg.) - Drivers no longer register a list of supported algorithms. This doesn't quite work when you factor in modes (e.g. a driver might support both AES-CBC and SHA2-256-HMAC separately but not combined for ETA). Instead, a new 'crypto_probesession' method has been added to the kobj interface for symmteric crypto drivers. This method returns a negative value on success (similar to how device_probe works) and the crypto framework uses this value to pick the "best" driver. There are three constants for hardware (e.g. ccr), accelerated software (e.g. aesni), and plain software (cryptosoft) that give preference in that order. One effect of this is that if you request only hardware when creating a new session, you will no longer get a session using accelerated software. Another effect is that the default setting to disallow software crypto via /dev/crypto now disables accelerated software. Once a driver is chosen, 'crypto_newsession' is invoked as before. - Crypto operations are now solely described by the flat 'cryptop' structure. The linked list of descriptors has been removed. A separate enum has been added to describe the type of data buffer in use instead of using CRYPTO_F_* flags to make it easier to add more types in the future if needed (e.g. wired userspace buffers for zero-copy). It will also make it easier to re-introduce separate input and output buffers (in-kernel TLS would benefit from this). Try to make the flags related to IV handling less insane: - CRYPTO_F_IV_SEPARATE means that the IV is stored in the 'crp_iv' member of the operation structure. If this flag is not set, the IV is stored in the data buffer at the 'crp_iv_start' offset. - CRYPTO_F_IV_GENERATE means that a random IV should be generated and stored into the data buffer. This cannot be used with CRYPTO_F_IV_SEPARATE. If a consumer wants to deal with explicit vs implicit IVs, etc. it can always generate the IV however it needs and store partial IVs in the buffer and the full IV/nonce in crp_iv and set CRYPTO_F_IV_SEPARATE. The layout of the buffer is now described via fields in cryptop. crp_aad_start and crp_aad_length define the boundaries of any AAD. Previously with GCM and CCM you defined an auth crd with this range, but for ETA your auth crd had to span both the AAD and plaintext (and they had to be adjacent). crp_payload_start and crp_payload_length define the boundaries of the plaintext/ciphertext. Modes that only do a single operation (COMPRESS, CIPHER, DIGEST) should only use this region and leave the AAD region empty. If a digest is present (or should be generated), it's starting location is marked by crp_digest_start. Instead of using the CRD_F_ENCRYPT flag to determine the direction of the operation, cryptop now includes an 'op' field defining the operation to perform. For digests I've added a new VERIFY digest mode which assumes a digest is present in the input and fails the request with EBADMSG if it doesn't match the internally-computed digest. GCM and CCM already assumed this, and the new AEAD mode requires this for decryption. The new ETA mode now also requires this for decryption, so IPsec and GELI no longer do their own authentication verification. Simple DIGEST operations can also do this, though there are no in-tree consumers. To eventually support some refcounting to close races, the session cookie is now passed to crypto_getop() and clients should no longer set crp_sesssion directly. - Assymteric crypto operation structures should be allocated via crypto_getkreq() and freed via crypto_freekreq(). This permits the crypto layer to track open asym requests and close races with a driver trying to unregister while asym requests are in flight. - crypto_copyback, crypto_copydata, crypto_apply, and crypto_contiguous_subsegment now accept the 'crp' object as the first parameter instead of individual members. This makes it easier to deal with different buffer types in the future as well as separate input and output buffers. It's also simpler for driver writers to use. - bus_dmamap_load_crp() loads a DMA mapping for a crypto buffer. This understands the various types of buffers so that drivers that use DMA do not have to be aware of different buffer types. - Helper routines now exist to build an auth context for HMAC IPAD and OPAD. This reduces some duplicated work among drivers. - Key buffers are now treated as const throughout the framework and in device drivers. However, session key buffers provided when a session is created are expected to remain alive for the duration of the session. - GCM and CCM sessions now only specify a cipher algorithm and a cipher key. The redundant auth information is not needed or used. - For cryptosoft, split up the code a bit such that the 'process' callback now invokes a function pointer in the session. This function pointer is set based on the mode (in effect) though it simplifies a few edge cases that would otherwise be in the switch in 'process'. It does split up GCM vs CCM which I think is more readable even if there is some duplication. - I changed /dev/crypto to support GMAC requests using CRYPTO_AES_NIST_GMAC as an auth algorithm and updated cryptocheck to work with it. - Combined cipher and auth sessions via /dev/crypto now always use ETA mode. The COP_F_CIPHER_FIRST flag is now a no-op that is ignored. This was actually documented as being true in crypto(4) before, but the code had not implemented this before I added the CIPHER_FIRST flag. - I have not yet updated /dev/crypto to be aware of explicit modes for sessions. I will probably do that at some point in the future as well as teach it about IV/nonce and tag lengths for AEAD so we can support all of the NIST KAT tests for GCM and CCM. - I've split up the exising crypto.9 manpage into several pages of which many are written from scratch. - I have converted all drivers and consumers in the tree and verified that they compile, but I have not tested all of them. I have tested the following drivers: - cryptosoft - aesni (AES only) - blake2 - ccr and the following consumers: - cryptodev - IPsec - ktls_ocf - GELI (lightly) I have not tested the following: - ccp - aesni with sha - hifn - kgssapi_krb5 - ubsec - padlock - safe - armv8_crypto (aarch64) - glxsb (i386) - sec (ppc) - cesa (armv7) - cryptocteon (mips64) - nlmsec (mips64) Discussed with: cem Relnotes: yes Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D23677	2020-03-27 18:25:23 +00:00
Mateusz Piotrowski	f04020edc5	Sort UMA macros and create MLINKS for them This patch is a follow-up to r344518. Reported by: ngie Reviewed by: hselasky Differential Revision: https://reviews.freebsd.org/D24165	2020-03-23 14:04:42 +00:00
Mark Johnston	fffcb56f7a	Add COUNTER_U64_SYSINIT() and COUNTER_U64_DEFINE_EARLY(). The aim is to reduce the boilerplate needed today to define and initialize global counters. Also add SI_SUB_COUNTER to the sysinit ordering. Reviewed by: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D23977	2020-03-06 19:09:01 +00:00
Warner Losh	68a229849a	_Static_assert is to be preferred to CTASSERT. Document the existing prefernce that _Static_assert be used in preference to the old CTASSERT we used to use for compile time assertions.	2020-02-27 15:30:13 +00:00
Ed Maste	59ffd5eb99	style.9: update C99 commentary Make style.9 read as a current statement of C99 preferences, rather than a description of ongoing changes to our preferred style. Alsu use the short form "ISO C99" on the 2nd and later instances rather than repeating the unwieldy `ISO/IEC 9899:1999 ("ISO C99")` each time. Reviewed by: cem, imp, jhb, kib MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D23648	2020-02-25 17:18:59 +00:00
Matt Macy	45035becfe	Add zfree to zero allocation before free Key and cookie management typically wants to avoid information leaks by explicitly zeroing before free. This routine simplifies that by permitting consumers to do so without carrying the size around. Reviewed by: jeff@, jhb@ MFC after: 1 week Sponsored by: Rubicon Communications, LLC (Netgate) Differential Revision: https://reviews.freebsd.org/D22790	2020-02-16 00:12:53 +00:00
Warner Losh	702547720c	Remove sparc64 specific bits of the man pages.	2020-02-12 06:52:22 +00:00
Ryan Libby	ec0d828071	uma: add UMA_ZONE_CONTIG, and a default contig_alloc For now, copy the mbuf allocator. Reviewed by: jeff, markj (previous version) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D23237	2020-02-04 22:40:11 +00:00
Warner Losh	51691e26d0	Remove vpo.4 The Parallel Port SCSI adapter was interesting for 100MB ZIP drives, but is no longer used or maintained. Remove it from the tree. The Parallel Port microsequencer (microseq.9) is now mostly unused in the tree, but remains. PPI still refrences it, but doesn't use its full functionality. Relnotes: Yes Reviewed by: rgrimes@, Ihor Antonov Discussed on: arch@ Differential Revision: https://reviews.freebsd.org/D23389	2020-02-02 04:53:27 +00:00
Mark Johnston	1c29da0279	Reimplement stack capture of running threads on i386 and amd64. After r355784 the td_oncpu field is no longer synchronized by the thread lock, so the stack capture interrupt cannot be delievered precisely. Fix this using a loop which drops the thread lock and restarts if the wrong thread was sampled from the stack capture interrupt handler. Change the implementation to use a regular interrupt instead of an NMI. Now that we drop the thread lock, there is no advantage to the latter. Simplify the KPIs. Remove stack_save_td_running() and add a return value to stack_save_td(). On platforms that do not support stack capture of running threads, stack_save_td() returns EOPNOTSUPP. If the target thread is running in user mode, stack_save_td() returns EBUSY. Reviewed by: kib Reported by: mjg, pho Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D23355	2020-01-31 15:43:33 +00:00
John Baldwin	03fd4409c7	Correct the return types of fueword*(). MFC after: 1 week Sponsored by: DARPA	2020-01-23 23:36:58 +00:00
Gleb Smirnoff	a7f12fce22	Remove struct callout_handle. Should have gone with r355732.	2020-01-22 05:47:59 +00:00
Gleb Smirnoff	1d11092814	Change argument order of epoch_call() to more natural, first function, then its argument. A miss from r356826.	2020-01-22 02:28:39 +00:00
Emmanuel Vadot	94292d7e95	fdt_pinctrl: Add new methods for gpios Most of the gpio controller cannot configure or get the configuration of the pin muxing as it's usually handled in the pinctrl driver. But they can know what is the pinmuxing driver either because they are child of it or via the gpio-range property. Add some new methods to fdt_pinctrl that a pin controller can implement. Some methods are : fdt_pinctrl_is_gpio: Use to know if the pin in the gpio mode fdt_pinctrl_set_flags: Set the flags of the pin (pullup/pulldown etc ...) fdt_pinctrl_get_flags: Get the flags of the pin (pullup/pulldown etc ...) The defaults method returns EOPNOTSUPP. Reviewed by: ian, bcr (manpages) MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D23093	2020-01-16 21:19:27 +00:00
Ryan Libby	54c5ae804f	uma: reorganize flags - Garbage collect UMA_ZONE_PAGEABLE & UMA_ZONE_STATIC. - Move flag VTOSLAB from public to private. - Introduce public NOTPAGE flag and make HASH private. - Introduce public NOTOUCH flag and make OFFPAGE private. - Update man page. The net effect of this should be to make the contract with clients more clear. Clients should choose constraints, UMA will figure out how to implement them. This also breaks the confusing double meaning of OFFPAGE. Reviewed by: jeff, markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D23016	2020-01-09 02:03:03 +00:00
Conrad Meyer	2c1962aba6	epoch.9: Add missing functions, clean up documentation Various rototilling.	2019-12-28 01:35:32 +00:00
Mateusz Guzik	1f162fef76	Add read-mostly sleepable locks To be used when like rmlocks, except when sleeping for readers needs to be allowed. See the manpage for more information. Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D22823	2019-12-27 11:19:57 +00:00
Conrad Meyer	f3bae413e9	random(9): Deprecate random(9), remove meaningless srandom(9) srandom(9) is meaningless on SMP systems or any system with, say, interrupts. One could never rely on random(9) to produce a reproducible sequence of outputs on the basis of a specific srandom() seed because the global state was shared by all kernel contexts. As such, removing it is literally indistinguishable to random(9) consumers (as compared with retaining it). Mark random(9) as deprecated and slated for quick removal. This is not to say we intend to remove all fast, non-cryptographic PRNG(s) in the kernel. It/they just won't be random(9), as it exists today, in either name or implementation. Before random(9) is removed, a replacement will be provided and in-tree consumers will be converted. Note that despite the name, the random(9) interface does not bear any resemblance to random(3). Instead, it is the same crummy 1988 Park-Miller LCG used in libc rand(3).	2019-12-26 19:41:09 +00:00
Conrad Meyer	fea73412a0	sleep(9), sleepqueue(9): const'ify wchan pointers _sleep(9), wakeup(9), sleepqueue(9), et al do not dereference or modify the channel pointers provided in any way; they are merely used as intptrs into a dictionary structure to match waiters with wakers. Correctly annotate this such that _sleep() and wakeup() may be used on const pointers without invoking ugly patterns like __DECONST(). Plumb const through all of the underlying sleepqueue bits. No functional change. Reviewed by: rlibby Discussed with: kib, markj Differential Revision: https://reviews.freebsd.org/D22914	2019-12-24 16:19:33 +00:00
Scott Long	757d4fbaa7	Introduce the concept of busdma tag templates. A template can be allocated off the stack, initialized to default values, and then filled in with driver-specific values, all without having to worry about the numerous other fields in the tag. The resulting template is then passed into busdma and the normal opaque tag object created. See the man page for details on how to initialize a template. Templates do not support tag filters. Filters have been broken for many years, and only existed for an ancient make/model of hardware that had a quirky DMA engine. Instead of breaking the ABI/API and changing the arugment signature of bus_dma_tag_create() to remove the filter arguments, templates allow us to ignore them, and also significantly reduce the complexity of creating and managing tags. Reviewed by: imp, kib Differential Revision: https://reviews.freebsd.org/D22906	2019-12-24 14:48:46 +00:00
John Baldwin	f236a86702	Bump Dd for changes in r355866. Pointy hat to: jhb MFC after: 2 weeks	2019-12-18 01:27:49 +00:00
John Baldwin	284789e871	Update the crypto(4) and crypto(9) manpages. There are probably bits that are still wrong, but this fixes some things at least: - Add named arguments to the functions in crypto(9). - Add missing algorithms. - Don't mention arguments that don't exist in crypto_register. - Add CIOGSESSION2. - Remove CIOCNFSESSION. - Clarify some stale language that assumed an fd had only one sesson. - Note that you have to use CRIOGET and add a note in BUGS lamenting that one has to use CRIOGET. - Various other cleanups. Reviewed by: cem (earlier version) MFC after: 2 weeks Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D22784	2019-12-17 22:58:07 +00:00
Warner Losh	c0834910ca	Better copyright advice Document the common practices around copyrights with "all rights reserved" in them as new copyright notices get added. It's an open question qhether to point people at the fact that since the Berne convention was ratified, All rights reserved is largely obsolete. https://en.wikipedia.org/wiki/All_rights_reserved#Obsolescence has the details. The committer's guide will be revised shortly, and it's likely that's a better place for this discussion. If not, I'll add a blurb here. Reviewed by: jhb@, brooks@ Differential Review: https://reviews.freebsd.org/D22800	2019-12-13 22:32:05 +00:00
Warner Losh	16db09d8c1	Don't use contractions. Fix the date. Contractions cause problems for translators, so s/aren't/are not/ in the one place this slipped through. While here, noticed I commited with the date I did the work, not today's date. Fix that too. Noticed by: bjk@	2019-12-13 21:39:10 +00:00
John Baldwin	4b28d96e5d	Remove the deprecated timeout(9) interface. All in-tree consumers have been converted to callout(9). Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D22602	2019-12-13 21:03:12 +00:00
Warner Losh	b832a7e505	Create new wrapper function: bus_delayed_attach_children() Delay the attachment of children, when requested, until after interrutps are running. This is often needed to allow children to run transactions on i2c or spi busses. It's a common enough idiom that it will be useful to have its own wrapper. Reviewed by: ian Differential Revision: https://reviews.freebsd.org/D21465	2019-12-13 19:39:33 +00:00
Ryan Libby	9825eadf2c	bitset: rename confusing macro NAND to ANDNOT s/BIT_NAND/BIT_ANDNOT/, and for CPU and DOMAINSET too. The actual implementation is "and not" (or "but not"), i.e. A but not B. Fortunately this does appear to be what all existing callers want. Don't supply a NAND (not (A and B)) operation at this time. Discussed with: jeff Reviewed by: cem Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D22791	2019-12-13 09:32:16 +00:00
John Baldwin	a8a03706fb	Add a callout_func_t typedef for functions used with callout_*(). This typedef is the same as timeout_t except that it is in the callout namespace and header. Use this typedef in various places of the callout implementation that were either using the raw type or timeout_t. While here, add <sys/callout.h> to the manpage. Reviewed by: kib, imp MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D22751	2019-12-10 21:58:30 +00:00
Mateusz Guzik	abd80ddb94	vfs: introduce v_irflag and make v_type smaller The current vnode layout is not smp-friendly by having frequently read data avoidably sharing cachelines with very frequently modified fields. In particular v_iflag inspected for VI_DOOMED can be found in the same line with v_usecount. Instead make it available in the same cacheline as the v_op, v_data and v_type which all get read all the time. v_type is avoidably 4 bytes while the necessary data will easily fit in 1. Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new flag field with a new value: VIRF_DOOMED. Reviewed by: kib, jeff Differential Revision: https://reviews.freebsd.org/D22715	2019-12-08 21:30:04 +00:00
Warner Losh	a7147da9d4	Fix accidentally changed copyright year. Noticed by: bapt@	2019-12-04 16:55:55 +00:00
Warner Losh	a339dcccb0	Regularize copyright notices for me. Remove stray All Rights Reserved and other non-license stuff. Make sure all copyrights have year.	2019-12-03 15:48:28 +00:00
Mark Johnston	003cf08ba9	Revise the page cache size policy. In r353734 the use of the page caches was limited to systems with a relatively large amount of RAM per CPU. This was to mitigate some issues reported with the system not able to keep up with memory pressure in cases where it had been able to do so prior to the addition of the direct free pool cache. This change re-enables those caches. The change modifies uma_zone_set_maxcache(), which was introduced specifically for the page cache zones. Rather than using it to limit only the full bucket cache, have it also set uz_count_max to provide an upper bound on the per-CPU cache size that is consistent with the number of items requested. Remove its return value since it has no use. Enable the page cache zones unconditionally, and limit them to 0.1% of the domain's pages. The limit can be overridden by the vm.pgcache_zone_max tunable as before. Change the item size parameter passed to uma_zcache_create() to the correct size, and stop setting UMA_ZONE_MAXBUCKET. This allows the page cache buckets to be adaptively sized, like the rest of UMA's caches. This also causes the initial bucket size to be small, so only systems which benefit from large caches will get them. Reviewed by: gallatin, jeff MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22393	2019-11-22 16:30:47 +00:00
Rick Macklem	84208c00bc	Update the VOP_COPY_FILE_RANGE man page to reflect the semantic change made by r354574. This is a content change.	2019-11-10 01:21:10 +00:00
Rick Macklem	282c7fdc5f	Update the VOP_COPY_FILE_RANGE.9 man page to reflect the semantic change implemented by r354564. This is a content change.	2019-11-08 23:58:33 +00:00
Rick Macklem	b2ea509012	Fix the man page to correctly describe the use of the "len" argument. The man page incorrectly described the use of the"len" argument, which is updated to the number of bytes copied and not reduced by the number of bytes copied. This is a content change.	2019-11-08 06:40:17 +00:00
Andriy Gapon	337f6465a9	document taskqueue_start_threads_in_proc While here, fix taskqueue_start_threads_cpuset that was documented under old name of taskqueue_start_threads_pinned. MFC after: 4 weeks	2019-10-17 06:58:07 +00:00
Andriy Gapon	c812bea351	add superio.4 and superio.9 manual pages This adds basic documentation on what the superio driver is and how other drivers can interact with it. I decided to also document superio's ivar accessors. Reviewed by: bcr, brueffer (both manual contents only) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D21958	2019-10-11 11:13:47 +00:00
Andriy Gapon	93d9a79816	remove unrelated files accidentally committed in r353381	2019-10-10 07:41:42 +00:00
Andriy Gapon	d0c0856f63	emulate illumos membar_producer with atomic_thread_fence_rel membar_producer is supposed to be a store-store barrier. Also, in the code that FreeBSD has ported from illumos membar_producer is used only with regular stores to regular memory (with respect to caching). We do not have an MI primitive for the store-store barrier, so atomic_thread_fence_rel is the closest we have as it provides (load \| store) -> store barrier. Previously, membar_producer was an empty function call on all 32-bit arm-s, 32-bit powerpc, riscv and all mips variants. I think that it was inadequate. On other platforms, such as amd64, arm64, i386, powerpc64, sparc64, membar_producer was implemented using stronger primitives than required for a store-store barrier with respect to regular memory access. For example, it used sfence on amd64 and lock-ed nop in i386 (despite TSO). On powerpc64 we now use recommended lwsync instead of eieio. On sparc64 FreeBSD uses TSO mode. On arm64/aarch64 we now use dmb sy instead of dmb ish. Not sure if this is an improvement, actually. After this change we can drop opensolaris_atomic.S for aarch64, amd64, powerpc64 and sparc64 as all required atomic operations have either direct or light-weight mapping to FreeBSD native atomic operations. Discussed with: kib MFC after: 4 weeks	2019-10-10 07:39:41 +00:00
Mark Johnston	38a20ba16f	Fix a couple of nits in r352110. - Remove a dead variable from the amd64 pmap_extract_and_hold(). - Fix grammar in the vm_page_wire man page. Reported by: alc Reviewed by: alc, kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21639	2019-09-16 15:06:19 +00:00
Mark Johnston	e8bcf6966b	Revert r352406, which contained changes I didn't intend to commit.	2019-09-16 15:04:45 +00:00
Mark Johnston	41fd4b9422	Fix a couple of nits in r352110. - Remove a dead variable from the amd64 pmap_extract_and_hold(). - Fix grammar in the vm_page_wire man page. Reported by: alc Reviewed by: alc, kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21639	2019-09-16 15:03:12 +00:00
Yuri Pankov	e3e469850c	sbuf(9): fix sbuf_drain_func typedef markup Reviewed by: 0mp (previous version) Differential Revision: https://reviews.freebsd.org/D21569	2019-09-16 13:10:03 +00:00
Mark Johnston	fee2a2fa39	Change synchonization rules for vm_page reference counting. There are several mechanisms by which a vm_page reference is held, preventing the page from being freed back to the page allocator. In particular, holding the page's object lock is sufficient to prevent the page from being freed; holding the busy lock or a wiring is sufficent as well. These references are protected by the page lock, which must therefore be acquired for many per-page operations. This results in false sharing since the page locks are external to the vm_page structures themselves and each lock protects multiple structures. Transition to using an atomically updated per-page reference counter. The object's reference is counted using a flag bit in the counter. A second flag bit is used to atomically block new references via pmap_extract_and_hold() while removing managed mappings of a page. Thus, the reference count of a page is guaranteed not to increase if the page is unbusied, unmapped, and the object's write lock is held. As a consequence of this, the page lock no longer protects a page's identity; operations which move pages between objects are now synchronized solely by the objects' locks. The vm_page_wire() and vm_page_unwire() KPIs are changed. The former requires that either the object lock or the busy lock is held. The latter no longer has a return value and may free the page if it releases the last reference to that page. vm_page_unwire_noq() behaves the same as before; the caller is responsible for checking its return value and freeing or enqueuing the page as appropriate. vm_page_wire_mapped() is introduced for use in pmap_extract_and_hold(). It fails if the page is concurrently being unmapped, typically triggering a fallback to the fault handler. vm_page_wire() no longer requires the page lock and vm_page_unwire() now internally acquires the page lock when releasing the last wiring of a page (since the page lock still protects a page's queue state). In particular, synchronization details are no longer leaked into the caller. The change excises the page lock from several frequently executed code paths. In particular, vm_object_terminate() no longer bounces between page locks as it releases an object's pages, and direct I/O and sendfile(SF_NOCACHE) completions no longer require the page lock. In these latter cases we now get linear scalability in the common scenario where different threads are operating on different files. __FreeBSD_version is bumped. The DRM ports have been updated to accomodate the KPI changes. Reviewed by: jeff (earlier version) Tested by: gallatin (earlier version), pho Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20486	2019-09-09 21:32:42 +00:00
Mark Johnston	e46cfc2542	Revert a portion of r351628 that I did not mean to commit. Reported by: mjg MFC with: r351628	2019-09-03 14:39:36 +00:00
Mark Johnston	08cfa56ea3	Extend uma_reclaim() to permit different reclamation targets. The page daemon periodically invokes uma_reclaim() to reclaim cached items from each zone when the system is under memory pressure. This is important since the size of these caches is unbounded by default. However it also results in bursts of high latency when allocating from heavily used zones as threads miss in the per-CPU caches and must access the keg in order to allocate new items. With r340405 we maintain an estimate of each zone's usage of its (per-NUMA domain) cache of full buckets. Start making use of this estimate to avoid reclaiming the entire cache when under memory pressure. In particular, introduce TRIM, DRAIN and DRAIN_CPU verbs for uma_reclaim() and uma_zone_reclaim(). When trimming, only items in excess of the estimate are reclaimed. Draining a zone reclaims all of the cached full buckets (the previous behaviour of uma_reclaim()), and may further drain the per-CPU caches in extreme cases. Now, when under memory pressure, the page daemon will trim zones rather than draining them. As a result, heavily used zones do not incur bursts of bucket cache misses following reclamation, but large, unused caches will be reclaimed as before. Reviewed by: jeff Tested by: pho (an earlier version) MFC after: 2 months Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D16667	2019-09-01 22:22:43 +00:00
Mark Johnston	d794b3a3c2	Update and clean up the UMA man page. - Fix warnings from igor and mandoc. - Provide a brief description of the separation between zones and their backend slab allocators. - Document cache zones and secondary zones. - Document the kernel config options added in r350659. - Document the uma_zalloc_pcpu() and uma_zfree_pcpu() wrappers. - Document uma_zone_reserve(), uma_zone_reserve_kva() and uma_zone_prealloc(). - Document uma_zone_alloc() and uma_zone_freef(). - Add some missing MLINKs and Xrefs. MFC after: 2 weeks	2019-08-30 19:35:44 +00:00
John Baldwin	93a0379392	Trim a spurious blank line I added in r348969. I did not bump .Dd since there is no content change. MFC after: 3 days	2019-08-19 17:28:12 +00:00
Konstantin Belousov	3a91d1062a	i386: Implement atomic_load_64(9) and atomic_store_64(9). Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-08-18 15:58:44 +00:00

1 2 3 4 5 ...

2835 Commits