significant source of cache line contention from vm_page_alloc(). Use
accessors and vm_page_unwire_noq() so that the mechanism can be easily
changed in the future.
Reviewed by: markj
Discussed with: kib, glebius
Tested by: pho (earlier version)
Sponsored by: Netflix, Dell/EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14273
read or write of all registered realtime clocks. In the read case, the
values read are simply discarded. For writes, there's no alternative but
to actually write the current system time to the device.
to make the order of operations clearer to avoid the race condition
that was fixed in r328914. In particular, this commit corrects a
similar race that existed in the soft updates callback.
Some sleuthing through the SVN repository suggests that
bufdone_finish() was added to support XFS:
------------------------------------------------------------------------
r153192 | rodrigc | 2005-12-06 19:39:08 -0800 (Tue, 06 Dec 2005) | 13 lines
Changes imported from XFS for FreeBSD project:
- add fields to struct buf (needed by XFS)
- 3 private fields: b_fsprivate1, b_fsprivate2, b_fsprivate3
- b_pin_count, count of pinned buffer
- add new B_MANAGED flag
- add breada() function to initiate asynchronous I/O on read-ahead blocks.
- add bufdone_finish(), bpin(), bunpin_wait() functions
Patches provided by: kan
Reviewed by: phk
Silence on: arch@
------------------------------------------------------------------------
It does not appear to ever have been used for anything else. XFS was
disconnected in r241607:
------------------------------------------------------------------------
r241607 | attilio | 2012-10-16 03:04:00 -0700 (Tue, 16 Oct 2012) | 5 lines
Disconnect non-MPSAFE XFS from the build in preparation for dropping
GIANT from VFS.
This is not targeted for MFC.
------------------------------------------------------------------------
and removed entirely in r247631:
------------------------------------------------------------------------
r247631 | attilio | 2013-03-02 07:33:54 -0800 (Sat, 02 Mar 2013) | 5 lines
Garbage collect XFS bits which are now already completely disconnected
from the tree since few months.
This is not targeted for MFC.
------------------------------------------------------------------------
Since XFS support is gone, there is no reason to retain bufdone_finish().
Suggested by: Warner Losh (imp)
Discussed with: cem, kib
Tested by: Peter Holm (pho)
size of UMA zone allocation is greater than page size. In this case the
zone of zones cannot use UMA_MD_SMALL_ALLOC, and we need to postpone
switching this zone off of startup_alloc() until the VM is fully launched.
o Always supply the number of VM zones to uma_startup_count(). On machines
with UMA_MD_SMALL_ALLOC ignore it completely, unless zsize goes over a
page. In the latter case, account the VM zones in the number of
allocations from the zone of zones.
o Rewrite startup_alloc() so that it immediately switches any zone that is
already capable of doing real allocations away from itself.
In the worst-case scenario we may leak a single page here. See the comment
in uma_startup_count().
o Hardcode the call to uma_startup2() into vm_mem_init(). Otherwise some
extra SYSINITs, e.g. vm_page_init(), may sneak in before it.
o While here, remove uma_boot_pages_mtx. With the recent changes to the
boot pages calculation, we are guaranteed to use all of the boot_pages
in the early single-threaded stage.
Reported & tested by: mav
While the bug itself was serious, as we could either pass a non-busied
page to vm_pager_get_pages() or leak a busy page, it could only be
triggered under a very rare condition where the page is already inserted
into the object, but it is not valid yet.
Reviewed by: kib
MFC after: 2 weeks
o Most of the startup zones have struct uma_slab embedded into the slab,
so provide the macro UMA_SLAB_SPACE and use it instead of UMA_SLAB_SIZE
when calculating how many pages a certain kind of allocation would
require. Some zones are offpage, so we might have a positive inaccuracy.
o The keg for the zone of zones is allocated "dynamically", so we
need +1 when calculating the number of pages for kegs. [1]
o The zone of zones and the zone of kegs have an arbitrary alignment of 32,
and this also needs to be accounted for. [2]
While here, spread more comments and improve diagnostic messages.
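For reference, the new macro presumably reduces to the space left in a
one-page slab after the embedded header, and the page estimates follow
from it. A sketch of the assumed definition and usage (the real code
lives in the UMA headers and startup path):

    /* Assumed definition: usable bytes in a one-page slab once the
     * embedded struct uma_slab header is subtracted. */
    #define UMA_SLAB_SPACE  (UMA_SLAB_SIZE - sizeof(struct uma_slab))

    /* Pages needed to hold "count" items of size "size", one slab per page. */
    pages = howmany(count, UMA_SLAB_SPACE / size);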
Reported by: pho [1], jtl [2]
Previously, wiring a page would cause it to be removed from its page
queue. In the common case, unwiring causes it to be enqueued at the tail
of that page queue. This change modifies vm_page_wire() to not dequeue
the page, thus avoiding the highly contended page queue locks. Instead,
vm_page_unwire() takes care of requeuing the page as a single operation,
and the page daemon dequeues wired pages as they are encountered during
a queue scan to avoid needlessly revisiting them later. For pages in
PQ_ACTIVE we do even better, since a requeue is unnecessary.
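From a caller's point of view the wire/unwire pattern is unchanged; it
simply no longer touches the page queues on the wire side. A simplified
sketch using the vm_page interfaces of that era (not verbatim kernel
code):

    vm_page_lock(m);
    vm_page_wire(m);                /* page stays on its queue, e.g. PQ_ACTIVE */
    vm_page_unlock(m);

    /* ... use the wired page; no page queue lock is taken here ... */

    vm_page_lock(m);
    vm_page_unwire(m, PQ_ACTIVE);   /* single requeue; cheap if already in PQ_ACTIVE */
    vm_page_unlock(m);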
The change improves scalability for some common workloads. For instance,
threads wiring pages into the buffer cache no longer need to modify
global page queues, and unwiring is usually done by the bufspace thread,
so concurrency is not as much of an issue. As another example, many
sysctl handlers wire the output buffer to avoid faults on copyout, and
since the buffer is likely to be in PQ_ACTIVE, we now entirely avoid
modifying the page queue in this case.
The change also adds a block comment describing some properties of
struct vm_page's reference counters, and the busy lock.
Reviewed by: jeff
Discussed with: alc, kib
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D11943
global to per-domain state. Protect reservations with the free lock
from the domain that they belong to. Refactor to make vm domains more
of a first class object.
Reviewed by: markj, kib, gallatin
Tested by: pho
Sponsored by: Netflix, Dell/EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14000
o Call uma_startup1() after initializing kmem, vmem and domains.
o Include the eight VM startup pages in the uma_startup_count() calculation.
o Account for vmem_startup() and vm_map_startup() preallocating pages.
o Account for the two extra allocations done by kmem_init() and vmem_create().
o Hardcode the place of execution of vm_radix_reserve_kva(). Using SYSINIT
allowed several other SYSINITs to sneak in before it, thus bumping the
boot page requirement.
for UMA startup.
o Introduce another stage of UMA startup, which is entered after
vm_page_startup() finishes. After this stage we don't yet enable buckets,
but we can ask VM for pages. Rename stages to meaningful names while here.
New list of stages: BOOT_COLD, BOOT_STRAPPED, BOOT_PAGEALLOC, BOOT_BUCKETS,
BOOT_RUNNING.
Enabling page alloc earlier allows us to dramatically reduce the number of
boot pages required. More importantly, the number of zones becomes
consistent across different machines, as no MD allocations are done before
the BOOT_PAGEALLOC stage. Now only UMA internal zones actually need to use
startup_alloc(); however, that may change, so vm_page_startup() passes its
need for early zones as an argument.
o Introduce the uma_startup_count() function to avoid code duplication. The
function calculates the sizes of the zone of zones and the zone of kegs,
and how many pages UMA will need to bootstrap.
It counts not only zone structures, but also kegs, slabs and hashes.
o Hide the uma_startup_foo() declarations from the public header file.
o Provide several DIAGNOSTIC printfs on boot_pages usage.
o Bugfix: when calculating the zone of zones size, use (mp_maxid + 1)
instead of mp_ncpus. Use the resulting number not only in the size
argument to zone_ctor() but also as args.size (see the sketch below).
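The corrected sizing is roughly the following (a sketch; uma_cache is the
per-CPU cache array embedded at the end of struct uma_zone):

    zsize = sizeof(struct uma_zone) +
        (mp_maxid + 1) * sizeof(struct uma_cache);
    args.size = zsize;          /* the same value is later given to zone_ctor() */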
Reviewed by: imp, gallatin (earlier version)
Differential Revision: https://reviews.freebsd.org/D14054
systems running with a heavy filesystem load. This bug was elusive to
track down because there were actually two problems. Sometimes
the in-memory check hash was wrong and sometimes the check hash
computed when doing the read was wrong. The occurrence of either
error caused a check-hash mismatch to be reported.
The first error was that the check hash in the in-memory cylinder
group was incorrect. This error was caused by the following
sequence of events:
- We read a cylinder-group buffer and the check hash is valid.
- We update its cg_time and cg_old_time which makes the in-memory
check-hash value invalid but we do not mark the cylinder group dirty.
- We do not make any other changes to the cylinder group, so we
never mark it dirty, thus do not write it out, and hence never
update the incorrect check hash for the in-memory buffer.
- Later, the buffer gets freed, but the page with the old incorrect
check hash is still in the VM cache.
- Later, we read the cylinder group again, and the first page with
the old check hash is still in the VM cache, but some other pages
are not, so we have to do a read.
- The read does not actually get the first page from disk, but rather
from the VM cache, resulting in the old check hash in the buffer.
- The value computed after doing the read does not match, causing the
error to be printed.
The fix for this problem is to only set cg_time and cg_old_time as
the cylinder group is being written to disk. This keeps the in-memory
check-hash valid unless the cylinder group has had other modifications
which will require it to be written with a new check hash calculated.
It also requires that the check hash be recalculated in the in-memory
cylinder group when it is marked clean after doing a background write.
The second problem was that the check hash computed at the end of the
read was incorrect because it was being calculated too soon in the I/O
completion path.
- When a read completes, we had the following sequence:
- bufdone()
-- b_ckhashcalc (calculates check hash)
-- bufdone_finish()
--- vfs_vmio_iodone() (replaces bogus pages with the cached ones)
- When we are reading a buffer where one or more pages are already
in memory (but not all pages, or we wouldn't be doing the read),
the I/O is done with bogus_page mapped in for the pages that exist
in the VM cache. This mapping is done to avoid corrupting the
cached pages if there is any I/O overrun. The vfs_vmio_iodone()
function is responsible for replacing the bogus_page(s) with the
cached ones. But we were calculating the check hash before the
bogus_page(s) were replaced. Hence, when we were calculating the
check hash, we were partly reading from bogus_page, which means
we calculated a bad check hash (e.g., because multiple pages have
been mapped to bogus_page, so its contents are indeterminate).
The second fix is to move the check-hash calculation from bufdone()
to bufdone_finish() after the call to vfs_vmio_iodone() so that it
computes the check hash over the correct set of pages.
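In outline, the completion path now looks like the following (a sketch
rather than the verbatim vfs_bio.c code; the check-hash hook is the
b_ckhashcalc pointer mentioned above):

    void
    bufdone_finish(struct buf *bp)
    {
            if (bp->b_flags & B_VMIO)
                    vfs_vmio_iodone(bp);     /* bogus_page entries replaced first */
            if (bp->b_ckhashcalc != NULL)
                    (*bp->b_ckhashcalc)(bp); /* check hash now covers the real pages */
            /* ... remaining completion work (wakeups, brelse/bqrelse) ... */
    }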
With these two changes, the occasional cylinder-group check-hash
errors are gone.
Submitted by: David Pfitzner <dpfitzner@netflix.com>
Reviewed by: kib
Tested by: David Pfitzner
As a followup to r328101, ignore relocation tables for ELF object
sections that are not memory resident. For modules loaded by the
loader, ignore relocation tables whose associated section was not
loaded by the loader (sh_addr is zero). For modules loaded at runtime
via kldload(2), ignore relocation tables whose associated section is
not marked with SHF_ALLOC.
Reported by: Mori Hiroki <yamori813@yahoo.co.jp>, adrian
Tested on: mips, mips64
MFC after: 1 month
Sponsored by: DARPA / AFRL
If a brand provides a header_supported hook, check it when trying to
find a brand based on a matching interpreter as well as in the final
loop for the fallback brand. Previously a brand might reject a binary
via a header_supported hook in one of the earlier loops, but still be
chosen by one of these later loops.
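The added check looks roughly like this in both of the later loops (an
illustrative sketch; "bi" is a candidate Elf_Brandinfo and "imgp" the
image being activated, as in sys/kern/imgact_elf.c, and the hook's exact
signature may differ between branches):

    if (bi->header_supported != NULL && !bi->header_supported(imgp))
            continue;       /* the brand explicitly rejects this binary */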
Reviewed by: kib
Obtained from: CheriBSD
MFC after: 2 weeks
Sponsored by: DARPA / AFRL
Differential Revision: https://reviews.freebsd.org/D13945
Stop leaking kernel pointers through these sysctls and make sure that the
padding in the structures is zeroed on allocation to avoid other leaks.
Reviewed by: gordon, kib
Obtained from: CheriBSD
MFC after: 1 week
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D13459
whose type depends on the type of vnode. Fix vn_printf() so that it
correctly identifies the name of the pointer that it is printing.
Submitted by: Andreas Longwitz <longwitz at incore.de>
MFC after: 1 week
Some ABIs have large gaps in syscall numbers. Allow gaps to be filled
as ranges of UNIMPL, with an entry like:
248-1023 AUE_NULL UNIMPL unimplemented
Reviewed by: jhb, gnn
Sponsored by: Turing Robotic Industries Inc.
Differential Revision: https://reviews.freebsd.org/D14122
gone_in(major, msg): If we're running in FreeBSD major, tell the user
this code may be deleted soon. If we're running in FreeBSD major - 1,
tell the user the code is deprecated and will be gone in major.
Otherwise say nothing.
gone_in_dev(dev, major, msg): Just like gone_in(), except use
device_printf().
New tunable / sysctl debug.obsolete_panic: 0 - don't panic, 1 - panic in
major or newer, 2 - panic in major - 1 or newer. Default: 0.
If NO_OBSOLETE_CODE is defined, then both of these turn into compile-time
errors when building for major. Add 'options NO_OBSOLETE_CODE' to the
kernel build system.
This lets us tag code that's going away so users know it will be gone,
as well as automatically manage things.
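For example, a driver or subsystem slated for removal can announce itself
like this (hypothetical driver and release numbers):

    /* From a device driver's attach routine: */
    gone_in_dev(dev, 13, "superseded by the newfoo(4) driver");

    /* From non-device code: */
    gone_in(13, "oldfs is obsolete, migrate your data");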
Differential Revision: https://reviews.freebsd.org/D13818
appeared on UFS/FFS filesystems. In some cases it was promptly followed
by a panic of "softdep_deallocate_dependencies: dangling deps". This fix
should eliminate both of these occurrences.
Submitted by: Andreas Longwitz <longwitz at incore.de>
Reviewed by: kib
Tested by: Peter Holm (pho)
PR: 225423
MFC after: 1 week
Fix a bug when the system has no CPU 0. When created, threads were
implicitly assigned to CPU 0. This had no practical effect since a real
CPU was chosen immediately by the scheduler. However, on systems without
a CPU 0, sched_ule attempted to access the scheduler queue of the
nonexistent "old" CPU when making its initial placement choice. This
caused an attempt to use illegal memory and a crash (or, more usually, a
deadlock). Fix this by assigning new threads to the BSP explicitly and by
adding some asserts to verify that this problem does not recur.
Authored by: Nathan Whitehorn <nwhitehorn@freebsd.org>
Submitted by: Wojciech Macek <wma@semihalf.com>
Obtained from: Semihalf
Differential revision: https://reviews.freebsd.org/D13932
When allocating memory through malloc(9), we always expect the amount of
memory requested to be unsigned, as a negative value would stand for
either an error or an overflow.
Unsign some values, found when considering the use of mallocarray(9), to
avoid unnecessary casting. Also consider that indexes should be of at
least the same size/type as the upper limit they are meant to index.
MFC after: 3 weeks
Uses of mallocarray(9).
The use of mallocarray(9) has dramatically increased the swap required to
build FreeBSD. This is likely caused by the allocation-size attributes,
which put extra pressure on the compiler.
Given that most of these checks are superfluous, we have to choose better
where to use mallocarray(9). We still have many uses of mallocarray(9),
but hopefully this is enough to bring swap usage back to a reasonable
level.
Reported by: wosch
PR: 225197
kernel by PHYS_TO_DMAP() as previously present on amd64, arm64, riscv, and
powerpc64. This introduces a new MI macro (PMAP_HAS_DMAP) that can be
evaluated at runtime to determine whether the architecture has a direct
map; on architectures that unconditionally lack (or have) one,
PMAP_HAS_DMAP is the constant 0 (or 1) and the compiler can remove the
conditional logic.
As part of this, implement PHYS_TO_DMAP() on sparc64 and mips64, which had
similar facilities spelled differently. 32-bit MIPS has a partial direct
map that maps poorly onto this concept and is left unchanged.
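A typical consumer then reads as follows (a sketch; map_page_transiently()
is a hypothetical stand-in for whatever fallback mapping the caller would
otherwise use):

    vm_offset_t va;

    if (PMAP_HAS_DMAP)
            va = PHYS_TO_DMAP(VM_PAGE_TO_PHYS(m));  /* free 1:1 mapping */
    else
            va = map_page_transiently(m);           /* hypothetical fallback */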
Reviewed by: kib
Suggestions from: marius, alc, kib
Runtime tested on: amd64, powerpc64, powerpc, mips64
Previously the calculations were done as if the requested region
ended at the start of the last requested page, not its end.
The problem was actually quite minor as it affected only stats and
page prefaulting, not the actual page data, and only with specific
parameters.
Reviewed by: kib (previous version)
MFC after: 2 weeks
There is a case where not all CPUs have come online. In that situation,
restart only the APs which were operational before entering KDB.
Created by: Wojciech Macek <wma@semihalf.com>
Obtained from: Semihalf
Reviewed by: nwhitehorn
Differential revision: https://reviews.freebsd.org/D13949
Sponsored by: QCM Technologies
ELF object files can contain program sections which are not supposed
to be loaded into memory (e.g. .comment). Normally the static linker
uses section flags such as SHF_ALLOC to decide which sections are
allocated to loadable
program segments in ELF binaries and shared objects (including kernels
on all architectures and kernel modules on architectures other than
amd64).
Mapping ELF object files (such as amd64 kernel modules) into memory
directly is a bit of a grey area. ELF object files are intended to be
used as inputs to the static linker. As a result, there is not a
standardized definition for what the memory layout of an ELF object
should be (none of the section headers have valid virtual memory
addresses for example).
The kernel and loader were not checking the SHF_ALLOC flag but loading
any program sections with certain types such as SHT_PROGBITS. As a
result, the kernel and loader would load into RAM some sections that
weren't marked with SHF_ALLOC such as .comment that are not loaded
into RAM for kernel modules on other architectures (which are
implemented as ELF shared objects). Aside from possibly requiring
slightly more RAM to hold a kernel module this does not affect runtime
correctness as the kernel relocates symbols based on the layout it
uses.
Debuggers such as gdb and lldb do not extract symbol tables from a
running process or kernel. Instead, they replicate the memory layout
of ELF executables and shared objects and use that to construct their
own symbol tables. For executables and shared objects this works
fine. For ELF objects the current logic in kgdb (and probably lldb
based on a simple reading) assumes that only sections with SHF_ALLOC
are memory resident when constructing a memory layout. If the
debugger constructs a different memory layout than the kernel, then it
will compute different addresses for symbols causing symbols in the
debugger to appear to have the wrong values (though the kernel itself
is working fine). The current port of mdb does not check SHF_ALLOC as
it replicates the kernel's logic in its existing kernel support.
The bfd linker sorts the sections in ELF object files such that all of
the allocated sections (sections with SHF_ALLOC) are placed first
followed by unallocated sections. As a result, when kgdb composed a
memory layout using only the allocated sections, this layout happened
to match the layout used by the kernel and loader. The lld linker
does not sort the sections in ELF object files and mixes allocated and
unallocated sections. This resulted in kgdb composing a different
memory layout than the kernel and loader.
We could either patch kgdb (and possibly in the future lldb) to use
custom handling when generating memory layouts for kernel modules that
are ELF objects, or we could change the kernel and loader to check
SHF_ALLOC. I chose the latter as I feel we shouldn't be loading
things into RAM that the module won't use. This should mostly be a
NOP when linking with bfd but will allow the existing kgdb to work
with amd64 kernel modules linked with lld.
Note that we only require SHF_ALLOC for "program" sections for types
like SHT_PROGBITS and SHT_NOBITS. Other section types such as symbol
tables, string tables, and relocations must also be loaded and are not
marked with SHF_ALLOC.
Reported by: np
Reviewed by: kib, emaste
MFC after: 1 month
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D13926
pathconf(2) and fpathconf(2) both return a long. The kern_[f]pathconf()
functions now accept a pointer to a long value rather than modifying
td_retval directly. Instead, the system calls explicitly store the
returned long value in td_retval[0].
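The resulting syscall-side pattern is roughly the following (a sketch;
argument names follow the usual fpathconf_args layout):

    int
    sys_fpathconf(struct thread *td, struct fpathconf_args *uap)
    {
            long value;
            int error;

            error = kern_fpathconf(td, uap->fd, uap->name, &value);
            if (error == 0)
                    td->td_retval[0] = value;   /* return the long to userspace */
            return (error);
    }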
Requested by: bde
Reviewed by: kib
Sponsored by: Chelsio Communications
Focus on code where we are doing multiplications within malloc(9). None of
these are likely to overflow; however, the change is still useful as some
static checkers can benefit from the allocation attributes we use for
mallocarray.
This initial sweep only covers malloc(9) calls with M_NOWAIT. There is no
good reason for this, but I started doing the changes before r327796, and
at that time it was convenient to make sure the surrounding code could
handle NULL values.
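A typical conversion from this sweep looks like the following (hypothetical
softc and count names; mallocarray(9) takes nmemb, size, type, flags):

    /* Before: open-coded multiplication inside malloc(9). */
    sc->slots = malloc(sc->nslots * sizeof(*sc->slots), M_DEVBUF,
        M_NOWAIT | M_ZERO);

    /* After: mallocarray(9) fails cleanly if the multiplication would overflow. */
    sc->slots = mallocarray(sc->nslots, sizeof(*sc->slots), M_DEVBUF,
        M_NOWAIT | M_ZERO);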
X-Differential revision: https://reviews.freebsd.org/D13837
RTC clock hardware frequently uses BCD numbers. Currently the low-level
bcd2bin() and bin2bcd() functions will KASSERT if given out-of-range BCD
values. Every RTC driver must implement its own code for validating the
unreliable data coming from the hardware to avoid a potential kernel panic.
This change introduces two new functions, clock_bcd_to_ts() and
clock_ts_to_bcd(). The former validates its inputs and returns EINVAL if any
values are out of range. The latter guarantees the returned data will be
valid BCD in a known format (4-digit years, etc).
A new bcd_clocktime structure is used with the new functions. It is similar
to the original clocktime structure, but defines the fields holding BCD
values as uint8_t (uint16_t for year), and adds a PM flag for handling hours
using AM/PM mode.
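An RTC driver's gettime path can then be reduced to something like the
following sketch (the exact prototypes live in sys/sys/clock.h; the final
bool argument, selecting AM/PM interpretation, and the register-reading
helper are assumptions here):

    static int
    foo_rtc_gettime(device_t dev, struct timespec *ts)
    {
            struct bcd_clocktime bct;

            /* Fill bct with the raw BCD register values from the chip
             * (foo_read_bcd_regs() is a hypothetical driver helper). */
            foo_read_bcd_regs(dev, &bct);

            /* Validation happens here: bad BCD from the hardware comes
             * back as EINVAL instead of tripping a KASSERT. */
            return (clock_bcd_to_ts(&bct, ts, false));
    }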
PR: 224813
Differential Revision: https://reviews.freebsd.org/D13730 (no reviewers)
domains can be done by the _domain() API variants. UMA also supports a
first-touch policy via the NUMA zone flag.
The slab layer is now segregated by VM domains and is precise. It handles
iteration for round-robin directly. The per-CPU cache layer remains a mix
of domains according to where memory is allocated and freed. Well-behaved
clients can achieve perfect locality with no performance penalty.
The direct domain allocation functions have to visit the slab layer and
so require per-zone locks, which come at some expense.
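Callers that want explicit placement use the new _domain() variants, for
example (a sketch; udata is passed as NULL as with the plain variants):

    /* Allocate and free an item from/to a specific VM domain. */
    item = uma_zalloc_domain(zone, NULL, domain, M_WAITOK);
    /* ... use the item ... */
    uma_zfree_domain(zone, item, NULL);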
Reviewed by: Attilio (a slightly older version)
Tested by: pho
Sponsored by: Netflix, Dell/EMC Isilon
reservations by giving each memory domain its own KVA space in vmem that
is naturally aligned on superpage boundaries.
Reviewed by: alc, markj, kib (some objections)
Sponsored by: Netflix, Dell/EMC Isilon
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D13289
userspace to control NUMA policy administratively and programmatically.
Implement domainset-based iterators in the page layer.
Remove the now-legacy numa_* syscalls.
Clean up some header pollution created by having seq.h in proc.h.
Reviewed by: markj, kib
Discussed with: alc
Tested by: pho
Sponsored by: Netflix, Dell/EMC Isilon
Differential Revision: https://reviews.freebsd.org/D13403