freebsd-skq

Author	SHA1	Message	Date
cem	3038f1af7b	Include ktr.h in more compilation units Similar to r348026, exhaustive search for uses of CTRn() and cross reference ktr.h includes. Where it was obvious that an OS compat header of some kind included ktr.h indirectly, .c files were left alone. Some of these files clearly got ktr.h via header pollution in some scenarios, or tinderbox would not be passing prior to this revision, but go ahead and explicitly include it in files using it anyway. Like r348026, these CUs did not show up in tinderbox as missing the include. Reported by: peterj (arm64/mp_machdep.c) X-MFC-With: r347984 Sponsored by: Dell EMC Isilon	2019-05-21 20:38:48 +00:00
markj	b6758fe7e2	Stop checking TD_IDLETHREAD() in buffer cache routines. These predicates are vestigal and cannot be true today. For example, idle threads are not allowed to acquire locks. Also cache curthread in breada(). No functional change intended. Reviewed by: kib, mckusick MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20066	2019-04-29 13:23:32 +00:00
asomers	2ea01560e1	[skip ci] fix typo in comment from r59840 MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2019-04-26 15:00:59 +00:00
mckusick	069ffc8a67	This is an additional fix for bug report 230962. When using extended attributes, the kernel can panic with either "ffs_truncate3" or with "softdep_deallocate_dependencies: dangling deps". The problem arises because the flushbuflist() function which is called to clear out buffers is passed either the V_NORMAL flag to indicate that it should flush buffer associated with the contents of the file or the V_ALT flag to indicate that it should flush the buffers associated with the extended attribute data. The buffers containing the extended attribute data are identified by having their BX_ALTDATA flag set in the buffer's b_xflags field. The BX_ALTDATA flag is set on the buffer when the extended attribute block is first allocated or when its contents are read in from the disk. On a busy system, a buffer may be reused for another purpose, but the contents of the block that it contained continues to be held in the main page cache. Each physical page is identified as holding the contents of a logical block within a specified file (identified by a vnode). When a request is made to read a file, the kernel first looks for the block in the existing buffers. If it is not found there, it checks the page cache to see if it is still there. If it is found in the page cache, then it is remapped into a new buffer thus avoiding the need to read it in from the disk. The bug is that when a buffer request made for an extended attribute is fulfilled by reconstituting a buffer from the page cache rather than reading it in from disk, the BX_ALTDATA flag was not being set. Thus the flushbuflist() function would never clear it out and the "ffs_truncate3" panic would occur because the vnode being cleared still had buffers on its clean-buffer list. If the extended attribute was being updated, it is first read, then updated, and finally written. If the read is fulfilled by reconstituting the buffer from the page cache the BX_ALTDATA flag was not set and thus the dirty buffer would never be flushed by flushbuflist(). Eventually the buffer would be recycled. Since it was never written it would have an unfinished dependency which would trigger the "softdep_deallocate_dependencies: dangling deps" panic. The fix is to ensure that the BX_ALTDATA flag is set when a buffer has been reconstituted from the page cache. PR: 230962 Reported by: 2t8mr7kx9f@protonmail.com Reviewed by: kib Tested by: Peter Holm MFC after: 1 week Sponsored by: Netflix	2019-03-12 19:08:41 +00:00
mckusick	a92139d82f	Augment DDB "show buffer" command to print the buffer's referenced vnode pointer (b_vp). The value of b_vp can be used by "show vnode" to print the vnode and "show vnodebufs" to print all the clean and dirty buffers associated with the vnode (which should include this buffer). Sponsored by: Netflix	2019-03-11 21:49:44 +00:00
mckusick	5f49ddd0a0	Add printing of b_ioflags to DDB `show buffer' command. Sponsored by: Netflix	2019-01-25 21:24:09 +00:00
glebius	35140f62be	Fix mistake in r343030: move nswbuf calculation back to kern_vfs_bio_buffer_alloc(), because in init_param2() nbuf isn't really initialized yet. Pointed out by: bde	2019-01-16 20:20:38 +00:00
glebius	7ee1aa34d4	Allocate pager bufs from UMA instead of 80-ish mutex protected linked list. o In vm_pager_bufferinit() create pbuf_zone and start accounting on how many pbufs are we going to have set. In various subsystems that are going to utilize pbufs create private zones via call to pbuf_zsecond_create(). The latter calls uma_zsecond_create(), and sets a limit on created zone. After startup preallocate pbufs according to requirements of all pbuf zones. Subsystems that used to have a private limit with old allocator now have private pbuf zones: md(4), fusefs, NFS client, smbfs, VFS cluster, FFS, swap, vnode pager. The following subsystems use shared pbuf zone: cam(4), nvme(4), physio(9), aio(4). They should have their private limits, but changing that is out of scope of this commit. o Fetch tunable value of kern.nswbuf from init_param2() and while here move NSWBUF_MIN to opt_param.h and eliminate opt_swap.h, that was holding only this option. Default values aren't touched by this commit, but they probably should be reviewed wrt to modern hardware. This change removes a tight bottleneck from sendfile(2) operation, that uses pbufs in vnode pager. Other pagers also would benefit from faster allocation. Together with: gallatin Tested by: pho	2019-01-15 01:02:16 +00:00
kib	966a7e1173	Correct accuracy of the barrier writes accounting. Discussed with: mckusick MFC after: 1 week Sponsored by: The FreeBSD Foundation	2018-12-02 12:53:39 +00:00
markj	e5f2eb7b67	Use plain atomic_{add,subtract} when that's sufficient. CID: 1386920 MFC after: 2 weeks	2018-11-06 17:32:25 +00:00
markj	746978a065	Avoid completing I/O when dumping core after a panic. Filesystem or pager completion callbacks are generally non-functional after a panic and may trigger deadlocks if invoked in this context (e.g., by attempting to destroying a buffer mapping). To avoid this situation, short-circuit I/O completion in biodone(). Reviewed by: imp Discussed with: mav MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D15592	2018-06-01 23:49:32 +00:00
mmacy	4eacc08586	vfs: annotate variables only used by debug builds as __unused	2018-05-19 04:59:39 +00:00
kib	57921f916a	Detect and optimize reads from the hole on UFS. - Create getblkx(9) variant of getblk(9) which can return error. - Add GB_NOSPARSE flag for getblk()/getblkx() which requests that BMAP was performed before the buffer is created, and EJUSTRETURN returned in case the requested block does not exist. - Make ffs_read() use GB_NOSPARSE to avoid instantiating buffer (and allocating the pages for it), copying from zero_region instead. The end result is less page allocations and buffer recycling when a hole is read, which is important for some benchmarks. Requested and reviewed by: jeff Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D14917	2018-05-13 09:47:28 +00:00
markj	c1d51518dd	Fix some races introduced in r332974. With r332974, when performing a synchronized access of a page's "queue" field, one must first check whether the page is logically dequeued. If so, then the page lock does not prevent the page from being removed from its page queue. Intoduce vm_page_queue(), which returns the page's logical queue index. In some cases, direct access to the "queue" field is still required, but such accesses should be confined to sys/vm. Reported and tested by: pho Reviewed by: kib Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D15280	2018-05-04 17:17:30 +00:00
tijl	83ceca9af0	Make bufdaemon and bufspacedaemon use kthread_suspend_check instead of kproc_suspend_check. In r329612 bufspacedaemon was turned into a thread of the bufdaemon process causing both to call kproc_suspend_check with the same proc argument and that function contains the following while loop: while (SIGISMEMBER(p->p_siglist, SIGSTOP)) { wakeup(&p->p_siglist); msleep(&p->p_siglist, &p->p_mtx, PPAUSE, "kpsusp", 0); } So one thread wakes up the other and the other wakes up the first again, locking up UP machines on shutdown. Also register the shutdown handlers with SHUTDOWN_PRI_LAST + 100 so they run after the syncer has shutdown, because the syncer can cause a situation where bufdaemon help is needed to proceed. PR: 227404 Reviewed by: kib Tested by: cy, rmacklem	2018-04-22 16:05:29 +00:00
brooks	9d79658aab	Move most of the contents of opt_compat.h to opt_global.h. opt_compat.h is mentioned in nearly 180 files. In-progress network driver compabibility improvements may add over 100 more so this is closer to "just about everywhere" than "only some files" per the guidance in sys/conf/options. Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensure opt_compat.h is created on all architectures. Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the set of compiled files. Reviewed by: kib, cem, jhb, jtl Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14941	2018-04-06 17:35:35 +00:00
jeff	63a483ed5f	Fix a bug introduced in r329612 that slowly invalidates all clean bufs. Reported by: bde Reviewed by: bde Sponsored by: Netflix, Dell/EMC Isilon	2018-03-26 18:36:17 +00:00
glebius	b14bf58088	Redo r331328. We need to fix not only type but also format. While here again notice that we are fixing regression from r331106.	2018-03-22 05:26:27 +00:00
glebius	293a5a55c1	Fix sysctl types broken in r329612.	2018-03-21 23:21:32 +00:00
markj	40e57cc4d2	Elide the object lock in the common case in vfs_vmio_unwire(). The object lock was only needed when attempting to free B_DIRECT buffer pages, and for testing for invalid pages (and freeing them if so). Handle the latter by instead moving invalid pages near the head of the inactive queue, where they will be reclaimed quickly. Reviewed by: alc, kib, jeff MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D14778	2018-03-21 21:15:43 +00:00
imp	16e006edb9	bufshutdown is no longer called with Giant held, so there's no need to drop or pickup Giant anymore. Remove that code and adjust comments.	2018-03-21 14:46:59 +00:00
jhibbits	68f6ae368c	Cast through uintptr_t to narrow the buf domain pointer on 32-bit archs arg2 is an intmax_t, which on 32-bit architectures is 64 bits, wider than a pointer. When &bdomain[i] is added to arg2 it widens from uintptr_t to intmax_t, then gcc whines when it gets cast to a pointer. Casting through uintptr_t silences this warning.	2018-03-20 02:01:30 +00:00
markj	3394e82adc	Have vm_page_{deactivate,launder}() requeue already-queued pages. In many cases the page is not enqueued so the change will have no effect. However, the change is needed to support an optimization in the fault handler and in some cases (sendfile, the buffer cache) it was being emulated by the caller anyway. Reviewed by: alc Tested by: pho MFC after: 2 weeks X-Differential Revision: https://reviews.freebsd.org/D14625	2018-03-18 16:40:56 +00:00
jeff	3fc6b3c461	Move the dirty queues inside the per-domain structure. This resolves a bug where we had not hit global dirty limits but a single queue was starved for space by dirty buffers. A single buf_daemon is maintained for now. Add a bd_speedup() when we are low on bufspace. This can happen due to SUJ keeping many bufs locked until a cg block is written. Document this with a comment. Fix sysctls to work with per-domain variables. Add more ddb debugging. Reported by: pho Reviewed by: kib Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon Differential Revision: https://reviews.freebsd.org/D14705	2018-03-17 18:14:49 +00:00
cem	a476ed15dd	vfs_bio.c: Apply cleanups motivated by Coverity analysis It is believed that the conditions Coverity indicated were actually impossible to hit. So this patch just adds a cleanup to only compute v_mount once in brelse(), and in vfs_bio_getpages() always initializes error to zero to appease the static analyzer. No functional change intended. Submitted by: Darrick Lew <darrick.freebsd AT gmail.com> Reviewed by: kib Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D14613	2018-03-14 22:11:45 +00:00
jeff	429f27856d	Fix issues with sparse cpu allocation. Consistently use mp_maxid + 1. Reported by: pho Reviewed by: markj Sponsored by: Netflix, Dell/EMC Isilon	2018-02-25 00:35:21 +00:00
mjg	f85647797f	Fix up sysctl vfs.buffercache broken in r329612 Sample problem: top: sysctl(vfs.bufspace...) expected 8, got 4 Reported by: O. Hartmann <ohartmann walstatt.org>	2018-02-22 20:39:25 +00:00
jeff	21bd298f13	Fix the broken subqueue assignment for the cleanq. Reported by: pho Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon	2018-02-20 21:27:17 +00:00
jeff	e3be9f8fb6	Further parallelize the buffer cache. Provide multiple clean queues partitioned into 'domains'. Each domain manages its own bufspace and has its own bufspace daemon. Each domain has a set of subqueues indexed by the current cpuid to reduce lock contention on the cleanq. Refine the sleep/wakeup around the bufspace daemon to use atomics as much as possible. Add a B_REUSE flag that is used to requeue bufs during the scan to approximate LRU rather than locking the queue on every use of a frequently accessed buf. Implement bufspace_reserve with only atomic_fetchadd to avoid loop restarts. Reviewed by: markj Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon Differential Revision: https://reviews.freebsd.org/D14274	2018-02-20 00:06:07 +00:00
jeff	ba27b5187b	Make v_wire_count a per-cpu counter(9) counter. This eliminates a significant source of cache line contention from vm_page_alloc(). Use accessors and vm_page_unwire_noq() so that the mechanism can be easily changed in the future. Reviewed by: markj Discussed with: kib, glebius Tested by: pho (earlier version) Sponsored by: Netflix, Dell/EMC Isilon Differential Revision: https://reviews.freebsd.org/D14273	2018-02-12 22:53:00 +00:00
mckusick	83f7902028	Merge biodone_finish() back into biodone(). The primary purpose is to make the order of operations clearer to avoid the race condition that was fixed in r328914. In particular, this commit corrects a similar race that existed in the soft updates callback. Doing some sleuthing through the SVN repository, it appears that bufdone_finish() was added to support XFS: ------------------------------------------------------------------------ r153192 \| rodrigc \| 2005-12-06 19:39:08 -0800 (Tue, 06 Dec 2005) \| 13 lines Changes imported from XFS for FreeBSD project: - add fields to struct buf (needed by XFS) - 3 private fields: b_fsprivate1, b_fsprivate2, b_fsprivate3 - b_pin_count, count of pinned buffer - add new B_MANAGED flag - add breada() function to initiate asynchronous I/O on read-ahead blocks. - add bufdone_finish(), bpin(), bunpin_wait() functions Patches provided by: kan Reviewed by: phk Silence on: arch@ ------------------------------------------------------------------------ It does not appear to ever have been used for anything else. XFS was disconnected in r241607: ------------------------------------------------------------------------ r241607 \| attilio \| 2012-10-16 03:04:00 -0700 (Tue, 16 Oct 2012) \| 5 lines Disconnect non-MPSAFE XFS from the build in preparation for dropping GIANT from VFS. This is not targeted for MFC. ------------------------------------------------------------------------ and removed entirely in r247631: ------------------------------------------------------------------------ r247631 \| attilio \| 2013-03-02 07:33:54 -0800 (Sat, 02 Mar 2013) \| 5 lines Garbage collect XFS bits which are now already completely disconnected from the tree since few months. This is not targeted for MFC. ------------------------------------------------------------------------ Since XFS support is gone, there is no reason to retain biodone_finish(). Suggested by: Warner Losh (imp) Discussed with: cem, kib Tested by: Peter Holm (pho)	2018-02-09 19:50:47 +00:00
markj	d241ba38fa	Dequeue wired pages lazily. Previously, wiring a page would cause it to be removed from its page queue. In the common case, unwiring causes it to be enqueued at the tail of that page queue. This change modifies vm_page_wire() to not dequeue the page, thus avoiding the highly contended page queue locks. Instead, vm_page_unwire() takes care of requeuing the page as a single operation, and the page daemon dequeues wired pages as they are encountered during a queue scan to avoid needlessly revisiting them later. For pages in PQ_ACTIVE we do even better, since a requeue is unnecessary. The change improves scalability for some common workloads. For instance, threads wiring pages into the buffer cache no longer need to modify global page queues, and unwiring is usually done by the bufspace thread, so concurrency is not as much of an issue. As another example, many sysctl handlers wire the output buffer to avoid faults on copyout, and since the buffer is likely to be in PQ_ACTIVE, we now entirely avoid modifying the page queue in this case. The change also adds a block comment describing some properties of struct vm_page's reference counters, and the busy lock. Reviewed by: jeff Discussed with: alc, kib MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D11943	2018-02-07 16:57:10 +00:00
mckusick	a7ac77d8a7	Occasional cylinder-group check-hash errors were being reported on systems running with a heavy filesystem load. Tracking down this bug was elusive because there were actually two problems. Sometimes the in-memory check hash was wrong and sometimes the check hash computed when doing the read was wrong. The occurrence of either error caused a check-hash mismatch to be reported. The first error was that the check hash in the in-memory cylinder group was incorrect. This error was caused by the following sequence of events: - We read a cylinder-group buffer and the check hash is valid. - We update its cg_time and cg_old_time which makes the in-memory check-hash value invalid but we do not mark the cylinder group dirty. - We do not make any other changes to the cylinder group, so we never mark it dirty, thus do not write it out, and hence never update the incorrect check hash for the in-memory buffer. - Later, the buffer gets freed, but the page with the old incorrect check hash is still in the VM cache. - Later, we read the cylinder group again, and the first page with the old check hash is still in the VM cache, but some other pages are not, so we have to do a read. - The read does not actually get the first page from disk, but rather from the VM cache, resulting in the old check hash in the buffer. - The value computed after doing the read does not match causing the error to be printed. The fix for this problem is to only set cg_time and cg_old_time as the cylinder group is being written to disk. This keeps the in-memory check-hash valid unless the cylinder group has had other modifications which will require it to be written with a new check hash calculated. It also requires that the check hash be recalculated in the in-memory cylinder group when it is marked clean after doing a background write. The second problem was that the check hash computed at the end of the read was incorrect because the calculation of the check hash on completion of the read was being done too soon. - When a read completes we had the following sequence: - bufdone() -- b_ckhashcalc (calculates check hash) -- bufdone_finish() --- vfs_vmio_iodone() (replaces bogus pages with the cached ones) - When we are reading a buffer where one or more pages are already in memory (but not all pages, or we wouldn't be doing the read), the I/O is done with bogus_page mapped in for the pages that exist in the VM cache. This mapping is done to avoid corrupting the cached pages if there is any I/O overrun. The vfs_vmio_iodone() function is responsible for replacing the bogus_page(s) with the cached ones. But we were calculating the check hash before the bogus_page(s) were replaced. Hence, when we were calculating the check hash, we were partly reading from bogus_page, which means we calculated a bad check hash (e.g., because multiple pages have been mapped to bogus_page, so its contents are indeterminate). The second fix is to move the check-hash calculation from bufdone() to bufdone_finish() after the call to vfs_vmio_iodone() so that it computes the check hash over the correct set of pages. With these two changes, the occasional cylinder-group check-hash errors are gone. Submitted by: David Pfitzner <dpfitzner@netflix.com> Reviewed by: kib Tested by: David Pfitzner	2018-02-06 00:19:46 +00:00
avg	d57a913b89	correct read-ahead calculations in vfs_bio_getpages Previously the calculations were done as if the requested region ended at the start of the last requested page, not its end. The problem as actually quite minor as it affected only stats and page prefaulting, not the actual page data, and only with specific parameters. Reviewed by: kib (previous version) MFC after: 2 weeks	2018-01-18 12:59:04 +00:00
jeff	f375b4dd66	Implement NUMA support in uma(9) and malloc(9). Allocations from specific domains can be done by the _domain() API variants. UMA also supports a first-touch policy via the NUMA zone flag. The slab layer is now segregated by VM domains and is precise. It handles iteration for round-robin directly. The per-cpu cache layer remains a mix of domains according to where memory is allocated and freed. Well behaved clients can achieve perfect locality with no performance penalty. The direct domain allocation functions have to visit the slab layer and so require per-zone locks which come at some expense. Reviewed by: Attilio (a slightly older version) Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon	2018-01-12 23:25:05 +00:00
pfg	cc22a86800	sys/kern: adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts.	2017-11-27 15:20:12 +00:00
scottl	6e64b87b4f	Update a comment in brelse() to match reality.	2017-11-20 20:53:03 +00:00
jeff	3c355d849c	Replace manyinstances of VM_WAIT with blocking page allocation flags similar to the kernel memory allocator. This simplifies NUMA allocation because the domain will be known at wait time and races between failure and sleeping are eliminated. This also reduces boilerplate code and simplifies callers. A wait primitive is supplied for uma zones for similar reasons. This eliminates some non-specific VM_WAIT calls in favor of more explicit sleeps that may be satisfied without new pages. Reviewed by: alc, kib, markj Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon	2017-11-08 02:39:37 +00:00
mckusick	4c3c44cdd8	Continuing efforts to provide hardening of FFS, this change adds a check hash to cylinder groups. If a check hash fails when a cylinder group is read, no further allocations are attempted in that cylinder group until it has been fixed by fsck. This avoids a class of filesystem panics related to corrupted cylinder group maps. The hash is done using crc32c. Check hases are added only to UFS2 and not to UFS1 as UFS1 is primarily used in embedded systems with small memories and low-powered processors which need as light-weight a filesystem as possible. Specifics of the changes: sys/sys/buf.h: Add BX_FSPRIV to reserve a set of eight b_xflags that may be used by individual filesystems for their own purpose. Their specific definitions are found in the header files for each filesystem that uses them. Also add fields to struct buf as noted below. sys/kern/vfs_bio.c: It is only necessary to compute a check hash for a cylinder group when it is actually read from disk. When calling bread, you do not know whether the buffer was found in the cache or read. So a new flag (GB_CKHASH) and a pointer to a function to perform the hash has been added to breadn_flags to say that the function should be called to calculate a hash if the data has been read. The check hash is placed in b_ckhash and the B_CKHASH flag is set to indicate that a read was done and a check hash calculated. Though a rather elaborate mechanism, it should also work for check hashing other metadata in the future. A kernel internal API change was to change breada into a static fucntion and add flags and a function pointer to a check-hash function. sys/ufs/ffs/fs.h: Add flags for types of check hashes; stored in a new word in the superblock. Define corresponding BX_ flags for the different types of check hashes. Add a check hash word in the cylinder group. sys/ufs/ffs/ffs_alloc.c: In ffs_getcg do the dance with breadn_flags to get a check hash and if one is provided, check it. sys/ufs/ffs/ffs_vfsops.c: Copy across the BX_FFSTYPES flags in background writes. Update the check hash when writing out buffers that need them. sys/ufs/ffs/ffs_snapshot.c: Recompute check hash when updating snapshot cylinder groups. sys/libkern/crc32.c: lib/libufs/Makefile: lib/libufs/libufs.h: lib/libufs/cgroup.c: Include libkern/crc32.c in libufs and use it to compute check hashes when updating cylinder groups. Four utilities are affected: sbin/newfs/mkfs.c: Add the check hashes when building the cylinder groups. sbin/fsck_ffs/fsck.h: sbin/fsck_ffs/fsutil.c: Verify and update check hashes when checking and writing cylinder groups. sbin/fsck_ffs/pass5.c: Offer to add check hashes to existing filesystems. Precompute check hashes when rebuilding cylinder group (although this will be done when it is written in fsutil.c it is necessary to do it early before comparing with the old cylinder group) sbin/dumpfs/dumpfs.c Print out the new check hash flag(s) sbin/fsdb/Makefile: Needs to add libufs now used by pass5.c imported from fsck_ffs. Reviewed by: kib Tested by: Peter Holm (pho)	2017-09-22 12:45:15 +00:00
mjg	4cc87bd651	Start annotating global _padalign locks with __exclusive_cache_line While these locks are guarnteed to not share their respective cache lines, their current placement leaves unnecessary holes in lines which preceeded them. For instance the annotation of vm_page_queue_free_mtx allows 2 neighbour cachelines (previously separate by the lock) to be collapsed into 1. The annotation is only effective on architectures which have it implemented in their linker script (currently only amd64). Thus locks are not converted to their not-padaligned variants as to not affect the rest. MFC after: 1 week	2017-09-06 20:28:18 +00:00
markj	6f4724899b	Modify vm_page_grab_pages() to handle VM_ALLOC_NOWAIT. This will allow its use in sendfile_swapin(). Reviewed by: alc, kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D11942	2017-08-11 16:29:22 +00:00
alc	318304a5b7	Introduce vm_page_grab_pages(), which is intended to replace loops calling vm_page_grab() on consecutive page indices. Besides simplifying the code in the caller, vm_page_grab_pages() allows for batching optimizations. For example, the current implementation replaces calls to vm_page_lookup() on consecutive page indices by cheaper calls to vm_page_next(). Reviewed by: kib, markj Tested by: pho (an earlier version) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D11926	2017-08-09 04:23:04 +00:00
markj	2c9a28c567	Batch v_wire_count decrements in vm_hold_free_pages(). Atomic updates to v_wire_count are a significant source of contention, so combine multiple updates into one in this easy case. Also remove an old printf that gets executed if the page is shared-busied, which is a case that will lead to a panic anyway. Reviewed by: alc, kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D11791	2017-07-31 18:48:58 +00:00
rmacklem	d25c98bbc4	Make MAXBCACHEBUF a tunable called vfs.maxbcachebuf. By making MAXBCACHEBUF a tunable, it can be increased to allow for larger read/write data sizes for the NFS client. The tunable is limited to MAXPHYS, which is currently 128K. Making MAXPHYS a tunable or increasing its value is being discussed, since it would be nice to support a read/write data size of 1Mbyte for the NFS client when mounting the AmazonEFS file service. Reviewed by: kib MFC after: 2 weeks Relnotes: yes Differential Revision: https://reviews.freebsd.org/D10991	2017-06-17 22:24:19 +00:00
trasz	d01492fe6d	Make it possible to terminate "show lockedbufs" by pressing "q". MFC after: 2 weeks	2017-04-23 22:20:25 +00:00
trasz	e2433e37fa	Improve BUF_TRACKING by not displaying NULL entries. Reviewed by: cem MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D10443	2017-04-23 17:39:31 +00:00
glebius	21ead51d79	- Remove 'struct vmmeter' from 'struct pcpu', leaving only global vmmeter in place. To do per-cpu stats, convert all fields that previously were maintained in the vmmeters that sit in pcpus to counter(9). - Since some vmmeter stats may be touched at very early stages of boot, before we have set up UMA and we can do counter_u64_alloc(), provide an early counter mechanism: o Leave one spare uint64_t in struct pcpu, named pc_early_dummy_counter. o Point counter(9) fields of vmmeter to pcpu[0].pc_early_dummy_counter, so that at early stages of boot, before counters are allocated we already point to a counter that can be safely written to. o For sparc64 that required a whole dummy pcpu[MAXCPU] array. Further related changes: - Don't include vmmeter.h into pcpu.h. - vm.stats.vm.v_swappgsout and vm.stats.vm.v_swappgsin changed to 64-bit, to match kernel representation. - struct vmmeter hidden under _KERNEL, and only vmstat(1) is an exclusion. This is based on benno@'s 4-year old patch: https://lists.freebsd.org/pipermail/freebsd-arch/2013-July/014471.html Reviewed by: kib, gallatin, marius, lidl Differential Revision: https://reviews.freebsd.org/D10156	2017-04-17 17:34:47 +00:00
trasz	46b5afe7b1	Don't try to write out bufs that have already failed with ENXIO. This fixes some panics after disconnecting mounted disks. Submitted by: imp (slightly different version, which I've then lost) Reviewed by: kib, imp, mckusick MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D9674	2017-04-14 20:15:34 +00:00
alc	f319609095	Style fixes. In particular, the variable "bogus" is used like a Boolean. Define it as such. Reviewed by: kib MFC after: 1 week	2017-03-19 23:06:11 +00:00
kib	8cf2af841c	Use atop() instead of OFF_TO_IDX() for convertion of addresses or addresses offsets, as intended. Suggested and reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2017-03-14 19:39:17 +00:00

1 2 3 4 5 ...

768 Commits