freebsd-nq

Author	SHA1	Message	Date
Alexander Motin	10ae42ccbd	GEOM: Set G_CF_DIRECT_SEND/RECEIVE for taste consumers. All I/O requests through the taste consumers are synchronous, done with g_read_data() and without any locks held. It makes no sense to delegate the I/O to g_down/g_up threads. This removes many of context switches during disk retaste. MFC after: 2 weeks	2022-01-29 21:59:03 -05:00
Edward Tomasz Napierala	739a9c51b0	geom(4): Fix some of the "set but not used" warnings The few I've left in place look like potential bugs. Sponsored By: EPSRC	2021-12-18 11:42:34 +00:00
Mateusz Guzik	cb2bfd3ecb	geom_journal: plug set-but-not-unused vars Sponsored by: Rubicon Communications, LLC ("Netgate")	2021-11-24 21:21:59 +00:00
Gordon Bergling	9d2e51884e	gjournal(8): Fix a typo in a source code comment - s/writting/writing/ MFC after: 3 days	2021-11-03 17:14:00 +01:00
Gordon Bergling	5bdf58e196	Fix some common typos in source code comments - s/priviledged/privileged/ - s/funtion/function/ - s/doens't/doesn't/ - s/sychronization/synchronization/ MFC after: 3 days	2021-08-28 18:57:23 +02:00
Alexander Motin	c2da954203	geom(4): Mark all sysctls as CTLFLAG_MPSAFE. This code does not use Giant lock for very long time. MFC after: 2 weeks	2021-08-10 20:18:46 -04:00
Konstantin Belousov	cd85379104	Make MAXPHYS tunable. Bump MAXPHYS to 1M. Replace MAXPHYS by runtime variable maxphys. It is initialized from MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys. Make b_pages[] array in struct buf flexible. Size b_pages[] for buffer cache buffers exactly to atop(maxbcachebuf) (currently it is sized to atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1. The +1 for pbufs allow several pbuf consumers, among them vmapbuf(), to use unaligned buffers still sized to maxphys, esp. when such buffers come from userspace (). Overall, we save significant amount of otherwise wasted memory in b_pages[] for buffer cache buffers, while bumping MAXPHYS to desired high value. Eliminate all direct uses of the MAXPHYS constant in kernel and driver sources, except a place which initialize maxphys. Some random (and arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted straight. Some drivers, which use MAXPHYS to size embeded structures, get private MAXPHYS-like constant; their convertion is out of scope for this work. Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs, dev/siis, where either submitted by, or based on changes by mav. Suggested by: mav () Reviewed by: imp, mav, imp, mckusick, scottl (intermediate versions) Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D27225	2020-11-28 12:12:51 +00:00
Edward Tomasz Napierala	d22ff249d9	Make g_attach() return ENXIO for orphaned providers; update various classes to add missing error checking. Reviewed by: imp MFC after: 2 weeks Sponsored by: NetApp, Inc. Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D26658	2020-10-18 16:24:08 +00:00
Xin LI	8510f61acd	sys/geom: consistently use _PATH_DEV instead of hardcoding "/dev/". Reviewed by: cem MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D25565	2020-07-09 02:52:39 +00:00
Kirk McKusick	9407f25df2	Optimize g_journal's superblock update by noting that the summary information is neither read nor written so it need not be written out when updating the superblock. PR: 247425 Sponsored by: Netflix	2020-06-23 21:44:00 +00:00
Kirk McKusick	34816cb9ae	Move the pointers stored in the superblock into a separate fs_summary_info structure. This change was originally done by the CheriBSD project as they need larger pointers that do not fit in the existing superblock. This cleanup of the superblock eases the task of the commit that immediately follows this one. Suggested by: brooks Reviewed by: kib PR: 246983 Sponsored by: Netflix	2020-06-19 01:02:53 +00:00
Mark Johnston	c205ac921b	geom_journal: Only stop the switcher process if one was started. PR: 243196 MFC after: 1 week	2020-04-03 13:57:41 +00:00
Pawel Biernacki	7029da5c36	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT Approved by: kib (mentor, blanket) Commented by: kib, gallatin, melifaro Differential Revision: https://reviews.freebsd.org/D23718	2020-02-26 14:26:36 +00:00
Warner Losh	8b522bdae6	Pass BIO_SPEEDUP through all the geom layers While some geom layers pass unknown commands down, not all do. For the ones that don't, pass BIO_SPEEDUP down to the providers that constittue the geom, as applicable. No changes to vinum or virstor because I was unsure how to add this support, and I'm also unsure how to test these. gvinum doesn't implement BIO_FLUSH either, so it may just be poorly maintained. gvirstor is for testing and not supportig BIO_SPEEDUP is fine. Reviewed by: chs Differential Revision: https://reviews.freebsd.org/D23183	2020-01-17 01:15:55 +00:00
Mateusz Guzik	879e0604ee	Add KERNEL_PANICKED macro for use in place of direct panicstr tests	2020-01-12 06:07:54 +00:00
Mateusz Guzik	c8b3463dd0	vfs: reimplement deferred inactive to use a dedicated flag (VI_DEFINACT) The previous behavior of leaving VI_OWEINACT vnodes on the active list without a hold count is eliminated. Hold count is kept and inactive processing gets explicitly deferred by setting the VI_DEFINACT flag. The syncer is then responsible for vdrop. Reviewed by: kib (previous version) Tested by: pho (in a larger patch, previous version) Differential Revision: https://reviews.freebsd.org/D23036	2020-01-07 15:56:24 +00:00
Conrad Meyer	ac03832ef3	GEOM: Reduce unnecessary log interleaving with sbufs Similar to what was done for device_printfs in r347229. Convert g_print_bio() to a thin shim around g_format_bio(), which acts on an sbuf; documented in g_bio.9. Reviewed by: markj Discussed with: rlibby Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D21165	2019-08-07 19:28:35 +00:00
Alexander Motin	49ee0fcea5	Use sbuf_cat() in GEOM confxml generation. When it comes to megabytes of text, difference between sbuf_printf() and sbuf_cat() becomes substantial. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2019-06-19 15:36:02 +00:00
Kirk McKusick	fb14e73cb4	Normally when an attempt is made to mount a UFS/FFS filesystem whose superblock has a check-hash error, an error message noting the superblock check-hash failure is printed and the mount fails. The administrator then runs fsck to repair the filesystem and when successful, the filesystem can once again be mounted. This approach fails if the filesystem in question is a root filesystem from which you are trying to boot. Here, the loader fails when trying to access the filesystem to get the kernel to boot. So it is necessary to allow the loader to ignore the superblock check-hash error and make a best effort to read the kernel. The filesystem may be suffiently corrupted that the read attempt fails, but there is no harm in trying since the loader makes no attempt to write to the filesystem. Once the kernel is loaded and starts to run, it attempts to mount its root filesystem. Once again, failure means that it breaks to its prompt to ask where to get its root filesystem. Unless you have an alternate root filesystem, you are stuck. Since the root filesystem is initially mounted read-only, it is safe to make an attempt to mount the root filesystem with the failed superblock check-hash. Thus, when asked to mount a root filesystem with a failed superblock check-hash, the kernel prints a warning message that the root filesystem superblock check-hash needs repair, but notes that it is ignoring the error and proceeding. It does mark the filesystem as needing an fsck which prevents it from being enabled for writing until fsck has been run on it. The net effect is that the reboot fails to single user, but at least at that point the administrator has the tools at hand to fix the problem. Reported by: Rick Macklem (rmacklem@) Discussed with: Warner Losh (imp@) Sponsored by: Netflix	2018-12-06 00:09:39 +00:00
Kyle Evans	74d6c131cb	Annotate geom modules with MODULE_VERSION GEOM ELI may double ask the password during boot. Once at loader time, and once at init time. This happens due a module loading bug. By default GEOM ELI caches the password in the kernel, but without the MODULE_VERSION annotation, the kernel loads over the kernel module, even if the GEOM ELI was compiled into the kernel. In this case, the newly loaded module purges/invalidates/overwrites the GEOM ELI's password cache, which causes the double asking. MFC Note: There's a pc98 component to the original submission that is omitted here due to pc98 removal in head. This part will need to be revived upon MFC. Reviewed by: imp Submitted by: op Obtained from: opBSD MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D14992	2018-04-10 19:18:16 +00:00
Kirk McKusick	fb15890a8c	When freeing a superblock returned by ffs_sbget, be sure to also free the superblock summary information. Reported by: Peter Holm (pho@) Tested by: Peter Holm (pho@)	2018-03-24 15:36:25 +00:00
Kirk McKusick	efbf396426	This change is some refactoring of Mark Johnston's changes in r329375 to fix the memory leak that I introduced in r328426. Instead of trying to clear up the possible memory leak in all the clients, I ensure that it gets cleaned up in the source (e.g., ffs_sbget ensures that memory is always freed if it returns an error). The original change in r328426 was a bit sparse in its description. So I am expanding on its description here (thanks cem@ and rgrimes@ for your encouragement for my longer commit messages). In preparation for adding check hashing to superblocks, r328426 is a refactoring of the code to get the reading/writing of the superblock into one place. Unlike the cylinder group reading/writing which ends up in two places (ffs_getcg/ffs_geom_strategy in the kernel and cgget/cgput in libufs), I have the core superblock functions just in the kernel (ffs_sbfetch/ffs_sbput in ffs_subr.c which is already imported into utilities like fsck_ffs as well as libufs to implement sbget/sbput). The ffs_sbfetch and ffs_sbput functions take a function pointer to do the actual I/O for which there are four variants: ffs_use_bread / ffs_use_bwrite for the in-kernel filesystem g_use_g_read_data / g_use_g_write_data for kernel geom clients ufs_use_sa_read for the standalone code (stand/libsa/ufs.c but not stand/libsa/ufsread.c which is size constrained) use_pread / use_pwrite for libufs Uses of these interfaces are in the UFS filesystem, geoms journal & label, libsa changes, and libufs. They also permeate out into the filesystem utilities fsck_ffs, newfs, growfs, clri, dump, quotacheck, fsirand, fstyp, and quot. Some of these utilities should probably be converted to directly use libufs (like dumpfs was for example), but there does not seem to be much win in doing so. Tested by: Peter Holm (pho@)	2018-03-02 04:34:53 +00:00
Mark Johnston	16759360d4	Fix a memory leak introduced in r328426. ffs_sbget() may return a superblock buffer even if it fails, so the caller must be prepared to free it in this case. Moreover, when tasting alternate superblock locations in a loop, ffs_sbget()'s readfunc callback must free the previously allocated buffer. Reported and tested by: pho Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D14390	2018-02-16 15:41:03 +00:00
Kirk McKusick	5d84ae8b49	Null out journal softc pointer earlier to avoid a segment fault that can otherwise occur. PR: 221804 Submitted by: Andreas Longwitz <longwitz at incore.de> MFC after: 1 week	2018-01-31 23:30:49 +00:00
Kirk McKusick	dffce2150e	Refactoring of reading and writing of the UFS/FFS superblock. Specifically reading is done if ffs_sbget() and writing is done in ffs_sbput(). These functions are exported to libufs via the sbget() and sbput() functions which then used in the various filesystem utilities. This work is in preparation for adding subperblock check hashes. No functional change intended. Reviewed by: kib	2018-01-26 00:58:32 +00:00
Pedro F. Giffuni	3728855a0f	sys/geom: adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts.	2017-11-27 15:17:37 +00:00
Kirk McKusick	037331ddbd	When read requests are sent from a filesystem running above g_journal, the g_journal level needs to check whether it is holding a newer copy of the block than that which exists on the disk. If so, it needs to return its copy. If not, it should pass the request down to the disk to fulfill. It currently considers six queues: 0) delayed queue, 1) unsent (current queue), 2) in-flight to the journal (flush queue), 3) active journal (active queue), 4) inactive journal (inactive queue), and 5) inflight to the disk (copy queue). Checking on two of these queues is unnecessary: 0) The delayed requests should not be used for reads because they have not yet been entered into the journal, so their value should reflect the disk contents, not the future contents that are not yet committed. 2) Because all the bio's in the flush queue are also found on the active queue, there is no need to inspect the flush queue for reads since they will be found when searching the active queue. Submitted by: Dr. Andreas Longwitz <longwitz@incore.de> Discussed with: kib MFC after: 1 week	2017-08-13 18:09:22 +00:00
Kirk McKusick	8fccf8ffd7	Eliminate a variable that is only ever set. Submitted by: Dr. Andreas Longwitz <longwitz@incore.de> Discussed with: kib MFC after: 1 week	2017-08-13 18:06:38 +00:00
Kirk McKusick	6c6118b390	gjournal is broken in handling its flush_queue. If we have 10 bio's in the flush_queue: 1 2 3 4 5 6 7 8 9 10 and another 10 bio's go into the flush queue after only the first five bio's are removed from the flush queue, the queue should look like: 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20, but because of the bug we end up with 6 11 12 13 14 15 16 17 18 19 20 7 8 9 10. So the sequence of the bio's is damaged in the flush queue (and therefore in the journal on disk !). This error can be triggered by ffs_snapshot() when a block is read with readblock() and gjournal finds this block in the broken flush queue before it goes to the correct active queue. The fix is to place all new blocks at the end of the queue. Submitted by: Dr. Andreas Longwitz <longwitz@incore.de> Discussed with: kib MFC after: 1 week	2017-08-07 19:40:03 +00:00
Kirk McKusick	683590b642	sysctl kern.geom.journal.cache.limit shows negative value for FreeBSD/amd64 system having over 4GB RAM. That's due to: 1) the limit being u_int instead of u_long like vm.kmem_size (the limit is half of vm.kmem_size by default for amd64); 2) sysctl handler g_journal_cache_limit_sysctl() using u_int instead of u_long. The fix is to replace u_int with u_long for the kern.geom.journal.cache.limit sysctl variable. PR: 198500 Submitted by: Dr. Andreas Longwitz <longwitz@incore.de> Reported by: Eugene Grosbein Discussed with: kib MFC after: 1 week	2017-08-07 19:18:27 +00:00
John Baldwin	dcbe5188da	Defer startup of gjournal switcher kproc. Don't start switcher kproc until the first GEOM is created. Reviewed by: pjd MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D8576	2017-02-07 22:45:59 +00:00
Alexander Motin	8b64f3ca6c	Use g_wither_provider() where applicable. It is just a helper function combining G_PF_WITHER setting with g_orphan_provider().	2016-09-23 21:29:40 +00:00
Konstantin Belousov	4e2732b550	Removal of Giant droping wrappers for GEOM classes. Sponsored by: The FreeBSD Foundation	2016-05-20 08:25:37 +00:00
Pedro F. Giffuni	e8d5712284	sys/geom: spelling fixes in comments. No functional change.	2016-04-29 20:56:58 +00:00
Pedro F. Giffuni	310aef3257	sys/geom: spelling fixes. These affect debugging messages. MFC after: 2 weeks	2016-04-28 19:26:46 +00:00
Warner Losh	c55f57071a	Create an API to reset a struct bio (g_reset_bio). This is mandatory for all struct bio you get back from g_{new,alloc}_bio. Temporary bios that you create on the stack or elsewhere should use this before first use of the bio, and between uses of the bio. At the moment, it is nothing more than a wrapper around bzero, but that may change in the future. The wrapper also removes one place where we encode the size of struct bio in the KBI.	2016-02-17 17:16:02 +00:00
Jung-uk Kim	fd90e2ed54	CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten years for head. However, it is continuously misused as the mpsafe argument for callout_init(9). Deprecate the flag and clean up callout_init() calls to make them more consistent. Differential Revision: https://reviews.freebsd.org/D2613 Reviewed by: jhb MFC after: 2 weeks	2015-05-22 17:05:21 +00:00
Warner Losh	cba7d97b61	cswitch is unsigned, so don't compare it < 0. Any negative numbers will look huge and be caught by > 100.	2014-08-07 21:56:42 +00:00
Hans Petter Selasky	af3b2549c4	Pull in r267961 and r267973 again. Fix for issues reported will follow.	2014-06-28 03:56:17 +00:00
Glen Barber	37a107a407	Revert r267961, r267973: These changes prevent sysctl(8) from returning proper output, such as: 1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory	2014-06-27 22:05:21 +00:00
Hans Petter Selasky	3da1cf1e88	Extend the meaning of the CTLFLAG_TUN flag to automatically check if there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating childrens of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, hence some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading cludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel. Other changes: - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask". - Removed redundant TUNABLE statements throughout the kernel. - Some minor code rewrites in connection to removing not needed TUNABLE statements. - Added a missing SYSCTL_DECL(). - Wrapped two very long lines. - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, hence malloc()/free() is not ready when sysctls from the sysctl dataset are registered. - Bumped FreeBSD version to indicate SYSCTL API change. MFC after: 2 weeks Sponsored by: Mellanox Technologies	2014-06-27 16:33:43 +00:00
Justin Hibbits	6cec74b2e4	Partially revert r259080. bde@ pointed out that there are a lot more style bugs going on in here than can be fixed, and I introduced some of my own. Rather than fix the whole host of them, back out my bugs. Found by: bde X-MFC with: r259080	2013-12-08 09:34:56 +00:00
Justin Hibbits	8991c54091	Fix some integer signs. These unsigned integers should all be signed. Found by: clang (powerpc64)	2013-12-07 19:55:34 +00:00
Konstantin Belousov	a4a65e69c6	When panicing due to the gjournal overflow, print the geom metadata journal id. Requested by: Andreas Longwitz <longwitz@incore.de> MFC after: 1 week	2013-07-10 10:11:43 +00:00
Konstantin Belousov	cc3d8c35f5	There are several code sequences like vfs_busy(mp); vfs_write_suspend(mp); which are problematic if other thread starts unmount between two calls. The unmount starts a write, while vfs_write_suspend() drain writers. On the other hand, unmount drains busy references, causing the deadlock. Add a flag argument to vfs_write_suspend and require the callers of it to specify VS_SKIP_UNMOUNT flag, when the call is performed not in the mount path, i.e. the covered vnode is not locked. The suspension is not attempted if VS_SKIP_UNMOUNT is specified and unmount is in progress. Reported and tested by: Andreas Longwitz <longwitz@incore.de> Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2013-07-09 20:49:32 +00:00
Konstantin Belousov	ddd6b3fc33	Add flags argument to vfs_write_resume() and remove vfs_write_resume_flags(). Sponsored by: The FreeBSD Foundation	2013-01-11 06:08:32 +00:00
Konstantin Belousov	5050aa86cf	Remove the support for using non-mpsafe filesystem modules. In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems. The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes. Conducted and reviewed by: attilio Tested by: pho	2012-10-22 17:50:54 +00:00
Konstantin Belousov	c480f781ea	Current implementations of sync(2) and syncer vnode fsync() VOP uses mnt_noasync counter to temporary remove MNTK_ASYNC mount option, which is needed to guarantee a synchronous completion of the initiated i/o before syscall or VOP return. Global removal of MNTK_ASYNC option is harmful because not only i/o started from corresponding thread becomes synchronous, but all i/o is synchronous on the filesystem which is initiated during sync(2) or syncer activity. Instead of removing MNTK_ASYNC from mnt_kern_flag, provide a local thread flag to disable async i/o for current thread only. Use the opportunity to move DOINGASYNC() macro into sys/vnode.h and consistently use it through places which tested for MNTK_ASYNC. Some testing demonstrated 60-70% improvements in run time for the metadata-intensive operations on async-mounted UFS volumes, but still with great deviation due to other reasons. Reviewed by: mckusick Tested by: scottl MFC after: 2 weeks	2012-02-06 11:04:36 +00:00
Ed Schouten	6472ac3d8a	Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs. The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.	2011-11-07 15:43:11 +00:00
Andrey V. Elsukov	2fbefe4829	Removed KASSERT, g_new_providerf() can not fail. MFC after: 1 week	2011-05-04 18:06:40 +00:00

1 2

75 Commits