Commit Graph

49 Commits

Author SHA1 Message Date
Kirk McKusick
037331ddbd When read requests are sent from a filesystem running above g_journal,
the g_journal level needs to check whether it is holding a newer
copy of the block than that which exists on the disk. If so, it
needs to return its copy. If not, it should pass the request down
to the disk to fulfill. It currently considers six queues:

0) delayed queue,
1) unsent (current queue),
2) in-flight to the journal (flush queue),
3) active journal (active queue),
4) inactive journal (inactive queue), and
5) inflight to the disk (copy queue).

Checking on two of these queues is unnecessary:

0) The delayed requests should not be used for reads because they
   have not yet been entered into the journal, so their value should
   reflect the disk contents, not the future contents that are not
   yet committed.

2) Because all the bio's in the flush queue are also found on the
   active queue, there is no need to inspect the flush queue for
   reads since they will be found when searching the active queue.

Submitted by: Dr. Andreas Longwitz <longwitz@incore.de>
Discussed with: kib
MFC after: 1 week
2017-08-13 18:09:22 +00:00
Kirk McKusick
8fccf8ffd7 Eliminate a variable that is only ever set.
Submitted by: Dr. Andreas Longwitz <longwitz@incore.de>
Discussed with: kib
MFC after: 1 week
2017-08-13 18:06:38 +00:00
Kirk McKusick
6c6118b390 gjournal is broken in handling its flush_queue. If we have 10 bio's
in the flush_queue:
         1 2 3 4 5 6 7 8 9 10
and another 10 bio's go into the flush queue after only the first five
bio's are removed from the flush queue, the queue should look like:
         6 7 8 9 10 11 12 13 14 15 16 17 18 19 20,
but because of the bug we end up with
         6 11 12 13 14  15 16 17 18 19 20 7 8 9 10.
So the sequence of the bio's is damaged in the flush queue (and
therefore in the journal on disk !). This error can be triggered by
ffs_snapshot() when a block is read with readblock() and gjournal finds
this block in the broken flush queue before it goes to the correct
active queue.

The fix is to place all new blocks at the end of the queue.

Submitted by: Dr. Andreas Longwitz <longwitz@incore.de>
Discussed with: kib
MFC after: 1 week
2017-08-07 19:40:03 +00:00
Kirk McKusick
683590b642 sysctl kern.geom.journal.cache.limit shows negative value for FreeBSD/amd64
system having over 4GB RAM. That's due to:

1) the limit being u_int instead of u_long like vm.kmem_size (the limit is
   half of vm.kmem_size by default for amd64);
2) sysctl handler g_journal_cache_limit_sysctl() using u_int instead of u_long.

The fix is to replace u_int with u_long for the kern.geom.journal.cache.limit
sysctl variable.

PR: 198500
Submitted by: Dr. Andreas Longwitz <longwitz@incore.de>
Reported by: Eugene Grosbein
Discussed with: kib
MFC after: 1 week
2017-08-07 19:18:27 +00:00
John Baldwin
dcbe5188da Defer startup of gjournal switcher kproc.
Don't start switcher kproc until the first GEOM is created.

Reviewed by:	pjd
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D8576
2017-02-07 22:45:59 +00:00
Alexander Motin
8b64f3ca6c Use g_wither_provider() where applicable.
It is just a helper function combining G_PF_WITHER setting with
g_orphan_provider().
2016-09-23 21:29:40 +00:00
Konstantin Belousov
4e2732b550 Removal of Giant droping wrappers for GEOM classes.
Sponsored by:	The FreeBSD Foundation
2016-05-20 08:25:37 +00:00
Pedro F. Giffuni
e8d5712284 sys/geom: spelling fixes in comments.
No functional change.
2016-04-29 20:56:58 +00:00
Pedro F. Giffuni
310aef3257 sys/geom: spelling fixes.
These affect debugging messages.

MFC after:	2 weeks
2016-04-28 19:26:46 +00:00
Warner Losh
c55f57071a Create an API to reset a struct bio (g_reset_bio). This is mandatory
for all struct bio you get back from g_{new,alloc}_bio. Temporary
bios that you create on the stack or elsewhere should use this before
first use of the bio, and between uses of the bio. At the moment, it
is nothing more than a wrapper around bzero, but that may change in
the future. The wrapper also removes one place where we encode the
size of struct bio in the KBI.
2016-02-17 17:16:02 +00:00
Jung-uk Kim
fd90e2ed54 CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten
years for head.  However, it is continuously misused as the mpsafe argument
for callout_init(9).  Deprecate the flag and clean up callout_init() calls
to make them more consistent.

Differential Revision:	https://reviews.freebsd.org/D2613
Reviewed by:	jhb
MFC after:	2 weeks
2015-05-22 17:05:21 +00:00
Warner Losh
cba7d97b61 cswitch is unsigned, so don't compare it < 0. Any negative numbers
will look huge and be caught by > 100.
2014-08-07 21:56:42 +00:00
Hans Petter Selasky
af3b2549c4 Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
Glen Barber
37a107a407 Revert r267961, r267973:
These changes prevent sysctl(8) from returning proper output,
such as:

 1) no output from sysctl(8)
 2) erroneously returning ENOMEM with tools like truss(1)
    or uname(1)
 truss: can not get etype: Cannot allocate memory
2014-06-27 22:05:21 +00:00
Hans Petter Selasky
3da1cf1e88 Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after:	2 weeks
Sponsored by:	Mellanox Technologies
2014-06-27 16:33:43 +00:00
Justin Hibbits
6cec74b2e4 Partially revert r259080. bde@ pointed out that there are a lot more style bugs
going on in here than can be fixed, and I introduced some of my own.  Rather
than fix the whole host of them, back out my bugs.

Found by:	bde
X-MFC with:	r259080
2013-12-08 09:34:56 +00:00
Justin Hibbits
8991c54091 Fix some integer signs. These unsigned integers should all be signed.
Found by:	clang (powerpc64)
2013-12-07 19:55:34 +00:00
Konstantin Belousov
a4a65e69c6 When panicing due to the gjournal overflow, print the geom metadata
journal id.

Requested by:	Andreas Longwitz <longwitz@incore.de>
MFC after:	1 week
2013-07-10 10:11:43 +00:00
Konstantin Belousov
cc3d8c35f5 There are several code sequences like
vfs_busy(mp);
      vfs_write_suspend(mp);
which are problematic if other thread starts unmount between two
calls.  The unmount starts a write, while vfs_write_suspend() drain
writers.  On the other hand, unmount drains busy references, causing
the deadlock.

Add a flag argument to vfs_write_suspend and require the callers of it
to specify VS_SKIP_UNMOUNT flag, when the call is performed not in the
mount path, i.e. the covered vnode is not locked.  The suspension is
not attempted if VS_SKIP_UNMOUNT is specified and unmount is in
progress.

Reported and tested by:	Andreas Longwitz <longwitz@incore.de>
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2013-07-09 20:49:32 +00:00
Konstantin Belousov
ddd6b3fc33 Add flags argument to vfs_write_resume() and remove
vfs_write_resume_flags().

Sponsored by:	The FreeBSD Foundation
2013-01-11 06:08:32 +00:00
Konstantin Belousov
5050aa86cf Remove the support for using non-mpsafe filesystem modules.
In particular, do not lock Giant conditionally when calling into the
filesystem module, remove the VFS_LOCK_GIANT() and related
macros. Stop handling buffers belonging to non-mpsafe filesystems.

The VFS_VERSION is bumped to indicate the interface change which does
not result in the interface signatures changes.

Conducted and reviewed by:	attilio
Tested by:	pho
2012-10-22 17:50:54 +00:00
Konstantin Belousov
c480f781ea Current implementations of sync(2) and syncer vnode fsync() VOP uses
mnt_noasync counter to temporary remove MNTK_ASYNC mount option, which
is needed to guarantee a synchronous completion of the initiated i/o
before syscall or VOP return.  Global removal of MNTK_ASYNC option is
harmful because not only i/o started from corresponding thread becomes
synchronous, but all i/o is synchronous on the filesystem which is
initiated during sync(2) or syncer activity.

Instead of removing MNTK_ASYNC from mnt_kern_flag, provide a local
thread flag to disable async i/o for current thread only. Use the
opportunity to move DOINGASYNC() macro into sys/vnode.h and
consistently use it through places which tested for MNTK_ASYNC.

Some testing demonstrated 60-70% improvements in run time for the
metadata-intensive operations on async-mounted UFS volumes, but still
with great deviation due to other reasons.

Reviewed by:	mckusick
Tested by:	scottl
MFC after:	2 weeks
2012-02-06 11:04:36 +00:00
Ed Schouten
6472ac3d8a Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
2011-11-07 15:43:11 +00:00
Andrey V. Elsukov
2fbefe4829 Removed KASSERT, g_new_providerf() can not fail.
MFC after:	1 week
2011-05-04 18:06:40 +00:00
Alexander Motin
90f2be2430 Implement relaxed comparision for hardcoded provider names to make it
ignore adX/adaY difference in both directions to simplify migration to
the CAM-based ATA or back.
2011-04-27 00:10:26 +00:00
Alexander Leidinger
cb08c2cc83 Add some FEATURE macros for various GEOM classes.
No FreeBSD version bump, the userland application to query the features will
be committed last and can serve as an indication of the availablility if
needed.

Sponsored by:	Google Summer of Code 2010
Submitted by:	kibab
Reviewed by:	silence on geom@ during 2 weeks
X-MFC after:	to be determined in last commit with code from this project
2011-02-25 10:24:35 +00:00
Konstantin Belousov
0ea2e01412 Treat async buffer writes from the gjournal switcher thread the same as
from syncer. We shall not sleep on running buffer space when suspending.

Reproduced and tested by:	pho
PR:	kern/154228
MFC after:	1 week
2011-01-26 10:34:21 +00:00
Edward Tomasz Napierala
fb231f3627 Make gjournal work with kernel compiled with "options DIAGNOSTIC".
Previously, it would panic immediately.

Reviewed by:	pjd
Approved by:	re (kib)
2009-06-30 14:34:06 +00:00
Attilio Rao
dfd233edd5 Remove the thread argument from the FSD (File-System Dependent) parts of
the VFS.  Now all the VFS_* functions and relating parts don't want the
context as long as it always refers to curthread.

In some points, in particular when dealing with VOPs and functions living
in the same namespace (eg. vflush) which still need to be converted,
pass curthread explicitly in order to retain the old behaviour.
Such loose ends will be fixed ASAP.

While here fix a bug: now, UFS_EXTATTR can be compiled alone without the
UFS_EXTATTR_AUTOSTART option.

VFS KPI is heavilly changed by this commit so thirdy parts modules needs
to be recompiled.  Bump __FreeBSD_version in order to signal such
situation.
2009-05-11 15:33:26 +00:00
Andrew Thompson
853a10a581 Revert r190676,190677
The geom and CAM changes for root_hold are the wrong solution for USB design
quirks.

Requested by:	scottl
2009-04-10 04:08:34 +00:00
Andrew Thompson
626fc9fe3d Add a how argument to root_mount_hold() so it can be passed NOWAIT and be called
in situations where sleeping isnt allowed.
2009-04-03 19:46:12 +00:00
Edward Tomasz Napierala
d27a975f72 Make it possible to use gjournal for the root filesystem. Previously,
an unclean shutdown would make it impossible to mount rootfs at boot.

PR:		kern/128529
Reviewed by:	pjd
Approved by:	rwatson (mentor)
Sponsored by:	FreeBSD Foundation
2008-12-06 11:33:10 +00:00
Attilio Rao
83b3bdbc8a Improve VFS locking:
- Implement real draining for vfs consumers by not relying on the
  mnt_lock and using instead a refcount in order to keep track of lock
  requesters.
- Due to the change above, remove the mnt_lock lockmgr because it is now
  useless.
- Due to the change above, vfs_busy() is no more linked to a lockmgr.
  Change so its KPI by removing the interlock argument and defining 2 new
  flags for it: MBF_NOWAIT which basically replaces the LK_NOWAIT of the
  old version (which was unlinked from the lockmgr alredy) and
  MBF_MNTLSTLOCK which provides the ability to drop the mountlist_mtx
  once the mnt interlock is held (ability still desired by most consumers).
- The stub used into vfs_mount_destroy(), that allows to override the
  mnt_ref if running for more than 3 seconds, make it totally useless.
  Remove it as it was thought to work into older versions.
  If a problem of "refcount held never going away" should appear, we will
  need to fix properly instead than trust on such hackish solution.
- Fix a bug where returning (with an error) from dounmount() was still
  leaving the MNTK_MWAIT flag on even if it the waiters were actually
  woken up. Just a place in vfs_mount_destroy() is left because it is
  going to recycle the structure in any case, so it doesn't matter.
- Remove the markercnt refcount as it is useless.

This patch modifies VFS ABI and breaks KPI for vfs_busy() so manpages and
__FreeBSD_version will be modified accordingly.

Discussed with:	kib
Tested by:	pho
2008-11-02 10:15:42 +00:00
Attilio Rao
59d4932531 Decontextualize vfs_busy(), vfs_unbusy() and vfs_mount_alloc() functions.
Manpages are updated accordingly.

Tested by:	Diego Sardina <siarodx at gmail dot com>
2008-08-31 14:26:08 +00:00
Ulf Lilleengen
7e7a4e1d18 - Fix spelling errors.
Approved by:    kib (mentor)
PR:             kern/124788
Submitted by:   Hywel Mallett <Hywel -at- hmallett.co.uk>
2008-06-20 19:48:18 +00:00
Julian Elischer
3745c395ec Rename the kthread_xxx (e.g. kthread_create()) calls
to kproc_xxx as they actually make whole processes.
Thos makes way for us to add REAL kthread_create() and friends
that actually make theads. it turns out that most of these
calls actually end up being moved back to the thread version
when it's added. but we need to make this cosmetic change first.

I'd LOVE to do this rename in 7.0  so that we can eventually MFC the
new kthread_xxx() calls.
2007-10-20 23:23:23 +00:00
Jeff Roberson
982d11f836 Commit 14/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
   sychronization.
 - Use the per-process spinlock rather than the sched_lock for per-process
   scheduling synchronization.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-05 00:00:57 +00:00
David Malone
041b706b2f Despite several examples in the kernel, the third argument of
sysctl_handle_int is not sizeof the int type you want to export.
The type must always be an int or an unsigned int.

Remove the instances where a sizeof(variable) is passed to stop
people accidently cut and pasting these examples.

In a few places this was sysctl_handle_int was being used on 64 bit
types, which would truncate the value to be exported.  In these
cases use sysctl_handle_quad to export them and change the format
to Q so that sysctl(1) can still print them.
2007-06-04 18:25:08 +00:00
Pawel Jakub Dawidek
68474f1930 Sysctl description is not a format string, so one % is enough. 2007-04-06 12:53:54 +00:00
Pawel Jakub Dawidek
52b509e738 Add missing \n. 2007-03-22 15:42:13 +00:00
Pawel Jakub Dawidek
9cb9930ea6 Softc may be NULL in g_journal_orphan(), so don't be surprised. 2006-12-02 09:10:29 +00:00
Pawel Jakub Dawidek
95de128d55 Fix ia64 build breakage. 2006-11-02 16:24:18 +00:00
Pawel Jakub Dawidek
41517ab2e9 - Use g_duplicate_bio() instead of g_clone_bio(), so there memory is
allocated with M_WAITOK flag.
- Check 'buf' instead of 'error' so Prevent is not confused.

CID:		1562, 1563
Found by:	Coverity Prevent analysis tool
2006-11-02 09:14:18 +00:00
Pawel Jakub Dawidek
3398f41fc0 Grr, fix one more build breakage. 2006-11-02 00:37:39 +00:00
Pawel Jakub Dawidek
f187490a2d Change spaces to tabs where needed. 2006-11-01 22:16:53 +00:00
Pawel Jakub Dawidek
8e570b35f7 Forgot to remove this line.
Reported by:	maxim
2006-11-01 14:09:59 +00:00
Pawel Jakub Dawidek
3911a26e74 Update the code to the current sync(2) version:
- Do not modify mnt_flag without mount interlock held.
- Do not touch MNT_ASYNC flag, as this can lead to a race with nmount(2).

Pointed out by:	tegge
Reviewed by:	tegge
2006-11-01 09:37:11 +00:00
Pawel Jakub Dawidek
ff4ff59d64 Remove debugging code I accidentally committed. 2006-11-01 01:19:13 +00:00
Pawel Jakub Dawidek
a23d879f34 Add gjournal GEOM class (kernel side), which implements block level
journaling and can be tought about marking file system as clean before
doing journal switch, which easly allows to add journaling to file
systems that don't have this feature.

Sponsored by:	home.pl
2006-10-31 21:31:00 +00:00