When safety requirements are met, it allows to avoid passing I/O requests
to GEOM g_up/g_down thread, executing them directly in the caller context.
That allows to avoid CPU bottlenecks in g_up/g_down threads, plus avoid
several context switches per I/O.
The defined now safety requirements are:
- caller should not hold any locks and should be reenterable;
- callee should not depend on GEOM dual-threaded concurency semantics;
- on the way down, if request is unmapped while callee doesn't support it,
the context should be sleepable;
- kernel thread stack usage should be below 50%.
To keep compatibility with GEOM classes not meeting above requirements
new provider and consumer flags added:
- G_CF_DIRECT_SEND -- consumer code meets caller requirements (request);
- G_CF_DIRECT_RECEIVE -- consumer code meets callee requirements (done);
- G_PF_DIRECT_SEND -- provider code meets caller requirements (done);
- G_PF_DIRECT_RECEIVE -- provider code meets callee requirements (request).
Capable GEOM class can set them, allowing direct dispatch in cases where
it is safe. If any of requirements are not met, request is queued to
g_up or g_down thread same as before.
Such GEOM classes were reviewed and updated to support direct dispatch:
CONCAT, DEV, DISK, GATE, MD, MIRROR, MULTIPATH, NOP, PART, RAID, STRIPE,
VFS, ZERO, ZFS::VDEV, ZFS::ZVOL, all classes based on g_slice KPI (LABEL,
MAP, FLASHMAP, etc).
To declare direct completion capability disk(9) KPI got new flag equivalent
to G_PF_DIRECT_SEND -- DISKFLAG_DIRECT_COMPLETION. da(4) and ada(4) disk
drivers got it set now thanks to earlier CAM locking work.
This change more then twice increases peak block storage performance on
systems with manu CPUs, together with earlier CAM locking changes reaching
more then 1 million IOPS (512 byte raw reads from 16 SATA SSDs on 4 HBAs to
256 user-level threads).
Sponsored by: iXsystems, Inc.
MFC after: 2 months
disable GEOM tasting to avoid the "bouncing GEOM" problem where, when
you shut down the consumer of a provider which can be viewed in multiple
ways (typically a mirror whose members are labeled partitions), GEOM
will immediately taste that provider's alter ego and reattach the
consumer.
Approved by: re (glebius)
more topology change done that may require its attention. Add few missing
g_do_wither() calls in respective places to signal it.
This fixes potential infinite loop here when some provider is withered, but
still opened or connected for some reason and so can not be destroyed. For
example, see r227009 and r227510.
When we orphan/wither a provider, an attached geom+consumer could
end up being withered as a result and it may be in front of us in
the normal object scanning order so we need to do multi-pass. On
the other hand, there may be withering stuff we can't get rid off
(yet), so we need to keep track of both the existence of withering
stuff and if there is more we can do at this time.
Retire g_sanity() and corresponding debugflag (0x8)
Retire g_{stall,release}_events().
Under #ifdef DIAGNOSTIC:
Make g_valid_obj() an official function and have it return an an
non-zero integer which indicates the kind of object when found.
Implement G_VALID_{CLASS,GEOM,CONSUMER,PROVIDER}() macros based
on g_valid_obj().
Sprinkle calls to these macros liberally over the infrastructure.
Always check that we do not free a live object.
event posting functions varargs to fill these.
Attribute g_call_me() to appropriate g_geom's where necessary.
Add a flag argument to g_call_me() methods which will be used to signal
cancellation of events in the future.
This commit should be a no-op.
parts of it.
[*] I've been asked what "OAM" means: It's an acronym used in the
telecom industry, "Operations And Maintenance", and there it covers
anything from a single unlabeled led on the frontpanel the the full
nightmare of CMIP for SS7.
lower extremities.
Setting bit 4 in debugflags (sysctl kern.geom.debugflags=16) will
allow any open to succeed on rank#1 providers. This will generally
correspond to the physical disk devices: ad0, da0, md0 etc.
This fundamentally violates the mechanics of GEOMs autoconfiguration,
and is only provided as a debugging facility, so obviously error
reports on GEOM where this bit is or has been set will not be
accepted.
Insted of embedding a struct g_stat in consumers and providers, merely
include a pointer.
Remove a couple of <sys/time.h> includes now unneeded.
Add a special allocator for struct g_stat. This allocator will allocate
entire pages and hand out g_stat functions from there. The "id" field
indicates free/used status.
Add "/dev/geom.stats" device driver whic exports the pages from the
allocator to userland with mmap(2) in read-only mode.
This mmap(2) interface should be considered a non-public interface and
the functions in libgeom (not yet committed) should be used to access
the statistics data.
Add debug.sizeof.g_stat sysctl.
Set the id field of the g_stat when we create consumers and providers.
Remove biocount from consumer, we will use the counters in the g_stat
structure instead. Replace one field which will need to be atomically
manipulated with two fields which will not (stat.nop and stat.nend).
Change add companion field to bio_children: bio_inbed for the exact
same reason.
Don't output the biocount in the confdot output.
Fix KASSERT in g_io_request().
Add sysctl kern.geom.collectstats defaulting to off.
Collect the following raw statistics conditioned on this sysctl:
for each consumer and provider {
total number of operations started.
total number of operations completed.
time last operation completed.
sum of idle-time.
for each of BIO_READ, BIO_WRITE and BIO_DELETE {
number of operations completed.
number of bytes completed.
number of ENOMEM errors.
number of other errors.
sum of transaction time.
}
}
API for getting hold of these statistics data not included yet.
WARNING: This is not a published interface, it is a stopgap measure for
WARNING: libdisk so we can get 5.0-R out of the door.
Sponsored by: DARPA & NAI Labs
to be performed in the event-thread.
To do this, we need to lock the eventlist with g_eventlock (nee g_doorlock),
since g_call_me() being called from the UP/DOWN paths will not be able to
aquire g_topology_lock.
This also means that for now these events are not referenced on any
particular consumer/provider/geom.
For UP/DOWN path use, this will not become a problem since the access()
function will make sure we drain any bio's before we dismantle.
Sponsored by: DARPA & NAI Labs.