Commit Graph

48 Commits

Author SHA1 Message Date
hselasky
35b126e324 Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
gjb
fc21f40567 Revert r267961, r267973:
These changes prevent sysctl(8) from returning proper output,
such as:

 1) no output from sysctl(8)
 2) erroneously returning ENOMEM with tools like truss(1)
    or uname(1)
 truss: can not get etype: Cannot allocate memory
2014-06-27 22:05:21 +00:00
hselasky
bd1ed65f0f Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after:	2 weeks
Sponsored by:	Mellanox Technologies
2014-06-27 16:33:43 +00:00
mav
37fe2a4bcb Do not increment bio_data in case of BIO_DELETE.
This fixes KASSERT() panic in g_io_request().
2014-04-10 10:12:56 +00:00
mav
4219fc0074 Merge GEOM direct dispatch changes from the projects/camlock branch.
When safety requirements are met, it allows to avoid passing I/O requests
to GEOM g_up/g_down thread, executing them directly in the caller context.
That allows to avoid CPU bottlenecks in g_up/g_down threads, plus avoid
several context switches per I/O.

The defined now safety requirements are:
 - caller should not hold any locks and should be reenterable;
 - callee should not depend on GEOM dual-threaded concurency semantics;
 - on the way down, if request is unmapped while callee doesn't support it,
   the context should be sleepable;
 - kernel thread stack usage should be below 50%.

To keep compatibility with GEOM classes not meeting above requirements
new provider and consumer flags added:
 - G_CF_DIRECT_SEND -- consumer code meets caller requirements (request);
 - G_CF_DIRECT_RECEIVE -- consumer code meets callee requirements (done);
 - G_PF_DIRECT_SEND -- provider code meets caller requirements (done);
 - G_PF_DIRECT_RECEIVE -- provider code meets callee requirements (request).
Capable GEOM class can set them, allowing direct dispatch in cases where
it is safe.  If any of requirements are not met, request is queued to
g_up or g_down thread same as before.

Such GEOM classes were reviewed and updated to support direct dispatch:
CONCAT, DEV, DISK, GATE, MD, MIRROR, MULTIPATH, NOP, PART, RAID, STRIPE,
VFS, ZERO, ZFS::VDEV, ZFS::ZVOL, all classes based on g_slice KPI (LABEL,
MAP, FLASHMAP, etc).

To declare direct completion capability disk(9) KPI got new flag equivalent
to G_PF_DIRECT_SEND -- DISKFLAG_DIRECT_COMPLETION.  da(4) and ada(4) disk
drivers got it set now thanks to earlier CAM locking work.

This change more then twice increases peak block storage performance on
systems with manu CPUs, together with earlier CAM locking changes reaching
more then 1 million IOPS (512 byte raw reads from 16 SATA SSDs on 4 HBAs to
256 user-level threads).

Sponsored by:	iXsystems, Inc.
MFC after:	2 months
2013-10-22 08:22:19 +00:00
ed
0c56cf839d Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
2011-11-07 15:43:11 +00:00
mav
1cbd492c70 Refactor disk disconnection and geom destruction handling sequences.
Do not close/destroy opened consumer directly in case of disconnect. Instead
keep it existing until it will be closed in regular way in response to
upstream provider destruction. Delay geom destruction in the same way.
Previous implementation could destroy consumers still having active
requests and worked only because of global workaround made on GEOM level.
2011-11-01 17:04:42 +00:00
ae
972deb0b1f Include sys/sbuf.h directly.
Reviewed by:	pjd
2011-07-11 05:22:31 +00:00
ae
fc73075ad0 Remove "for a moment" assignment. struct g_geom zeroed when allocated.
MFC after:	1 week
2011-05-04 17:56:53 +00:00
mav
79493720f3 Implement relaxed comparision for hardcoded provider names to make it
ignore adX/adaY difference in both directions to simplify migration to
the CAM-based ATA or back.
2011-04-27 00:10:26 +00:00
netchild
6bf702a55b Add some FEATURE macros for various GEOM classes.
No FreeBSD version bump, the userland application to query the features will
be committed last and can serve as an indication of the availablility if
needed.

Sponsored by:	Google Summer of Code 2010
Submitted by:	kibab
Reviewed by:	silence on geom@ during 2 weeks
X-MFC after:	to be determined in last commit with code from this project
2011-02-25 10:24:35 +00:00
pjd
94d64f4ec3 Correct comment. 2010-02-18 22:28:12 +00:00
mav
5908bf1b9f Make geom_stripe report it's stripe size to upper layers. 2009-12-24 10:43:44 +00:00
pjd
f5413acb70 If provider is open for writing when we taste it, skip it for classes that
depend on on-disk metadata. This was we won't attach to providers that are used
by other classes. For example we don't want to configure partitions on da0 if
it is part of gmirror, what we really want is partitions on mirror/foo.

During regular work it works like this: if provider is open for writing a class
receives the spoiled event from GEOM and detaches, once provider is closed the
taste event is send again and class can rediscover its metadata if it is still
there.  This doesn't work that way when new class arrives, because GEOM gives
all existing providers for it to taste, also those open for writing. Classes
have to decided on their own if they want to deal with such providers (eg.
geom_dev) or not (classes modified by this commit).

Reported by:	des, Oliver Lehmann <lehmann@ans-netz.de>
Tested by:	des, Oliver Lehmann <lehmann@ans-netz.de>
Discussed with:	phk, marcel
Reviewed by:	marcel
MFC after:	3 days
2009-10-09 09:42:22 +00:00
mav
ff8e63fcdf Remove artificial MAX_IO_SIZE constant, equal to DFLTPHYS * 2. Use MAXPHYS
instead. It is NULL change for GENERIC kernel, but allows 'fast' mode to
work on systems with increased MAXPHYS.
2009-09-04 19:20:46 +00:00
des
c2c1c946ae Add sbuf_new_auto as a shortcut for the very common case of creating a
completely dynamic sbuf.

Obtained from:	Varnish
MFC after:	2 weeks
2008-08-09 11:14:05 +00:00
dwmalone
771efb08f5 Despite several examples in the kernel, the third argument of
sysctl_handle_int is not sizeof the int type you want to export.
The type must always be an int or an unsigned int.

Remove the instances where a sizeof(variable) is passed to stop
people accidently cut and pasting these examples.

In a few places this was sysctl_handle_int was being used on 64 bit
types, which would truncate the value to be exported.  In these
cases use sysctl_handle_quad to export them and change the format
to Q so that sysctl(1) can still print them.
2007-06-04 18:25:08 +00:00
pjd
cf33008f77 Change spaces to tabs where needed. 2006-11-01 22:16:53 +00:00
pjd
c33849dc41 Implement BIO_FLUSH handling by simply passing it down to the components.
Sponsored by:	home.pl
2006-10-31 21:23:51 +00:00
pjd
6f074b6d64 Remove trailing spaces. 2006-02-01 12:06:01 +00:00
rwatson
be4f357149 Normalize a significant number of kernel malloc type names:
- Prefer '_' to ' ', as it results in more easily parsed results in
  memory monitoring tools such as vmstat.

- Remove punctuation that is incompatible with using memory type names
  as file names, such as '/' characters.

- Disambiguate some collisions by adding subsystem prefixes to some
  memory types.

- Generally prefer lower case to upper case.

- If the same type is defined in multiple architecture directories,
  attempt to use the same name in additional cases.

Not all instances were caught in this change, so more work is required to
finish this conversion.  Similar changes are required for UMA zone names.
2005-10-31 15:41:29 +00:00
pjd
863deb3c00 Avoid code duplication and implement bitcount32() function in systm.h only.
Reviewed by:	cperciva
MFC after:	3 days
2005-08-19 22:10:19 +00:00
pjd
e180bbc58e Before calling g_orphan_provider(), add G_PF_WITHER flag, so GEOM will know
to destroy it.

PR:		kern/81758
Submitted by:	trasz <trasz@buziaczek.pl>
MFC after:	3 days
2005-07-17 13:15:02 +00:00
pjd
9d22b74b8f Check return value.
Found by:	Coverity Prevent analysis tool
2005-05-11 18:07:39 +00:00
pjd
668a028670 - Add md_provsize field to metadata, which will help with
shared-last-sector problem.
  After this change, even if there is more than one provider with the same
  last sector, the proper one will be chosen based on its size.
  It still doesn't fix the 'c' partition problem (when da0s1 can be confused
  with da0s1c) and situation when 'a' partition starts at offset 0
  (then da0s1a can be confused with da0s1 and da0s1c). One can use '-h'
  option there, when creating device or avoid sharing last sector.
  Actually, when providers share the same last sector and their size is equal,
  they provide exactly the same data, so the name (da0s1, da0s1a, da0s1c)
  isn't important at all.
- Provide backward compatibility.
- Update copyright's year.

MFC after:	1 week
2005-02-27 23:07:47 +00:00
pjd
1e033a07a4 Fix year in copyrights. 2005-02-16 22:26:34 +00:00
pjd
3342c80c8f - Turn off 'fast' mode by default and increase maximum memory to consume
when this mode is used.
- Manual page update.
2004-12-09 12:26:47 +00:00
pjd
d7954bf77f This is not needed anymore, it is forced in GEOM now.
Actually, it can even cause some problems, because GEOM requires sectorsize
to be more than 0 on first access, not on provider creation, so we can skip
valid providers by doing this check here.

Reported by:	Divacky Roman <xdivac02@stud.fit.vutbr.cz>
		Sven Willenberger <sven@dmv.com>
2004-09-20 17:26:25 +00:00
pjd
4689077c9e Allow to configure debug level from /boot/loader.conf. 2004-08-30 18:50:06 +00:00
pjd
7f46afc9bf Skip providers with not defined sector size.
Reported by:	kuriyama
2004-08-26 12:42:47 +00:00
pjd
a2512dc38e Dump disk number. 2004-08-25 12:14:44 +00:00
pjd
1f559bc298 Increase default kern.geom.stripe.maxmem to 50 elements. 2004-08-11 12:57:17 +00:00
pjd
4792c96714 Fix one of the lastest commit. This bio_caller1 should also be changed to
bio_driver1 (as all the rest).
This introduced a small memory leak, but it wasn't really critical,
because maximum memory for g_stripe_zone is always set, so after few
requests gstripe was working in "economic" mode.
2004-08-10 19:07:55 +00:00
pjd
a98f255700 - Introduce option for hardcoding providers' names into metadata.
It allows to fix problems when last provider's sector is shared between few
  providers.
- Bump version number for CONCAT and STRIPE and add code for backward
  compatibility.
- Do not bump version number of MIRROR, as it wasn't officially introduced yet.
  Even if someone started to play with it, there is no big deal, because
  wrong MD5 sum of metadata will deny those providers.
- Update manual pages.
- Add version history to g_(stripe|concat).h files.
2004-08-09 11:29:42 +00:00
pjd
8da7f212eb Do not use g_wither_geom(9). I doesn't work in the way which is expected
here anymore (after g_wither_washer() was introduced), i.e. geom and consumer
will not be immediately destroyed if possible.
2004-08-09 11:14:25 +00:00
phk
d8d2b01380 Tag all geom classes in the tree with a version number. 2004-08-08 07:57:53 +00:00
pjd
5ce00478e0 Add and document kern.geom.stripe.fast_failed sysctl, which shows how
many times "fast" mode failed.
2004-08-06 10:19:34 +00:00
pjd
2d5face9e3 Fields bio_caller[12] should be used by the consumer and fields
bio_driver[12] should be used by the provider!
2004-08-06 10:07:03 +00:00
pjd
636f0ba584 Fix I/O leakage. We're cloning bios in g_stripe_start_fast(), but when
something goes wrong while running in "fast" mode, we free all bios and
falling back to "economic" mode. Freeing bios, doesn't mean decrease
bio_children, so bio_inbed couldn't be equal to bio_children and request
was never finished.
Decrease bio_children manually when destroying bios.

Reported by:	Sam Lawrance <boris@brooknet.com.au>, simon
2004-08-06 09:55:40 +00:00
pjd
73a684d587 Improve geom(8)'s 'list' command to show geoms and their providers and
consumers. Teach STRIPE, CONCAT and NOP classes about this improvement.
2004-07-26 17:14:47 +00:00
pjd
3a2f13d5f5 Change naming scheme from /dev/<name>.stripe to /dev/stripe/<name>. 2004-07-26 16:10:27 +00:00
pjd
dca201cdfb M_WAITOK is ok here, while I'm using M_WAITOK later in this function. 2004-07-26 15:41:28 +00:00
pjd
6639216695 Fix copy&paste bug. 2004-07-18 16:51:58 +00:00
pjd
c684a2d6ed Minor sysctl description fixes.
Submitted by:	simon
2004-07-13 11:23:31 +00:00
pjd
227209a4da Implement "FAST" mode for GEOM_STRIPE class and turn it on by default.
In this mode you can setup even very small stripe size and you can be
sure that only one I/O request will be send to every disks in stripe.
It consumes some more memory, but if allocation fails, it will fall
back to "ECONOMIC" mode.

It is about 10 times faster for small stripe size than "ECONOMIC" mode
and other RAID0 implementations. It is even recommended to use this
mode and small stripe size, so our requests are always splitted.

One can still use "ECONOMIC" mode by setting kern.geom.stripe.fast to 0.
It is also possible to setup maximum memory which "FAST" mode can consume,
by setting kern.geom.stripe.maxmem from /boot/loader.conf.
2004-07-09 14:30:09 +00:00
pjd
7822814115 - Add 'stop' command, which works just like 'destroy' command, but sounds
less dangerous.
- Update manual pages and extend examples.
- Bump versions.
2004-07-05 21:16:37 +00:00
pjd
4a04a3b007 Dump some more informations:
- device state
	- list of used providers
	- total number of disks
	- number of disks online

Prodded by:	Alex Deiter <tiamat@komi.mts.ru>
2004-05-26 11:36:27 +00:00
pjd
86dae1fb27 Introduce STRIPE GEOM class. It implements RAID0 transformation and it
is intend to be fast. Just like CONCAT class it provides manual and
auto configuration methods.

Supported by:	Wheel - Open Technologies - http://www.wheel.pl
2004-05-20 10:20:49 +00:00