by md(4). Before this change, it was possible to by-pass these flags
by creating memory disks which used a file as a backing store and
writing to the device.
This was discussed by the security team, and although this is problematic,
it was decided that it was not critical as we never guarantee that root will
be restricted.
This change implements the following behavior changes:
-If the user specifies the readonly flag, unset write operations before
opening the file. If the FWRITE mask is unset, the device will be
created with the MD_READONLY mask set. (readonly)
-Add a check in g_md_access which checks to see if the MD_READONLY mask
is set, if so return EROFS
-Do not gracefully downgrade access modes without telling the user. Instead
make the user specify their intentions for the device (assuming the file is
read only). This seems like the more correct way to handle things.
This is a RELENG_6 candidate.
PR: kern/84635
Reviewed by: phk
memory disk is larger than the number of available sf_bufs, this improves
performance on SMPs by eliminating interprocessor TLB shootdowns. For
example, with 6656 sf_bufs, the default on my test machine, and a 256MB
swap-backed memory disk, I see the command
"dd if=/dev/md0 of=/dev/null bs=64k" achieve ~489MB/sec with the default,
shared mappings, and ~587MB/sec with CPU private mappings.
in mddestroy() to properly free already allocated memory.
This fixes a panic when we want to create too big memory backed device
with preallocate memory (-o reserve).
- Remove redundant { }.
MFC after: 1 week
show file name for 'mdconfig -l -u <x>' command.
This allows to preserve API/ABI compatibility with version 0 (that's why
I changed version number back to 0) and will allow to merge this change
to RELENG_5.
MFC after: 5 days
md(8). The former is generally not going to fail, but the latter can
fail when the underlying swap device returns an error.
There are still plenty of other places where vm_pager_get_pages() failing
will lead directly to crashes, so it's a good idea to put your swap on
RAID if you care enough to put any of your disks on RAID....
After this change it should be possible to use very big md(4) devices.
- Clean up and simplify the code a bit.
- Use humanize_number(3) to print size of md(4) devices.
- Add 't' suffix which stands for terabyte.
- Make '-S' to really work with all types of devices.
- Other minor changes.
it is only used in one function. While doing so, change its type to
vm_ooffset_t.
We are still limited for swap-backed devices to 16TB on 32-bit architectures
where PAGE_SIZE is 4096 bytes.
before returning. Device nodes are created via the "taste" mechanism,
so this is necessary in order to make sure that devfs entries are
created before mdconfig(8) returns.
This may be a MFC candidate for 5.3.
Suggested by: phk
for unknown events.
A number of modules return EINVAL in this instance, and I have left
those alone for now and instead taught MOD_QUIESCE to accept this
as "didn't do anything".
completely understand], md_takeroot() runs before md_preloaded(),
rendering both useless.
As a fix, move the body (effectively one line!) of md_takeroot()
into md_preloaded(), and get rid of the stuff that has become useless.
Bug and fix reported 10 days ago on -current, no reply.
mappings required by mdstart_swap(). On i386, if the ephemeral mapping
is already in the sf_buf mapping cache, a swap-backed md performs
similarly to a malloc-backed md. Even if the ephemeral mapping is not
cached, this implementation is still faster. On 64-bit platforms, this
change has the effect of using the direct virtual-to-physical mapping,
avoiding ephemeral mapping overheads, such as TLB shootdowns on SMPs.
On a 2.4GHz, 400MHz FSB P4 Xeon configured with 64K sf_bufs and
"mdmfs -S -o async -s 128m md /mnt"
before:
dd if=/dev/md0 of=/dev/null bs=64k
134217728 bytes transferred in 0.430923 secs (311465697 bytes/sec)
after with cold sf_buf cache:
dd if=/dev/md0 of=/dev/null bs=64k
134217728 bytes transferred in 0.367948 secs (364773576 bytes/sec)
after with warm sf_buf cache:
dd if=/dev/md0 of=/dev/null bs=64k
134217728 bytes transferred in 0.252826 secs (530870010 bytes/sec)
malloc-backed md:
dd if=/dev/md0 of=/dev/null bs=64k
134217728 bytes transferred in 0.253126 secs (530240978 bytes/sec)
On vnode backed md(4) devices over a certain, currently undetermined
size relative to the buffer cache our "lemming-syncer" can provoke
a buffer starvation which puts the md thread to sleep on wdrain.
This generally tends to grind the entire system to a stop because the
event that is supposed to wake up the thread will not happen until a fair
bit of the piled up I/O requests in the system finish, and since a lot
of those are on a md(4) vnode backed device which is currently waiting
on wdrain until a fair amount of the piled up ... you get the picture.
The cure is to issue all VOP_WRITES on the vnode backing the device
with IO_SYNC.
In addition to more closely emulating a real disk device with a
non-lying write-cache, this makes the writes exempt from rate-limited
(there to avoid starving the buffer cache) and consequently prevents
the deadlock.
Unfortunately performance takes a hit.
Add "async" option to give people who know what they are doing the
old behaviour.
swap-backed memory disks. This reduces filesystem allocation overhead
and makes swap-backed memory disks compatible with broken code (dd,
for example) which expects to see 512 byte sectors. The size of a
swap-backed memory disk must still be a multiple of the page size.
When performing page-aligned operations, this change has zero
performance impact.
Reviewed by: phk
Approved by: rwatson (mentor)
Introduce d_version field in struct cdevsw, this must always be
initialized to D_VERSION.
Flip sense of D_NOGIANT flag to D_NEEDGIANT, this involves removing
four D_NOGIANT flags and adding 145 D_NEEDGIANT flags.
the "old" SYSINIT. This makes sure things happen in the right order.
XXX: md(4) needs to be fully geom-ified and in particluar /dev/md.ctl
should be abandonded for the GEOM OaM api.
Approved by: re@
provide no methods does not make any sense, and is not used by any
driver.
It is a pretty hard to come up with even a theoretical concept of
a device driver which would always fail open and close with ENODEV.
Change the defaults to be nullopen() and nullclose() which simply
does nothing.
Remove explicit initializations to these from the drivers which
already used them.