freebsd-skq

Author	SHA1	Message	Date
ae	7f1d0d3f37	Include sys/sbuf.h directly.	2011-07-11 05:19:28 +00:00
mdf	3d3b036f95	Move the ZERO_REGION_SIZE to a machine-dependent file, as on many architectures (i386, for example) the virtual memory space may be constrained enough that 2MB is a large chunk. Use 64K for arches other than amd64 and ia64, with special handling for sparc64 due to differing hardware. Also commit the comment changes to kmem_init_zero_region() that I missed due to not saving the file. (Darn the unfamiliar development environment). Arch maintainers, please feel free to adjust ZERO_REGION_SIZE as you see fit. Requested by: alc MFC after: 1 week MFC with: r221853	2011-05-13 19:35:01 +00:00
mdf	9465c34001	Use a globally visible region of zeros for both /dev/zero and the md device. There are likely other kernel uses of "blob of zeros" that can be converted. Reviewed by: alc MFC after: 1 week	2011-05-13 18:48:00 +00:00
des	f99737eb7b	Implement BIO_DELETE for vnode devices by simply overwriting the deleted sectors with all-zeroes. The zeroes come from a static buffer; null(4) uses a dynamic buffer for the same purpose (for /dev/zero). It might be a good idea to have a static, shared, read-only all-zeroes page somewhere in the kernel that md(4), null(4) and any other code that needs zeroes could use. Reviewed by: kib MFC after: 3 weeks	2011-04-29 21:18:41 +00:00
marcel	9e4a891196	Use the preload_fetch_addr() and preload_fetch_size() convenience functions and only create the MD device when we have a non-zero pointer and size. Sponsored by: Juniper Networks	2011-02-09 19:31:10 +00:00
kib	83917be69e	Add support for BIO_DELETE on swap-backed md(4). In the case of BIO_DELETE covering the whole page, free the page. Otherwise, clear the region and mark it clean. Not marking the page dirty could reinstantiate cleared data, but it is allowed by BIO_DELETE specification and saves unneeded write to swap. Reviewed by: alc Tested by: pho MFC after: 2 weeks	2011-01-27 16:10:25 +00:00
kib	25f8e1e95f	Bio shall not be accessed after g_io_deliver(9). Reported and tested by: pho Reviewed by: ae, phk MFC after: 1 week	2011-01-25 14:00:30 +00:00
kib	dc5706ffe3	Add missed (). Noted by: alc MFC after: 3 days	2011-01-19 16:48:07 +00:00
alc	6614c76cb4	There is no point in calling vm_object_set_writeable_dirty() on an object that is definitively known to be swap backed since its only effects are on vnode-backed objects. Reviewed by: kib	2011-01-19 15:43:54 +00:00
kib	cbd7f9d931	Add reporting of GEOM::candelete BIO_GETATTR for md(4) and geom_disk(4). Non-zero value of attribute means that device supports BIO_DELETE. Suggested and reviewed by: pjd Tested by: pho MFC after: 1 week	2010-12-29 12:11:07 +00:00
kib	41c444747f	Add sysctl vm.md_malloc_wait, non-zero value of which switches malloc-backed md(4) to using M_WAITOK malloc calls. M_NOWAIT allocations may fail when enough memory could be freed, but not immediately. E.g. SU UFS becomes quite unhappy when metadata write return error, that would happen for failed malloc() call. Reported and tested by: pho MFC after: 1 week	2010-12-29 11:39:15 +00:00
marcel	9d3ef80ee1	Allow the MDIOCATTACH ioctl operation to originate from within the kernel. To protect against malicious software, we demand that the file name is at a particular location (i.e. appended to the mdio structure) for it to be treated as in-kernel.	2010-10-18 04:26:32 +00:00
jh	2aedf3a2fe	- Remove some extra white space. - Wrap g_md_dumpconf() prototype to 80 columns.	2010-07-26 10:37:14 +00:00
jh	1a7a2445e6	Convert md(4) to use alloc_unr(9) and alloc_unr_specific(9) for unit number allocation. The old approach had some problems such as it allowed an overflow to occur in the unit number calculation. PR: kern/122288	2010-07-22 10:24:28 +00:00
kib	cc677e94f4	Calculate nshift only once. Also noted by: avg MFC after: 1 week	2010-07-06 18:22:57 +00:00
alc	c907b418fb	Eliminate unnecessary page queues locking.	2010-06-15 18:37:31 +00:00
kib	44c384aeff	Lock the page around vm_page_activate() and vm_page_deactivate() calls where it was missed. The wrapped fragments now protect wire_count with page lock. Reviewed by: alc	2010-05-03 20:31:13 +00:00
trasz	e9d23bc38a	Fix panic on invalid 'mdconfig -at preload' usage. PR: kern/80136	2010-02-27 10:41:30 +00:00
antoine	bfd388c026	(S)LIST_HEAD_INITIALIZER takes a (S)LIST_HEAD as an argument. Fix some wrong usages. Note: this does not affect generated binaries as this argument is not used. PR: 137213 Submitted by: Eygene Ryabinkin (initial version) MFC after: 1 month	2009-12-28 22:56:30 +00:00
kib	fa686c638e	Implement global and per-uid accounting of the anonymous memory. Add rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved for the uid. The accounting information (charge) is associated with either map entry, or vm object backing the entry, assuming the object is the first one in the shadow chain and entry does not require COW. Charge is moved from entry to object on allocation of the object, e.g. during the mmap, assuming the object is allocated, or on the first page fault on the entry. It moves back to the entry on forks due to COW setup. The per-entry granularity of accounting makes the charge process fair for processes that change uid during lifetime, and decrements charge for proper uid when region is unmapped. The interface of vm_pager_allocate(9) is extended by adding struct ucred *, that is used to charge appropriate uid when allocation is performed by kernel, e.g. md(4). Several syscalls, among them is fork(2), may now return ENOMEM when global or per-uid limits are enforced. In collaboration with: pho Reviewed by: alc Approved by: re (kensmith)	2009-06-23 20:45:22 +00:00
marcel	8b09116a5a	Add cpu_flush_dcache() for use after non-DMA based I/O so that a possible future I-cache coherency operation can succeed. On ARM for example the L1 cache can be (is) virtually mapped, which means that any I/O that uses temporary mappings will not see the I-cache made coherent. On ia64 a similar behaviour has been observed. By flushing the D-cache, execution of binaries backed by md(4) and/or NFS work reliably. For Book-E (powerpc), execution over NFS exhibits SIGILL once in a while as well, though cpu_flush_dcache() hasn't been implemented yet. Doing an explicit D-cache flush as part of the non-DMA based I/O read operation eliminates the need to do it as part of the I-cache coherency operation itself and as such avoids pessimizing the DMA-based I/O read operations for which D-cache are already flushed/invalidated. It also allows future optimizations whereby the bcopy() followed by the D-cache flush can be integrated in a single operation, which could be implemented using on-chips DMA engines, by-passing the D-cache altogether.	2009-05-18 18:37:18 +00:00
jhb	520acdaf69	Add a new internal mount flag (MNTK_EXTENDED_SHARED) to indicate that a filesystem supports additional operations using shared vnode locks. Currently this is used to enable shared locks for open() and close() of read-only file descriptors. - When an ISOPEN namei() request is performed with LOCKSHARED, use a shared vnode lock for the leaf vnode only if the mount point has the extended shared flag set. - Set LOCKSHARED in vn_open_cred() for requests that specify O_RDONLY but not O_CREAT. - Use a shared vnode lock around VOP_CLOSE() if the file was opened with O_RDONLY and the mountpoint has the extended shared flag set. - Adjust md(4) to upgrade the vnode lock on the vnode it gets back from vn_open() since it now may only have a shared vnode lock. - Don't enable shared vnode locks on FIFO vnodes in ZFS and UFS since FIFO's require exclusive vnode locks for their open() and close() routines. (My recent MPSAFE patches for UDF and cd9660 already included this change.) - Enable extended shared operations on UFS, cd9660, and UDF. Submitted by: ups Reviewed by: pjd (ZFS bits) MFC after: 1 month	2009-03-11 14:13:47 +00:00
alc	737844bb9e	Remove unnecessary page queues locking around vm_page_wakeup(). (This change is applicable to RELENG_7 but not RELENG_6.) MFC after: 1 week	2009-02-22 02:50:31 +00:00
trasz	81e2127caa	Add the possibility to specify "-o force" with "mdconfig -du". Reviewed by: scottl Approved by: rwatson (mentor) Sponsored by: FreeBSD Foundation	2009-01-10 17:17:18 +00:00
trasz	4039bb7c82	Fix forced mdconfig -du. E.g. the following would previously result in panic: mdconfig -af blah.img -o force mount /dev/md0 /mnt mdconfig -du 0 Reviewed by: scottl Approved by: rwatson (mentor) Sponsored by: FreeBSD Foundation	2008-12-16 20:59:27 +00:00
attilio	dbf35e279f	Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread was always curthread and totally unuseful. Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>	2008-08-28 15:23:18 +00:00
ed	5de6a45e07	Remove the distinction between device minor and unit numbers. Even though we got rid of device major numbers some time ago, device drivers still need to provide unique device minor numbers to make_dev(). These numbers are only used inside the kernel. They are not related to device major and minor numbers which are visible in devfs. These are actually based on the inode number of the device. It would eventually be nice to remove minor numbers entirely, but we don't want to be too aggressive here. Because the 8-15 bits of the device number field (si_drv0) are still reserved for the major number, there is no 1:1 mapping of the device minor and unit numbers. Because this is now unused, remove the restrictions on these numbers. The MAXMAJOR definition was actually used for two purposes. It was used to convert both the userspace and kernelspace device numbers to their major/minor pair, which is why it is now named UMINORMASK. minor2unit() and unit2minor() have now become useless. Both minor() and dev2unit() now serve the same purpose. We should eventually remove some of them, at least turning them into macros. If devfs would become completely minor number unaware, we could consider using si_drv0 directly, just like si_drv1 and si_drv2. Approved by: philip (mentor)	2008-05-29 12:50:46 +00:00
philip	a5ef2c95a1	Zero sc->vnode if mdsetcred() fails. This fixes the panic which happens when mdcreate_vnode() calls vn_close() and mddestroy() calls it again further down the error handling path. Reviewed by: kris, kib MFC after: 3 days	2008-02-28 18:31:54 +00:00
attilio	71b7824213	VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjunction with 'thread' argument passing which is always curthread. Remove the unuseful extra-argument and pass explicitly curthread to lower layer functions, when necessary. KPI results broken by this change, which should affect several ports, so version bumping and manpage update will be further committed. Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>	2008-01-13 14:44:15 +00:00
attilio	18d0a0dd51	vn_lock() is currently only used with the 'curthread' passed as argument. Remove this argument and pass curthread directly to underlying VOP_LOCK1() VFS method. This modify makes the code cleaner and in particular remove an annoying dependence helping next lockmgr() cleanup. KPI results, obviously, changed. Manpage and FreeBSD_version will be updated through further commits. As a side note, would be valuable to say that next commits will address a similar cleanup about VFS methods, in particular vop_lock1 and vop_unlock. Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>	2008-01-10 01:10:58 +00:00
sobomax	e70726e1e8	Put back devstat support that was lost during GEOM transition. Initially, I've tried to move md(4) to use geom_disk class, like real disks do, but this requires major rework of some of the existing features such as configuration dumping for example. Therefore just putting devstat support directly into md(4) seems to be optimal solution. Now you can see md(4) stats in `systat -vm' again. MFC after: 2 weeks	2007-11-07 22:47:41 +00:00
julian	51d643caa6	Rename the kthread_xxx (e.g. kthread_create()) calls to kproc_xxx as they actually make whole processes. This makes way for us to add REAL kthread_create() and friends that actually make threads. it turns out that most of these calls actually end up being moved back to the thread version when it's added. but we need to make this cosmetic change first. I'd LOVE to do this rename in 7.0 so that we can eventually MFC the new kthread_xxx() calls.	2007-10-20 23:23:23 +00:00
jeff	91d1501790	Commit 14/14 of sched_lock decomposition. - Use thread_lock() rather than sched_lock for per-thread scheduling synchronization. - Use the per-process spinlock rather than the sched_lock for per-process scheduling synchronization. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)	2007-06-05 00:00:57 +00:00
kib	f13486a222	Revert UF_OPENING workaround for CURRENT. Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation argument from being file descriptor index into the pointer to struct file. Proposed and reviewed by: jhb Reviewed by: daichi (unionfs) Approved by: re (kensmith)	2007-05-31 11:51:53 +00:00
kib	e7cdcb3240	Resolve two deadlocks that could be caused by busy md device backed by vnode. Allow for md thread and the thread that owns lock on vnode backing the md device to do the write even when runningbufspace is exhausted. Tested by: Peter Holm Reviewed by: tegge MFC after: 2 weeks	2006-12-14 11:34:07 +00:00
pjd	da7da1722d	Style nits.	2006-11-01 18:59:06 +00:00
pjd	6058d96715	Fix md(4) panic which occurs when I/O request different than BIO_READ/BIO_WRITE is sent to vnode-backed provider (BIO_DELETE or BIO_FLUSH). Reported by: ceri Add support for BIO_FLUSH to vnode-backed md(4) devices based on VOP_FSYNC().	2006-11-01 18:56:18 +00:00
jhb	63f561c624	- Conditionally acquire Giant in mdstart_vnode(), mdcreate_vnode(), and mddestroy() only if the file is from a non-MPSAFE VFS. - No longer unconditionally hold Giant in the md kthread for vnode-backed kthreads. - Improve the handling of the thread exit race when destroying an md device.	2006-03-28 21:25:11 +00:00
wkoszek	2a5afd7475	Teach md(4) and mdconfig(8) how to understand XML. Right now there won't be a problem with listing large number of md(4) devices. Either 'list' or 'query' mode uses XML. Additionally, new functionality was introduced. It's possible to pass multiple devices to -u: # ./mdconfig -l -u md0,md1 Approved by: cognet (mentor)	2006-03-26 23:21:11 +00:00
luigi	fc2a9f9e7a	make sure that the start and end preloaded MFS markers are in contiguous strings, and that the compiler does not optimize them away because it thinks they are unused.	2006-01-31 13:35:30 +00:00
pjd	4b80a65bfe	Call NDFREE() only when vn_open() succeeded. MFC after: 3 days	2006-01-27 11:27:55 +00:00
maxim	df025adb58	o Fix typos in the comments. Submitted by: Wojciech A. Koszek	2005-12-28 15:18:18 +00:00
rwatson	be4f357149	Normalize a significant number of kernel malloc type names: - Prefer '_' to ' ', as it results in more easily parsed results in memory monitoring tools such as vmstat. - Remove punctuation that is incompatible with using memory type names as file names, such as '/' characters. - Disambiguate some collisions by adding subsystem prefixes to some memory types. - Generally prefer lower case to upper case. - If the same type is defined in multiple architecture directories, attempt to use the same name in additional cases. Not all instances were caught in this change, so more work is required to finish this conversion. Similar changes are required for UMA zone names.	2005-10-31 15:41:29 +00:00
phk	a3fbbe3447	Make sure that the worker thread knows the type early enough to grab Giant for vnode backing. Found by: pho & tegge	2005-10-06 19:47:04 +00:00
phk	5376d97a97	Fix configuration locking in MD. Remove md_mtx. Remove GIANT from the mdctl device driver and avoid DROP_GIANT, PICKUP_GIANT and geom events since we can call into GEOM directly now. Pick up Giant around vn_close(). Apply an exclusive sx around mdctls ioctl and preloading to protect lists etc.. Don't initialize our lock (md_mtx or md_sx) from a SYSINIT when there is a perfectly good pair of _fini/_init functions to do it from. Prune any final fractional sector from the mediasize to keep GEOM happy. Cleanups: Unify MDIOVERSION check in (x)mdctlioctl() Add pointer to start() routine to softc to eliminate a switch{} Inline guts of mddetach(). Always pass error pointer to mdnew(), simplify implementation.	2005-09-19 06:55:27 +00:00
phk	fd5205fdd9	Do not destroy the queue mutex until the thread is done with it.	2005-09-11 12:35:32 +00:00
pjd	06122c40eb	- Add md_mtx lock to protect ID number and list of devices. - Always check mdnew() return value, as even in !autounit case kthread_create() can fail. Those two changes fix several panics provoked by simple stress test. Tested by: Kris The BugMagnet MFC after: 3 days	2005-08-31 19:45:11 +00:00
csjp	faafaf70f1	Ensure that file flags such as schg, sappnd (and others) are honored by md(4). Before this change, it was possible to by-pass these flags by creating memory disks which used a file as a backing store and writing to the device. This was discussed by the security team, and although this is problematic, it was decided that it was not critical as we never guarantee that root will be restricted. This change implements the following behavior changes: -If the user specifies the readonly flag, unset write operations before opening the file. If the FWRITE mask is unset, the device will be created with the MD_READONLY mask set. (readonly) -Add a check in g_md_access which checks to see if the MD_READONLY mask is set, if so return EROFS -Do not gracefully downgrade access modes without telling the user. Instead make the user specify their intentions for the device (assuming the file is read only). This seems like the more correct way to handle things. This is a RELENG_6 candidate. PR: kern/84635 Reviewed by: phk	2005-08-17 01:24:55 +00:00
alc	13e88b41ba	Request a CPU private mapping from sf_buf_alloc(). If the swap-backed memory disk is larger than the number of available sf_bufs, this improves performance on SMPs by eliminating interprocessor TLB shootdowns. For example, with 6656 sf_bufs, the default on my test machine, and a 256MB swap-backed memory disk, I see the command "dd if=/dev/md0 of=/dev/null bs=64k" achieve ~489MB/sec with the default, shared mappings, and ~587MB/sec with CPU private mappings.	2005-02-13 21:51:50 +00:00
phk	237e3ac2e9	Use MAXMINOR	2005-01-29 16:50:04 +00:00


201 Commits