freebsd-nq

Author	SHA1	Message	Date
Paul Saab	a6d545d8ed	Support shared vnode locks for write operations when the offset is provided on filesystems that support it. This really improves mysql + innodb performance on ZFS. Reviewed by: jhb, kmacy, jeffr	2009-06-04 16:18:07 +00:00
Doug Rabson	8be608b58c	Allow the bootfs property to be set for raidz pools on FreeBSD. Reviewed by: pjd	2009-05-31 11:59:32 +00:00
Kip Macy	762169b50a	fix xdrmem_control to be safe in an if statement fix zfs to depend on krpc remove xdr from zfs makefile Submitted by: dchagin@freebsd.org	2009-05-30 22:23:58 +00:00
Kip Macy	139ccddec0	work around snapshot shutdown race reported by Henri Hennebert	2009-05-30 19:26:35 +00:00
Jamie Gritton	76ca6f88da	Place hostnames and similar information fully under the prison system. The system hostname is now stored in prison0, and the global variable "hostname" has been removed, as has the hostname_mtx mutex. Jails may have their own host information, or they may inherit it from the parent/system. The proper way to read the hostname is via getcredhostname(), which will copy either the hostname associated with the passed cred, or the system hostname if you pass NULL. The system hostname can still be accessed directly (and without locking) at prison0.pr_host, but that should be avoided where possible. The "similar information" referred to is domainname, hostid, and hostuuid, which have also become prison parameters and had their associated global variables removed. Approved by: bz (mentor)	2009-05-29 21:27:12 +00:00
Attilio Rao	1ae1c2a3bd	Reverse the logic for ADAPTIVE_SX option and enable it by default. Introduce for this operation the reverse NO_ADAPTIVE_SX option. The flag SX_ADAPTIVESPIN to be passed to sx_init_flags(9) gets suppressed and the new flag, offering the reversed logic, SX_NOADAPTIVE is added. Additionally implements adaptive spinning for sx held in shared mode. The spinning limit can be handled through sysctls in order to be tuned while the code doesn't reach the release, after which time they should probably be dropped. This change has been made necessary by recent benchmarks where it does improve concurrency of workloads in presence of high contention (i.e. ZFS). KPI breakage is documented by __FreeBSD_version bumping, manpage and UPDATING updates. Requested by: jeff, kmacy Reviewed by: jeff Tested by: pho	2009-05-29 01:49:27 +00:00
Kip Macy	c334d2d544	MFdevbranch 192944 - add FreeBSD implementation of xdrmem_control needed by zfs - have zfs define xdr_ops using FreeBSD's definition - remove solaris xdr files from zfs compile	2009-05-28 08:18:12 +00:00
Stacey Son	a5aedd68b4	Add the OpenSolaris dtrace lockstat provider. The lockstat provider adds probes for mutexes, reader/writer and shared/exclusive locks to gather contention statistics and other locking information for dtrace scripts, the lockstat(1M) command and other potential consumers. Reviewed by: attilio jhb jb Approved by: gnn (mentor)	2009-05-26 20:28:22 +00:00
Edward Tomasz Napierala	b7014134a7	Change license to more bori^Wadul^Wcanonical. Submitted by: rwatson@	2009-05-26 11:42:06 +00:00
Edward Tomasz Napierala	0970b4bae0	MFp4 changes necessary for NFSv4 ACLs support in ZFS. This is mostly about removing a few #ifdefs and providing compatibility wrappers and VOP implementations to get and set an ACL; ZFS does ACL enforcement all by itself. Note that the VOPs are ifdefed out for now, so this change should be a no-op. Reviewed by: pjd	2009-05-26 08:21:59 +00:00
Edward Tomasz Napierala	4076aa37dc	Don't allow non-owner to set SUID bit on a file. It doesn't make any difference now, but in NFSv4 ACLs, there is write_acl permission, which also affects mode changes. Reviewed by: pjd	2009-05-24 19:21:49 +00:00
Edward Tomasz Napierala	194f4d42de	Fix comment.	2009-05-24 15:48:48 +00:00
Dag-Erling Smørgrav	bba5cfd28b	Unexpand $FreeBSD$.	2009-05-23 16:01:58 +00:00
Dag-Erling Smørgrav	6feca53bed	Remove svn:keywords on a file that had fbsd:nokeywords (though I don't understand the reason for the latter)	2009-05-23 16:00:16 +00:00
Kip Macy	e95d34711b	- back out direct map hack - it is no longer needed	2009-05-19 01:14:37 +00:00
Kip Macy	0fe5460dbd	set createtxg prop name PR: bin/130105	2009-05-17 04:04:25 +00:00
Kip Macy	ea41c77517	SAVESTART implies SAVENAME	2009-05-17 01:31:28 +00:00
Kip Macy	2e9c90d55b	enable adaptive spinning on zfs locks	2009-05-16 23:56:45 +00:00
Kip Macy	be08aa8b59	- allow forced unmounts - don't assume snapshot was auto-mounted	2009-05-16 20:33:13 +00:00
Kip Macy	71bc1ce36e	only use direct map if system has more than 2GB	2009-05-16 20:09:07 +00:00
Kip Macy	32237d8492	apply band-aid to x86_64 systems with more physical memory than kmem by allocating from the direct map	2009-05-16 19:17:15 +00:00
Doug Rabson	e1899ef6c8	Add support for booting from raidz1 and raidz2 pools.	2009-05-16 10:48:20 +00:00
Attilio Rao	dfd233edd5	Remove the thread argument from the FSD (File-System Dependent) parts of the VFS. Now all the VFS_* functions and relating parts don't want the context as long as it always refers to curthread. In some points, in particular when dealing with VOPs and functions living in the same namespace (eg. vflush) which still need to be converted, pass curthread explicitly in order to retain the old behaviour. Such loose ends will be fixed ASAP. While here fix a bug: now, UFS_EXTATTR can be compiled alone without the UFS_EXTATTR_AUTOSTART option. VFS KPI is heavily changed by this commit so third-party modules need to be recompiled. Bump __FreeBSD_version in order to signal such situation.	2009-05-11 15:33:26 +00:00
Kip Macy	469ef3e563	rename xdr support files to avoid conflicts when linking in to the kernel	2009-05-11 04:18:58 +00:00
Kip Macy	8569258bf8	- rename atomic.S and crc32.c to avoid collisions when linking zfs in to the kernel - update Makefile - ifdef out acl_{alloc, free}, they aren't used by zfs and conflict with existing in-kernel routines	2009-05-09 01:45:55 +00:00
Marko Zec	29b02909eb	Introduce a new virtualization container, provisionally named vprocg, to hold virtualized instances of hostname and domainname, as well as a new top-level virtualization struct vimage, which holds pointers to struct vnet and struct vprocg. Struct vprocg is likely to become replaced in the near future with a new jail management API import. As a consequence of this change, change struct ucred to point to a struct vimage, instead of directly pointing to a vnet. Merge vnet / vimage / ucred refcounting infrastructure from p4 / vimage branch. Permit kldload / kldunload operations to be executed only from the default vimage context. This change should have no functional impact on nooptions VIMAGE kernel builds. Reviewed by: bz Approved by: julian (mentor)	2009-05-08 14:11:06 +00:00
Kip Macy	a6827463ad	don't call vn_rele_async_fini in the !_KERNEL case	2009-05-07 23:34:41 +00:00
Kip Macy	c20fd07777	move VN_RELE_ASYNC to the compatibility layer with the rest of the VN_* defines	2009-05-07 23:02:15 +00:00
Kip Macy	6ef1a81d6e	avoid LOR and gratuitous extra lock acquisitions by moving user_evict list buffers to a temporary list	2009-05-07 21:51:13 +00:00
Kip Macy	77d0162c70	Allow the VM to provide backpressure on the ARC cache as it does on Solaris.	2009-05-07 20:57:06 +00:00
Kip Macy	62fa227ccd	Asynchronously release vnodes to avoid blocking on range locks when calling back in to zfs. This is based on a fix that went in to opensolaris on March 9th. However, it uses a dedicated thread instead of a Solaris taskq to avoid doing a blocking memory allocation with the vnode interlock held. This fixes a long-time deadlock in ZFS. This is not, strictly speaking, an LOR. The spa_zio thread releases a vnode, this calls in to vn_reclaim which in turn needs to acquire range locks to sync dirty data out to disk. The range locks are already held by a user-level process that is waiting on a condition variable for a spa_zio thread to signal it. The process could not be signalled because the spa_zio thread could not proceed. The nature of this problem was not apparent due to ZFS locks opting out of witness which meant that DDB did not know about the locks that were held by ZFS. Reviewed by: pjd MFC after: 7 days	2009-05-07 20:28:06 +00:00
Jamie Gritton	b38ff370e4	Introduce the extensible jail framework, using the same "name=value" interface as nmount(2). Three new system calls are added: * jail_set, to create jails and change the parameters of existing jails. This replaces jail(2). * jail_get, to read the parameters of existing jails. This replaces the security.jail.list sysctl. * jail_remove to kill off a jail's processes and remove the jail. Most jail parameters may now be changed after creation, and jails may be set to exist without any attached processes. The current jail(2) system call still exists, though it is now a stub to jail_set(2). Approved by: bz (mentor)	2009-04-29 21:14:15 +00:00
Robert Watson	885868cd8f	Remove VOP_LEASE and supporting functions. This hasn't been used since the removal of NQNFS, but was left in in case it was required for NFSv4. Since our new NFSv4 client and server can't use it for their requirements, GC the old mechanism, as well as other unused lease- related code and interfaces. Due to its impact on kernel programming and binary interfaces, this change should not be MFC'd. Proposed by: jeff Reviewed by: jeff Discussed with: rmacklem, zach loafman @ isilon	2009-04-10 10:52:19 +00:00
Andrew Thompson	853a10a581	Revert r190676,190677 The geom and CAM changes for root_hold are the wrong solution for USB design quirks. Requested by: scottl	2009-04-10 04:08:34 +00:00
Andrew Thompson	626fc9fe3d	Add a how argument to root_mount_hold() so it can be passed NOWAIT and be called in situations where sleeping isn't allowed.	2009-04-03 19:46:12 +00:00
Robert Watson	455f3aa24f	Move dtnfsclient.c in the cddl tree to nfs_kdtrace.c in the nfsclient directory, since it's under a BSD license, and this keeps NFS internals- aware tracing parts close to NFS. MFC after: 1 month Suggested by: jhb	2009-03-25 17:47:22 +00:00
Robert Watson	10263f0832	Add DTrace probes to the NFS access and attribute caches. Access cache events are: nfsclient:accesscache:flush:done nfsclient:accesscache:get:hit nfsclient:accesscache:get:miss nfsclient:accesscache:load:done They pass the vnode, uid, and requested or loaded access mode (if any); the load event may also report a load error if the RPC fails. The attribute cache events are: nfsclient:attrcache:flush:done nfsclient:attrcache:get:hit nfsclient:attrcache:get:miss nfsclient:attrcache:load:done They pass the vnode, optionally the vattr if one is present (hit or load), and in the case of a load event, also a possible RPC error. MFC after: 1 month Sponsored by: Google, Inc.	2009-03-24 17:14:34 +00:00
Robert Watson	47294818f9	Add dtnfsclient, a first cut at an NFSv2/v3 client request DTrace provider. The NFS client exposes 'start' and 'done' probes for NFSv2 and NFSv3 RPCs when using the new RPC implementation, passing in the vnode, mbuf chain, credential, and NFSv2 or NFSv3 procedure number. For 'done' probes, the error number is also available. Probes are named in the following way: ... nfsclient:nfs2:write:start nfsclient:nfs2:write:done ... nfsclient:nfs3:access:start nfsclient:nfs3:access:done ... Access to the unmarshalled arguments is not easily available at this point in the stack, but the passed probe arguments are sufficient to do a lot of interesting things in practice. Technically, these probes may cover multiple RPC retransmits, and even transactions if the transaction ID changes as a result of authentication failure or a jukebox error from the server, but usefully capture the intent of a single NFS request, such as access, getattr, write, etc. Typical use might involve profiling RPC latency by system call, number of RPCs, how often a getattr leads to a call to access, when failed access control checks occur, etc. More detailed RPC information might best be provided by adding a krpc provider. It would also be useful to add NFS client probes for events such as the access cache or attribute cache satisfying requests without an RPC. Sponsored by: Google, Inc. MFC after: 1 month	2009-03-22 22:07:52 +00:00
John Baldwin	9fca7a854c	The zfs_get_xattrdir() function is used to find the extended attribute directory for a znode. When the directory already exists, it returns a referenced but unlocked vnode. When a directory does not yet exist, it calls zfs_make_xattrdir() to create a new one. zfs_make_xattrdir() returns the vnode both referenced and locked and zfs_get_xattrdir() was leaking this vnode lock to its callers. Fix this by dropping the vnode lock if zfs_make_xattrdir() successfully creates a new extended attribute directory. Reviewed by: pjd	2009-03-18 16:19:44 +00:00
John Baldwin	33fc362512	Add a new internal mount flag (MNTK_EXTENDED_SHARED) to indicate that a filesystem supports additional operations using shared vnode locks. Currently this is used to enable shared locks for open() and close() of read-only file descriptors. - When an ISOPEN namei() request is performed with LOCKSHARED, use a shared vnode lock for the leaf vnode only if the mount point has the extended shared flag set. - Set LOCKSHARED in vn_open_cred() for requests that specify O_RDONLY but not O_CREAT. - Use a shared vnode lock around VOP_CLOSE() if the file was opened with O_RDONLY and the mountpoint has the extended shared flag set. - Adjust md(4) to upgrade the vnode lock on the vnode it gets back from vn_open() since it now may only have a shared vnode lock. - Don't enable shared vnode locks on FIFO vnodes in ZFS and UFS since FIFO's require exclusive vnode locks for their open() and close() routines. (My recent MPSAFE patches for UDF and cd9660 already included this change.) - Enable extended shared operations on UFS, cd9660, and UDF. Submitted by: ups Reviewed by: pjd (ZFS bits) MFC after: 1 month	2009-03-11 14:13:47 +00:00
Jamie Gritton	f86bce5ed0	Extend the "vfsopt" mount options for more general use. Make struct vfsopt and the vfs_buildopts function public, and add some new fields to struct vfsopt (pos and seen), and new functions vfs_getopt_pos and vfs_opterror. Further extend the interface to allow reading options from the kernel in addition to sending them to the kernel, with vfs_setopt and related functions. While this allows the "name=value" option interface to be used for more than just FS mounts (planned use is for jails), it retains the current "vfsopt" name and <sys/mount.h> requirement. Approved by: bz (mentor)	2009-03-02 23:26:30 +00:00
Ed Schouten	802cb57e34	Add memmove() to the kernel, making the kernel compile with Clang. When copying big structures, LLVM generates calls to memmove(), because it may not be able to figure out whether structures overlap. This caused linker errors to occur. memmove() is now implemented using bcopy(). Ideally it would be the other way around, but that can be solved in the future. On ARM we don't add anything, because it already has memmove(). Discussed on: arch@ Reviewed by: rdivacky	2009-02-28 16:21:25 +00:00
John Baldwin	ea77ff0a15	Use shared vnode locks when invoking VOP_READDIR(). MFC after: 1 month	2009-02-13 18:18:14 +00:00
Ed Schouten	a4611ab612	Last step of splitting up minor and unit numbers: remove minor(). Inside the kernel, the minor() function was responsible for obtaining the device minor number of a character device. Because we made device numbers dynamically allocated and independent of the unit number passed to make_dev() a long time ago, it was actually a misnomer. If you really want to obtain the device number, you should use dev2udev(). We already converted all the drivers to use dev2unit() to obtain the device unit number, which is still used by a lot of drivers. I've noticed not a single driver passes NULL to dev2unit(). Even if they would, its behaviour would make little sense. This is why I've removed the NULL check. This commit removes minor(), minor2unit() and unit2minor() from the kernel. Because there was a naming collision with uminor(), we can rename umajor() and uminor() back to major() and minor(). This means that the makedev(3) manual page also applies to kernel space code now. I suspect umajor() and uminor() aren't used that often in external code, but to make it easier for other parties to port their code, I've increased __FreeBSD_version to 800062.	2009-01-28 17:57:16 +00:00
Warner Losh	78bc7eec0d	Put the MIPS support back in after it was removed in r185029.	2008-12-04 16:31:08 +00:00
Pawel Jakub Dawidek	35a15332f3	MFp4: Remove assertion that is no longer valid - we now use VOP_CLOSE() in more places (ie vdev_file.c).	2008-11-29 12:32:42 +00:00
Edward Tomasz Napierala	38cc5da78e	MFp4: We don't support TX_CREATE_ACL_ATTR nor TX_MKDIR_ACL_ATTR; code found in zfs_replay.c will panic if it encounters transactions of this type. Make sure we don't put these into the ZIL. Approved by: rwatson (mentor), pjd	2008-11-25 23:05:46 +00:00
Pawel Jakub Dawidek	ad35ee04f4	Fix locking (file descriptor table and Giant around VFS). Most submitted by: kib Reviewed by: kib	2008-11-25 21:14:00 +00:00
Ganbold Tsagaankhuu	79dae0aa0b	Remove unused variable. Found with: Coverity Prevent(tm) CID: 3669,3671 Approved by: jb	2008-11-25 19:25:54 +00:00
Pawel Jakub Dawidek	83080c1ece	Don't use PRIV_ROOT. Here we check if user can share ZFS file system, so PRIV_NFS_DAEMON seems best choice. Discussed with: rwatson	2008-11-23 20:14:19 +00:00


227 Commits