freebsd-skq

Author	SHA1	Message	Date
pjd	87b424f9b4	MFC r197219: Forced unmounts work just fine in my tests under heavy load. There might still be a problem, but it isn't worth a warning. Approved by: re (kib)	2009-09-15 12:19:34 +00:00
pjd	12d546b4f4	MFC r196456,r196457,r196458,r196662,r196702,r196703,r196919,r196927,r196928, r196943,r196944,r196947,r196950,r196953,r196954,r196965,r196978,r196979, r196980,r196982,r196985,r196992,r197131,r197133,r197150,r197151,r197152, r197153,r197167,r197172,r197177,r197200,r197201: r196456: - Give minclsyspri and maxclsyspri real values (consulted with kmacy). - Honour 'pri' argument for thread_create(). r196457: Set priority of vdev_geom threads and zvol threads to PRIBIO. r196458: - Hide ZFS kernel threads under zfskern process. - Use better (shorter) threads names: 'zvol:worker zvol/tank/vol00' -> 'zvol tank/vol00' 'vdev:worker da0' -> 'vdev da0' r196662: Add missing mountpoint vnode locking. This fixes panic on assertion with DEBUG_VFS_LOCKS and vfs.usermount=1 when regular user tries to mount dataset owned by him. r196702: Remove empty directory. r196703: Backport the 'dirtying dbuf' panic fix from newer ZFS version. Reported by: Thomas Backman <serenity@exscape.org> r196919: bzero() on-stack argument, so mutex_init() won't misinterpret that the lock is already initialized if we have some garbage on the stack. PR: kern/135480 Reported by: Emil Mikulic <emikulic@gmail.com> r196927: Changing provider size is not really supported by GEOM, but doing so when provider is closed should be ok. When administrator requests to change ZVOL size do it immediately if ZVOL is closed or do it on last ZVOL close. PR: kern/136942 Requested by: Bernard Buri <bsd@ask-us.at> r196928: Teach zdb(8) how to obtain GEOM provider size. PR: kern/133134 Reported by: Philipp Wuensche <cryx-freebsd@h3q.com> r196943: - Avoid holding mutex around M_WAITOK allocations. - Add locking for mnt_opt field. r196944: Don't recheck ownership on update mount. This will eliminate LOR between vfs_busy() and mount mutex. We check ownership in vfs_domount() anyway. Noticed by: kib Reviewed by: kib r196947: Defer thread start until we set priority. 
Reviewed by: kib r196950: Fix detection of file system being shared. Now zfs unshare/destroy/rename command will properly remove exported file systems. r196953: When snapshot mount point is busy (for example we are still in it) we will fail to unmount it, but it won't be removed from the tree, so in that case there is no need to reinsert it. Reported by: trasz r196954: If we have to use avl_find(), optimize a bit and use avl_insert() instead of avl_add() (the latter is actually a wrapper around avl_find() + avl_insert()). Fix similar case in the code that is currently commented out. r196965: Fix reference count leak for a case where snapshot's mount point is updated. r196978: Call ZFS_EXIT() after locking the vnode. r196979: On FreeBSD we don't have to look for snapshot's mount point, because fhtovp method is already called with proper mount point. r196980: When we automatically mount snapshot we want to return vnode of the mount point from the lookup and not covered vnode. This is one of the fixes for using .zfs/ over NFS. r196982: We don't export individual snapshots, so mnt_export field in snapshot's mount point is NULL. That's why when we try to access snapshots over NFS use mnt_export field from the parent file system. r196985: Only log successful commands! Without this fix we log even unsuccessful commands executed by unprivileged users. Action is not really taken, but it is logged to pool history, which might be confusing. Reported by: Denis Ahrens <denis@h3q.com> r196992: Implement __assert() for Solaris-specific code. Until now Solaris code was using Solaris prototype for __assert(), but FreeBSD's implementation. Both take different arguments, so we were either core-dumping in assert() or printing garbage. Reported by: avg r197131: Tighten up the check for race in zfs_zget() - ZTOV(zp) can not only contain NULL, but also can point to dead vnode, take that into account. 
PR: kern/132068 Reported by: Edward Fisk <7ogcg7g02@sneakemail.com>, kris Fix based on patch from: Jaakko Heinonen <jh@saunalahti.fi> r197133: - Protect reclaim with z_teardown_inactive_lock. - Be prepared for dbuf to disappear in zfs_reclaim_complete() and check if z_dbuf field is NULL - this might happen in case of rollback or forced unmount between zfs_freebsd_reclaim() and zfs_reclaim_complete(). - On forced unmount wait for all znodes to be destroyed - destruction can be done asynchronously via zfs_reclaim_complete(). r197150: There is a bug where mze_insert() can trigger an assert() of inserting the same entry twice. This bug is not fixed yet, but leads to a situation where, when we try to access a corrupted directory, the kernel will panic. Until the bug is properly fixed, try to recover from it and log that it happened. Reported by: marck OpenSolaris bug: 6709336 r197151: Be sure not to overflow struct fid. r197152: Extend scope of the z_teardown_lock lock for consistency and "just in case". r197153: When zfs.ko is compiled with debug, make sure that znode and vnode point at each other. r197167: Work-around READDIRPLUS problem with .zfs/ and .zfs/snapshot/ directories by just returning EOPNOTSUPP. This will allow NFS server to fall back to regular READDIR. Note that converting inode number to snapshot's vnode is expensive operation. Snapshots are stored in AVL tree, but based on their names, not inode numbers, so to convert inode to snapshot vnode we have to iterate over all snapshots. This is not a problem in OpenSolaris, because in their READDIRPLUS implementation they use VOP_LOOKUP() on d_name, instead of VFS_VGET() on d_fileno as we do. PR: kern/125149 Reported by: Weldon Godfrey <wgodfrey@ena.com> Analysis by: Jaakko Heinonen <jh@saunalahti.fi> r197172: Add missing \n. Reported by: marck r197177: Support both cases: when snapshot is already mounted and when it is not yet mounted. 
r197200: Modify mount(8) to skip MNT_IGNORE file systems by default, just like df(1) does. This is not POLA violation, because there is no single file system in the base that use MNT_IGNORE currently, although ZFS snapshots will be mounted with MNT_IGNORE after next commit. Reviewed by: kib r197201: - Mount ZFS snapshots with MNT_IGNORE flag, so they are not visible in regular df(1) and mount(8) output. This is a bit similar to OpenSolaris and follows ZFS route of not listing snapshots by default with 'zfs list' command. - Add UPDATING entry to note that ZFS snapshots are no longer visible in mount(8) and df(1) output by default. Reviewed by: kib Approved by: re (bz)	2009-09-15 11:13:40 +00:00
kib	d85421d7c3	MFC r196966: Lock Giant around vn_open_cred(). Remove innocent unnecessary call to NDFREE(). Approved by: re (kensmith)	2009-09-11 12:56:13 +00:00
pjd	79cd85d217	MFC r196395: Our libc doesn't implement control method for XDR (only kernel does) and it will always return failure. Fix this by bringing userland implementation of xdrmem_control() back. This allow 'zpool import' to work again. Reported by: Thomas Backman <serenity@exscape.org> Reviewed by: kmacy Approved by: re (kib)	2009-08-20 00:08:58 +00:00
pjd	d190fdd0db	MFC r196309: getcwd() (when __getcwd() fails) works by stating current directory, going up (..), calling readdir and looking for previous directory inode. In case of .zfs/ directory this doesn't work, because .zfs/ is hidden by default, so it won't be visible in readdir output. Fix this by implementing VPTOCNP for snapshot directories, so __getcwd() doesn't fail and getcwd() doesn't have to use readdir method. This fixes /bin/pwd from within .zfs/snapshot/<name>/. Suggested by: kib Approved by: re (rwatson)	2009-08-17 10:02:31 +00:00
pjd	5fbdccb6fe	MFC r196307: Manage asynchronous vnode release just like Solaris. Discussed with: kmacy Approved by: re (kib)	2009-08-17 09:55:58 +00:00
pjd	8227f77664	MFC r196303: - Reduce z_teardown_lock lock scope a bit. - The error variable is int, not bool. - Convert spaces to tabs where needed. Approved by: re (kib)	2009-08-17 09:30:31 +00:00
pjd	28e89f09a3	MFC r196301: If z_buf is NULL, we should free znode immediately. Noticed by: avg Approved by: re (kib)	2009-08-17 09:27:10 +00:00
pjd	a426d2a48b	MFC r196299: - We need to recycle vnode instead of freeing znode. Submitted by: avg - Add missing vnode interlock unlock. - Remove redundant znode locking. Approved by: re (kib)	2009-08-17 09:23:27 +00:00
pjd	636cf2c6d7	MFC r196297: Fix panic in zfs recv code. The last vnode (mountpoint's vnode) can have 0 usecount. Reported by: Thomas Backman <serenity@exscape.org> Approved by: re (kib)	2009-08-17 09:14:58 +00:00
pjd	fee81fb7eb	MFC r196295: Remove OpenSolaris taskq port (it performs very poorly in our kernel) and replace it with wrappers around our taskqueue(9). To make it possible implement taskqueue_member() function which returns 1 if the given thread was created by the given taskqueue. Approved by: re (kib)	2009-08-17 09:03:47 +00:00
pjd	c47c10fe80	MFC r196291: - Fix a race where /dev/zfs control device is created before ZFS is fully initialized. Also destroy /dev/zfs before doing other deinitializations. - Initialization through taskq is no longer needed and there is a race where one of the zpool/zfs command loads zfs.ko and tries to do some work immediately, but /dev/zfs is not there yet. Reported by: pav Approved by: re (kib)	2009-08-17 08:38:41 +00:00
pjd	b56bf151b7	MFC r196289: Remove files that are no longer used. Discussed with: kmacy Approved by: re (kib)	2009-08-17 08:09:46 +00:00
marcel	67060cf3b0	MFC revision 196269: Fix misalignment in nvpair_native_embedded() caused by the compiler replacing the bzero(). Approved by: re (kensmith)	2009-08-16 02:21:24 +00:00
trasz	6b213eca8e	InstaMFC 196179: Remove CDDL warning. Approved by: re (kib), core	2009-08-13 13:56:05 +00:00
pjd	c67ad86c81	We don't support ephemeral IDs in FreeBSD and without this fix ZFS can panic in zfs_fuid_create_cred() when userid is negative. It is converted to unsigned value which makes the IS_EPHEMERAL() macro incorrectly report that this is ephemeral ID. The most reasonable solution for now is to always report that the given ID is not ephemeral. PR: kern/132337 Submitted by: Matthew West <freebsd@r.zeeb.org> Tested by: Thomas Backman <serenity@exscape.org>, Michael Reifenberger <mike@reifenberger.com> Approved by: re (kib) MFC after: 2 weeks	2009-07-27 14:52:34 +00:00
trasz	0157e2f2cf	Fix extattr_list_file(2) on ZFS in case the attribute directory doesn't exist and user doesn't have write access to the file. Without this fix, it returns bogus value instead of 0. For some reason this didn't manifest on my kernel compiled with -O0. PR: kern/136601 Submitted by: Jaakko Heinonen <jh at saunalahti dot fi> Approved by: re (kib)	2009-07-22 15:15:58 +00:00
trasz	2e0ead9bff	Fix permission handling for extended attributes in ZFS. Without this change, ZFS uses SunOS Alternate Data Streams semantics - each EA has its own permissions, which are set at EA creation time and - unlike SunOS - invisible to the user and impossible to change. From the user point of view, it's just broken: sometimes access is granted when it shouldn't be, sometimes it's denied when it shouldn't be. This patch makes it behave just like UFS, i.e. depend on current file permissions. Also, it fixes returned error codes (ENOATTR instead of ENOENT) and makes listextattr(2) return 0 instead of EPERM where there is no EA directory (i.e. the file never had any EA). Reviewed by: pjd (idea, not actual code) Approved by: re (kib)	2009-07-20 19:16:42 +00:00
avg	b898b874c6	dtrace_gethrtime: improve scaling of TSC ticks to nanoseconds Currently dtrace_gethrtime uses formula similar to the following for converting TSC ticks to nanoseconds: rdtsc() * 10^9 / tsc_freq The dividend overflows 64-bit type and wraps-around every 2^64/10^9 = 18446744073 ticks which is just a few seconds on modern machines. Now we instead use precalculated scaling factor of 10^9*2^N/tsc_freq < 2^32 and perform TSC value multiplication separately for each 32-bit half. This allows to avoid overflow of the dividend described above. The idea is taken from OpenSolaris. This has an added feature of always scaling TSC with invariant value regardless of TSC frequency changes. Thus the timestamps will not be accurate if TSC actually changes, but they are always proportional to TSC ticks and thus monotonic. This should be much better than current formula which produces wildly different non-monotonic results when tsc_freq changes. Also drop write-only 'cp' variable from amd64 dtrace_gethrtime_init() to make it identical to the i386 twin. PR: kern/127441 Tested by: Thomas Backman <serenity@exscape.org> Reviewed by: jhb Discussed with: current@, bde, gnn Silence from: jb Approved by: re (gnn) MFC after: 1 week	2009-07-15 17:07:39 +00:00
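The scaling described in the commit above can be sketched in userland C roughly as follows. This is a minimal illustration, not the committed kernel code: the function names are hypothetical, the tick count is passed in as a plain argument instead of being read with rdtsc(), and the shift N = 28 is an assumption chosen so that the scale factor stays below 2^32 for TSC frequencies above about 62.5 MHz.

```c
#include <stdint.h>

#define NANOSEC     1000000000ULL
#define SCALE_SHIFT 28	/* assumed value of N; not taken from the log */

/*
 * Naive formula from the commit message: ticks * 10^9 / tsc_freq.
 * The product wraps once ticks exceeds 2^64 / 10^9 (about 1.8 * 10^10).
 */
static uint64_t
naive_ticks_to_ns(uint64_t ticks, uint64_t tsc_freq)
{
	return (ticks * NANOSEC / tsc_freq);
}

/* Precompute the scale factor 10^9 * 2^N / tsc_freq once. */
static uint64_t
compute_nsec_scale(uint64_t tsc_freq)
{
	return ((NANOSEC << SCALE_SHIFT) / tsc_freq);
}

/*
 * Multiply each 32-bit half of the tick count separately.  With
 * nsec_scale < 2^32, neither partial product overflows 64 bits.
 */
static uint64_t
scaled_ticks_to_ns(uint64_t ticks, uint64_t nsec_scale)
{
	uint64_t lo = ticks & 0xffffffffULL;
	uint64_t hi = ticks >> 32;

	return (((lo * nsec_scale) >> SCALE_SHIFT) +
	    ((hi * nsec_scale) << (32 - SCALE_SHIFT)));
}
```

With a 1 GHz frequency the scale factor is exactly 2^28, so the scaled conversion returns the tick count unchanged, while the naive formula already gives a wrong answer at 2 * 10^10 ticks because the intermediate product wraps.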
kib	c7441b67e6	Add new msleep(9) flag PBDY that shall be specified together with PCATCH, to indicate that thread shall not be stopped upon receipt of SIGSTOP until it reaches the kernel->usermode boundary. Also change thread_single(SINGLE_NO_EXIT) to only stop threads at the user boundary unconditionally. Tested by: pho Reviewed by: jhb Approved by: re (kensmith)	2009-07-14 22:52:46 +00:00
marcel	f9e85cc362	In nvpair_native_embedded_array(), meaningless pointers are zeroed. The programmer was aware that alignment was not guaranteed in the packed structure and used bzero() to NULL out the pointers. However, on ia64, the compiler is quite aggressive in finding ILP and calls to bzero() are often replaced by simple assignments (i.e. stores). Especially when the width or size in question corresponds with a store instruction (i.e. st1, st2, st4 or st8). The problem here is not a compiler bug. The address of the memory to zero-out was given by '&packed->nvl_priv' and given the type of the 'packed' pointer the compiler could assume proper alignment for the replacement of bzero() with an 8-byte wide store to be valid. The problem is with the programmer. The programmer knew that the address did not have the alignment guarantees needed for a regular assignment, but failed to inform the compiler of that fact. In fact, the programmer told the compiler the opposite: alignment is guaranteed. The fix is to avoid using a pointer of type "nvlist_t *" and instead use a "char *" pointer as the basis for calculating the address. This tells the compiler that only 1-byte alignment can be assumed and the compiler will either keep the bzero() call or instead replace it with a sequence of byte-wise stores. Both are valid. Approved by: re (kib)	2009-07-11 22:43:20 +00:00
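The approach described in the commit above (computing the address from a char pointer so the compiler may assume only 1-byte alignment) might look roughly like this. It is a sketch under stated assumptions: struct demo_nvlist is a hypothetical stand-in for nvlist_t, zero_priv_unaligned() is an invented name, and memset() is used in place of bzero() for portability.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for nvlist_t; only the field layout matters. */
struct demo_nvlist {
	int32_t		nvl_version;
	uint32_t	nvl_nvflag;
	uint64_t	nvl_priv;
	uint32_t	nvl_flag;
	int32_t		nvl_pad;
};

/*
 * Zero the nvl_priv field of a structure embedded in a packed buffer.
 * The address is derived from a char pointer plus offsetof(), never from
 * a 'struct demo_nvlist *', so the compiler cannot assume 8-byte alignment
 * and must use the memset() call or byte-wise stores.
 */
static void
zero_priv_unaligned(char *buf)
{
	char *priv = buf + offsetof(struct demo_nvlist, nvl_priv);

	memset(priv, 0, sizeof(uint64_t));
}
```

Passing a deliberately odd address, such as one byte into a byte array, exercises the misaligned case without ever forming a misaligned struct pointer.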
avg	296f644406	dtrace/amd64: fix virtual address checks On amd64 KERNBASE/kernbase does not mean start of kernel memory. This should fix a KASSERT panic in dtrace_copycheck when copyin*() is used in D program. Also make checks for user memory a bit stricter. Reported by: Thomas Backman <serenity@exscape.org> Submitted by: wxs (kaddr part) Tested by: Thomas Backman (prototype), wxs Reviewed by: alc (concept), jhb, current@ Approved by: jb (concept) MFC after: 2 weeks PR: kern/134408	2009-06-24 16:03:57 +00:00
kib	117b33aa8d	O_NOFOLLOW shall be in flags, not in cmode. Noted by: bde	2009-06-22 10:08:48 +00:00
kib	171c37f865	Add another flags argument to vn_open_cred. Use it to specify that some vn_open_cred invocations shall not audit namei path. In particular, specify VN_OPEN_NOAUDIT for dotdot lookup performed by default implementation of vop_vptocnp, and for the open done for core file. vn_fullpath is called from the audit code, and vn_open there need to disable audit to avoid infinite recursion. Core file is created on return to user mode, that, in particular, happens during syscall return. The creation of the core file is audited by direct calls, and we do not want to overwrite audit information for syscall. Reported, reviewed and tested by: rwatson	2009-06-21 13:41:32 +00:00
jamie	f419891544	Rename the host-related prison fields to be the same as the host.* parameters they represent, and the variables they replaced, instead of abbreviated versions of them. Approved by: bz (mentor)	2009-06-13 15:39:12 +00:00
kmacy	8060c5388d	pjd has requested that I keep the tunable as zfs_prefetch_disable to minimize gratuitous differences with OpenSolaris' ZFS. Sorry for the churn.	2009-06-11 22:24:08 +00:00
kmacy	f62faa0224	check against prefetch_enable	2009-06-11 09:51:21 +00:00
kmacy	4ff84b99d6	use default policy for enabling prefetching unless the TUNABLE is set	2009-06-10 21:05:37 +00:00
kmacy	50dfd13368	As far as I can tell systems that have less than 4GB are more often hurt by prefetch than helped. On i386 systems and systems with less than 4GB, prefetch is now disabled by default. I've added a prefetch enable tunable, to enable prefetching for those systems. The prefetch disable tunable will continue to unconditionally disable prefetching.	2009-06-10 01:21:32 +00:00
ps	4505aa56ed	Support shared vnode locks for write operations when the offset is provided on filesystems that support it. This really improves mysql + innodb performance on ZFS. Reviewed by: jhb, kmacy, jeffr	2009-06-04 16:18:07 +00:00
dfr	0cdc6579da	Allow the bootfs property to be set for raidz pools on FreeBSD. Reviewed by: pjd	2009-05-31 11:59:32 +00:00
kmacy	3b9ffe972e	fix xdrmem_control to be safe in an if statement fix zfs to depend on krpc remove xdr from zfs makefile Submitted by: dchagin@freebsd.org	2009-05-30 22:23:58 +00:00
kmacy	9452336efa	work around snapshot shutdown race reported by Henri Hennebert	2009-05-30 19:26:35 +00:00
jamie	572db1408a	Place hostnames and similar information fully under the prison system. The system hostname is now stored in prison0, and the global variable "hostname" has been removed, as has the hostname_mtx mutex. Jails may have their own host information, or they may inherit it from the parent/system. The proper way to read the hostname is via getcredhostname(), which will copy either the hostname associated with the passed cred, or the system hostname if you pass NULL. The system hostname can still be accessed directly (and without locking) at prison0.pr_host, but that should be avoided where possible. The "similar information" referred to is domainname, hostid, and hostuuid, which have also become prison parameters and had their associated global variables removed. Approved by: bz (mentor)	2009-05-29 21:27:12 +00:00
attilio	e05714ba70	Reverse the logic for ADAPTIVE_SX option and enable it by default. Introduce for this operation the reverse NO_ADAPTIVE_SX option. The flag SX_ADAPTIVESPIN to be passed to sx_init_flags(9) gets suppressed and the new flag, offering the reversed logic, SX_NOADAPTIVE is added. Additionally implement adaptive spinning for sx held in shared mode. The spinning limit can be handled through sysctls in order to be tuned while the code doesn't reach the release, after which time they should probably be dropped. This change has been made necessary by recent benchmarks where it does improve concurrency of workloads in presence of high contention (ie. ZFS). KPI breakage is documented by __FreeBSD_version bumping, manpage and UPDATING updates. Requested by: jeff, kmacy Reviewed by: jeff Tested by: pho	2009-05-29 01:49:27 +00:00
kmacy	189b8f192f	MFdevbranch 192944 - add FreeBSD implementation of xdrmem_control needed by zfs - have zfs define xdr_ops using FreeBSD's definition - remove solaris xdr files from zfs compile	2009-05-28 08:18:12 +00:00
sson	c0d5996eb6	Add the OpenSolaris dtrace lockstat provider. The lockstat provider adds probes for mutexes, reader/writer and shared/exclusive locks to gather contention statistics and other locking information for dtrace scripts, the lockstat(1M) command and other potential consumers. Reviewed by: attilio jhb jb Approved by: gnn (mentor)	2009-05-26 20:28:22 +00:00
trasz	38205ec380	Change license to more bori^Wadul^Wcanonical. Submitted by: rwatson@	2009-05-26 11:42:06 +00:00
trasz	0bf624fc06	MFp4 changes necessary for NFSv4 ACLs support in ZFS. This is mostly about removing a few #ifdefs and providing compatibility wrappers and VOP implementations to get and set an ACL; ZFS does ACL enforcement all by itself. Note that the VOPs are ifdefed out for now, so this change should be a no-op. Reviewed by: pjd	2009-05-26 08:21:59 +00:00
trasz	65e538f91c	Don't allow non-owner to set SUID bit on a file. It doesn't make any difference now, but in NFSv4 ACLs, there is write_acl permission, which also affects mode changes. Reviewed by: pjd	2009-05-24 19:21:49 +00:00
trasz	a460a65d22	Fix comment.	2009-05-24 15:48:48 +00:00
des	f354c73971	Unexpand $FreeBSD$.	2009-05-23 16:01:58 +00:00
des	159ae67ef7	Remove svn:keywords on a file that had fbsd:nokeywords (though I don't understand the reason for the latter)	2009-05-23 16:00:16 +00:00
kmacy	972fc5b174	- back out direct map hack - it is no longer needed	2009-05-19 01:14:37 +00:00
kmacy	fc0e3714cc	set createtxg prop name PR: bin/130105	2009-05-17 04:04:25 +00:00
kmacy	33504763e7	SAVESTART implies SAVENAME	2009-05-17 01:31:28 +00:00
kmacy	8cfacd71f9	enable adaptive spinning on zfs locks	2009-05-16 23:56:45 +00:00
kmacy	da0eac0afe	- allow forced unmounts - don't assume snapshot was auto-mounted	2009-05-16 20:33:13 +00:00
kmacy	0165e636bf	only use direct map if system has more than 2GB	2009-05-16 20:09:07 +00:00
kmacy	66456a72cd	apply band-aid to x86_64 systems with more physical memory than kmem by allocating from the direct map	2009-05-16 19:17:15 +00:00


257 Commits