freebsd-skq

Author	SHA1	Message	Date
trasz	1e0811de99	Remove CDDL warning. Approved by: re (kib), core	2009-08-13 12:28:30 +00:00
pjd	c67ad86c81	We don't support ephemeral IDs in FreeBSD and without this fix ZFS can panic in zfs_fuid_create_cred() when userid is negative. It is converted to an unsigned value, which makes the IS_EPHEMERAL() macro incorrectly report that this is an ephemeral ID. The most reasonable solution for now is to always report that the given ID is not ephemeral. PR: kern/132337 Submitted by: Matthew West <freebsd@r.zeeb.org> Tested by: Thomas Backman <serenity@exscape.org>, Michael Reifenberger <mike@reifenberger.com> Approved by: re (kib) MFC after: 2 weeks	2009-07-27 14:52:34 +00:00
trasz	0157e2f2cf	Fix extattr_list_file(2) on ZFS in case the attribute directory doesn't exist and user doesn't have write access to the file. Without this fix, it returns bogus value instead of 0. For some reason this didn't manifest on my kernel compiled with -O0. PR: kern/136601 Submitted by: Jaakko Heinonen <jh at saunalahti dot fi> Approved by: re (kib)	2009-07-22 15:15:58 +00:00
trasz	2e0ead9bff	Fix permission handling for extended attributes in ZFS. Without this change, ZFS uses SunOS Alternate Data Streams semantics - each EA has its own permissions, which are set at EA creation time and - unlike SunOS - invisible to the user and impossible to change. From the user point of view, it's just broken: sometimes access is granted when it shouldn't be, sometimes it's denied when it shouldn't be. This patch makes it behave just like UFS, i.e. depend on current file permissions. Also, it fixes returned error codes (ENOATTR instead of ENOENT) and makes listextattr(2) return 0 instead of EPERM where there is no EA directory (i.e. the file never had any EA). Reviewed by: pjd (idea, not actual code) Approved by: re (kib)	2009-07-20 19:16:42 +00:00
avg	b898b874c6	dtrace_gethrtime: improve scaling of TSC ticks to nanoseconds. Currently dtrace_gethrtime uses a formula similar to the following for converting TSC ticks to nanoseconds: rdtsc() * 10^9 / tsc_freq. The dividend overflows the 64-bit type and wraps around every 2^64/10^9 = 18446744073 ticks, which is just a few seconds on modern machines. Now we instead use a precalculated scaling factor of 10^9*2^N/tsc_freq < 2^32 and perform the TSC value multiplication separately for each 32-bit half. This avoids the overflow of the dividend described above. The idea is taken from OpenSolaris. This has an added feature of always scaling TSC with an invariant value regardless of TSC frequency changes. Thus the timestamps will not be accurate if TSC actually changes, but they are always proportional to TSC ticks and thus monotonic. This should be much better than the current formula, which produces wildly different non-monotonic results when tsc_freq changes. Also drop the write-only 'cp' variable from amd64 dtrace_gethrtime_init() to make it identical to the i386 twin. PR: kern/127441 Tested by: Thomas Backman <serenity@exscape.org> Reviewed by: jhb Discussed with: current@, bde, gnn Silence from: jb Approved by: re (gnn) MFC after: 1 week	2009-07-15 17:07:39 +00:00
kib	c7441b67e6	Add new msleep(9) flag PBDY that shall be specified together with PCATCH, to indicate that thread shall not be stopped upon receipt of SIGSTOP until it reaches the kernel->usermode boundary. Also change thread_single(SINGLE_NO_EXIT) to only stop threads at the user boundary unconditionally. Tested by: pho Reviewed by: jhb Approved by: re (kensmith)	2009-07-14 22:52:46 +00:00
marcel	f9e85cc362	In nvpair_native_embedded_array(), meaningless pointers are zeroed. The programmer was aware that alignment was not guaranteed in the packed structure and used bzero() to NULL out the pointers. However, on ia64, the compiler is quite aggressive in finding ILP and calls to bzero() are often replaced by simple assignments (i.e. stores). Especially when the width or size in question corresponds with a store instruction (i.e. st1, st2, st4 or st8). The problem here is not a compiler bug. The address of the memory to zero-out was given by '&packed->nvl_priv' and given the type of the 'packed' pointer the compiler could assume proper alignment for the replacement of bzero() with an 8-byte wide store to be valid. The problem is with the programmer. The programmer knew that the address did not have the alignment guarantees needed for a regular assignment, but failed to inform the compiler of that fact. In fact, the programmer told the compiler the opposite: alignment is guaranteed. The fix is to avoid using a pointer of type "nvlist_t *" and instead use a "char *" pointer as the basis for calculating the address. This tells the compiler that only 1-byte alignment can be assumed and the compiler will either keep the bzero() call or instead replace it with a sequence of byte-wise stores. Both are valid. Approved by: re (kib)	2009-07-11 22:43:20 +00:00
avg	296f644406	dtrace/amd64: fix virtual address checks. On amd64 KERNBASE/kernbase does not mean start of kernel memory. This should fix a KASSERT panic in dtrace_copycheck when copyin*() is used in a D program. Also make checks for user memory a bit stricter. Reported by: Thomas Backman <serenity@exscape.org> Submitted by: wxs (kaddr part) Tested by: Thomas Backman (prototype), wxs Reviewed by: alc (concept), jhb, current@ Approved by: jb (concept) MFC after: 2 weeks PR: kern/134408	2009-06-24 16:03:57 +00:00
kib	117b33aa8d	O_NOFOLLOW shall be in flags, not in cmode. Noted by: bde	2009-06-22 10:08:48 +00:00
kib	171c37f865	Add another flags argument to vn_open_cred. Use it to specify that some vn_open_cred invocations shall not audit namei path. In particular, specify VN_OPEN_NOAUDIT for dotdot lookup performed by default implementation of vop_vptocnp, and for the open done for core file. vn_fullpath is called from the audit code, and vn_open there need to disable audit to avoid infinite recursion. Core file is created on return to user mode, that, in particular, happens during syscall return. The creation of the core file is audited by direct calls, and we do not want to overwrite audit information for syscall. Reported, reviewed and tested by: rwatson	2009-06-21 13:41:32 +00:00
jamie	f419891544	Rename the host-related prison fields to be the same as the host.* parameters they represent, and the variables they replaced, instead of abbreviated versions of them. Approved by: bz (mentor)	2009-06-13 15:39:12 +00:00
kmacy	8060c5388d	pjd has requested that I keep the tunable as zfs_prefetch_disable to minimize gratuitous differences with Opensolaris' ZFS Sorry for the churn	2009-06-11 22:24:08 +00:00
kmacy	f62faa0224	check against prefetch_enable	2009-06-11 09:51:21 +00:00
kmacy	4ff84b99d6	use default policy for enabling prefetching unless the TUNABLE is set	2009-06-10 21:05:37 +00:00
kmacy	50dfd13368	As far as I can tell systems that have less than 4GB are more often hurt by prefetch than helped. On i386 systems and systems with less than 4GB, prefetch is now disabled by default. I've added a prefetch enable tunable, to enable prefetching for those systems. The prefetch disable tunable will continue to unconditionally disable prefetching.	2009-06-10 01:21:32 +00:00
ps	4505aa56ed	Support shared vnode locks for write operations when the offset is provided on filesystems that support it. This really improves mysql + innodb performance on ZFS. Reviewed by: jhb, kmacy, jeffr	2009-06-04 16:18:07 +00:00
dfr	0cdc6579da	Allow the bootfs property to be set for raidz pools on FreeBSD. Reviewed by: pjd	2009-05-31 11:59:32 +00:00
kmacy	3b9ffe972e	fix xdrmem_control to be safe in an if statement fix zfs to depend on krpc remove xdr from zfs makefile Submitted by: dchagin@freebsd.org	2009-05-30 22:23:58 +00:00
kmacy	9452336efa	work around snapshot shutdown race reported by Henri Hennebert	2009-05-30 19:26:35 +00:00
jamie	572db1408a	Place hostnames and similar information fully under the prison system. The system hostname is now stored in prison0, and the global variable "hostname" has been removed, as has the hostname_mtx mutex. Jails may have their own host information, or they may inherit it from the parent/system. The proper way to read the hostname is via getcredhostname(), which will copy either the hostname associated with the passed cred, or the system hostname if you pass NULL. The system hostname can still be accessed directly (and without locking) at prison0.pr_host, but that should be avoided where possible. The "similar information" referred to is domainname, hostid, and hostuuid, which have also become prison parameters and had their associated global variables removed. Approved by: bz (mentor)	2009-05-29 21:27:12 +00:00
attilio	e05714ba70	Reverse the logic for ADAPTIVE_SX option and enable it by default. Introduce for this operation the reverse NO_ADAPTIVE_SX option. The flag SX_ADAPTIVESPIN to be passed to sx_init_flags(9) gets suppressed and the new flag, offering the reversed logic, SX_NOADAPTIVE is added. Additionally implement adaptive spinning for sx held in shared mode. The spinning limit can be handled through sysctls in order to be tuned while the code doesn't reach the release, after which time they should probably be dropped. This change has been made necessary by recent benchmarks where it does improve concurrency of workloads in presence of high contention (i.e. ZFS). KPI breakage is documented by __FreeBSD_version bumping, manpage and UPDATING updates. Requested by: jeff, kmacy Reviewed by: jeff Tested by: pho	2009-05-29 01:49:27 +00:00
kmacy	189b8f192f	MFdevbranch 192944 - add FreeBSD implementation of xdrmem_control needed by zfs - have zfs define xdr_ops using FreeBSD's definition - remove solaris xdr files from zfs compile	2009-05-28 08:18:12 +00:00
sson	c0d5996eb6	Add the OpenSolaris dtrace lockstat provider. The lockstat provider adds probes for mutexes, reader/writer and shared/exclusive locks to gather contention statistics and other locking information for dtrace scripts, the lockstat(1M) command and other potential consumers. Reviewed by: attilio jhb jb Approved by: gnn (mentor)	2009-05-26 20:28:22 +00:00
trasz	38205ec380	Change license to more bori^Wadul^Wcanonical. Submitted by: rwatson@	2009-05-26 11:42:06 +00:00
trasz	0bf624fc06	MFp4 changes necessary for NFSv4 ACLs support in ZFS. This is mostly about removing a few #ifdefs and providing compatibility wrappers and VOP implementations to get and set an ACL; ZFS does ACL enforcement all by itself. Note that the VOPs are ifdefed out for now, so this change should be a no-op. Reviewed by: pjd	2009-05-26 08:21:59 +00:00
trasz	65e538f91c	Don't allow non-owner to set SUID bit on a file. It doesn't make any difference now, but in NFSv4 ACLs, there is write_acl permission, which also affects mode changes. Reviewed by: pjd	2009-05-24 19:21:49 +00:00
trasz	a460a65d22	Fix comment.	2009-05-24 15:48:48 +00:00
des	f354c73971	Unexpand $FreeBSD$.	2009-05-23 16:01:58 +00:00
des	159ae67ef7	Remove svn:keywords on a file that had fbsd:nokeywords (though I don't understand the reason for the latter)	2009-05-23 16:00:16 +00:00
kmacy	972fc5b174	- back out direct map hack - it is no longer needed	2009-05-19 01:14:37 +00:00
kmacy	fc0e3714cc	set createtxg prop name PR: bin/130105	2009-05-17 04:04:25 +00:00
kmacy	33504763e7	SAVESTART implies SAVENAME	2009-05-17 01:31:28 +00:00
kmacy	8cfacd71f9	enable adaptive spinning on zfs locks	2009-05-16 23:56:45 +00:00
kmacy	da0eac0afe	- allow forced unmounts - don't assume snapshot was auto-mounted	2009-05-16 20:33:13 +00:00
kmacy	0165e636bf	only use direct map if system has more than 2GB	2009-05-16 20:09:07 +00:00
kmacy	66456a72cd	apply band-aid to x86_64 systems with more physical memory than kmem by allocating from the direct map	2009-05-16 19:17:15 +00:00
dfr	0db82eb221	Add support for booting from raidz1 and raidz2 pools.	2009-05-16 10:48:20 +00:00
attilio	1dcb84131b	Remove the thread argument from the FSD (File-System Dependent) parts of the VFS. Now all the VFS_* functions and related parts don't take the context, as it always refers to curthread. In some points, in particular when dealing with VOPs and functions living in the same namespace (e.g. vflush) which still need to be converted, pass curthread explicitly in order to retain the old behaviour. Such loose ends will be fixed ASAP. While here fix a bug: now, UFS_EXTATTR can be compiled alone without the UFS_EXTATTR_AUTOSTART option. VFS KPI is heavily changed by this commit so third-party modules need to be recompiled. Bump __FreeBSD_version in order to signal such situation.	2009-05-11 15:33:26 +00:00
kmacy	fb9c7737d2	rename xdr support files to avoid conflicts when linking in to the kernel	2009-05-11 04:18:58 +00:00
kmacy	0b931b9b69	- rename atomic.S and crc32.c to avoid collisions when linking zfs in to the kernel - update Makefile - ifdef out acl_{alloc, free}, they aren't used by zfs and conflict with existing in-kernel routines	2009-05-09 01:45:55 +00:00
zec	639797b2e6	Introduce a new virtualization container, provisionally named vprocg, to hold virtualized instances of hostname and domainname, as well as a new top-level virtualization struct vimage, which holds pointers to struct vnet and struct vprocg. Struct vprocg is likely to become replaced in the near future with a new jail management API import. As a consequence of this change, change struct ucred to point to a struct vimage, instead of directly pointing to a vnet. Merge vnet / vimage / ucred refcounting infrastructure from p4 / vimage branch. Permit kldload / kldunload operations to be executed only from the default vimage context. This change should have no functional impact on nooptions VIMAGE kernel builds. Reviewed by: bz Approved by: julian (mentor)	2009-05-08 14:11:06 +00:00
kmacy	91894d014c	don't call vn_rele_async_fini in the !_KERNEL case	2009-05-07 23:34:41 +00:00
kmacy	386b2c2f90	move VN_RELE_ASYNC to the compatibility layer with the rest of the VN_* defines	2009-05-07 23:02:15 +00:00
kmacy	9a7f66b336	avoid LOR and gratuitous extra lock acquisitions by moving user_evict list buffers to a temporary list	2009-05-07 21:51:13 +00:00
kmacy	fea9d1bdc9	Allow the VM to provide backpressure on the ARC cache as it does on Solaris.	2009-05-07 20:57:06 +00:00
kmacy	54e76e600e	Asynchronously release vnodes to avoid blocking on range locks when calling back in to zfs. This is based on a fix that went in to opensolaris on March 9th. However, it uses a dedicated thread instead of a Solaris taskq to avoid doing a blocking memory allocation with the vnode interlock held. This fixes a long-time deadlock in ZFS. This is not, strictly speaking, an LOR. The spa_zio thread releases a vnode, this calls in to vn_reclaim which in turn needs to acquire range locks to sync dirty data out to disk. The range locks are already held by a user-level process that is waiting on a condition variable for a spa_zio thread to signal. The process could not be signalled because the spa_zio thread could not proceed. The nature of this problem was not apparent due to ZFS locks opting out of witness which meant that DDB did not know about the locks that were held by ZFS. Reviewed by: pjd MFC after: 7 days	2009-05-07 20:28:06 +00:00
jamie	453b86f943	Introduce the extensible jail framework, using the same "name=value" interface as nmount(2). Three new system calls are added: * jail_set, to create jails and change the parameters of existing jails. This replaces jail(2). * jail_get, to read the parameters of existing jails. This replaces the security.jail.list sysctl. * jail_remove to kill off a jail's processes and remove the jail. Most jail parameters may now be changed after creation, and jails may be set to exist without any attached processes. The current jail(2) system call still exists, though it is now a stub to jail_set(2). Approved by: bz (mentor)	2009-04-29 21:14:15 +00:00
rwatson	fba90f2e03	Remove VOP_LEASE and supporting functions. This hasn't been used since the removal of NQNFS, but was left in in case it was required for NFSv4. Since our new NFSv4 client and server can't use it for their requirements, GC the old mechanism, as well as other unused lease- related code and interfaces. Due to its impact on kernel programming and binary interfaces, this change should not be MFC'd. Proposed by: jeff Reviewed by: jeff Discussed with: rmacklem, zach loafman @ isilon	2009-04-10 10:52:19 +00:00
thompsa	39714cb212	Revert r190676,190677 The geom and CAM changes for root_hold are the wrong solution for USB design quirks. Requested by: scottl	2009-04-10 04:08:34 +00:00
thompsa	fe5458f665	Add a how argument to root_mount_hold() so it can be passed NOWAIT and be called in situations where sleeping isn't allowed.	2009-04-03 19:46:12 +00:00
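
Note on commit b898b874c6 (dtrace_gethrtime): the scaling it describes can be sketched roughly as below. This is a minimal illustration of the idea in the commit message, not the committed code. The names NANOSEC, SCALE_SHIFT, nsec_scale_init() and tsc_to_nsec() are hypothetical, and the choice N = 28 is an assumption (it keeps the factor below 2^32 for any TSC frequency above roughly 62.5 MHz).

```c
#include <assert.h>
#include <stdint.h>

#define NANOSEC     1000000000ULL
#define SCALE_SHIFT 28          /* assumed value of N; not taken from the log */

/*
 * Precompute the scaling factor 10^9 * 2^N / tsc_freq.
 * The scheme relies on the factor fitting in 32 bits.
 */
static uint64_t
nsec_scale_init(uint64_t tsc_freq)
{
	uint64_t scale;

	assert(tsc_freq != 0);
	scale = (NANOSEC << SCALE_SHIFT) / tsc_freq;
	assert(scale != 0 && scale < (1ULL << 32));
	return (scale);
}

/*
 * Convert TSC ticks to nanoseconds by multiplying each 32-bit half of
 * the tick count separately, so no intermediate product exceeds 64 bits.
 */
static uint64_t
tsc_to_nsec(uint64_t tsc, uint64_t scale)
{
	uint64_t lo = tsc & 0xffffffffULL;	/* low 32 bits */
	uint64_t hi = tsc >> 32;		/* high 32 bits */

	return (((lo * scale) >> SCALE_SHIFT) +
	    ((hi * scale) << (32 - SCALE_SHIFT)));
}
```

With a tick count of 2^40, the naive form rdtsc() * 10^9 / tsc_freq would need an intermediate value of about 1.1 * 10^21, which does not fit in 64 bits; the split multiplication stays within range because each half is below 2^32 and so is the factor.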

1193 Commits