Commit Graph

279 Commits

Author SHA1 Message Date
Xin LI
a498e71df3 MFC r202964:
On FreeBSD, time_t is 64-bit for all platforms except i386 and powerpc,
where the type is 32-bit.  ZFS can handle 64-bit timestamp internally
but zfs_setattr() would check if the time value can fit, we change the
checking macros to match 64-bit timestamp if the platform supports it.

This change has some downsides like, while you can import zfs on 32-bit
platforms, the timestamp would overflow if they are out of the range.

This fixes the Y2.038K issue on platforms using 64-bit timestamps.

Reviewed by:	pjd
2010-02-25 00:46:51 +00:00
Pawel Jakub Dawidek
1b4509372f MFC r201684.
Teach the (gpt)zfsboot and zfsloader raidz code to use its buffers
more efficiently.

Before this patch, in the worst case memory use would increase
exponentially on the number of drives in the raidz vdev.

Submitted by:	Matt Reimer <mattjreimer@gmail.com>
Sponsored by:	VPOP Technologies, Inc.
Silence from:	dfr
2010-02-23 16:46:34 +00:00
Xin LI
18148841f8 MFC r203533:
Remove two files that are not needed by FreeBSD.

Approved by:  pjd
2010-02-22 22:27:26 +00:00
Xin LI
72c12306f9 Reduce diff against OpenSolaris - move Giant acquire/release to
zfs_znode.c.  As a side effect this also eliminates two potential
Giant leaks.

Approved by:  pjd
2010-02-01 09:29:32 +00:00
Edward Tomasz Napierala
7782c5efe8 MFC r196949:
Enable NFSv4 ACL support in ZFS.

MFC r197435:

In VOP_SETACL(9) and VOP_GETACL(9), specifying wrong ACL type should result
in EINVAL, not EOPNOTSUPP.
2010-01-31 02:11:14 +00:00
Xin LI
ac07939f0e MFC r201689:
Instead of assuming all vdevs are healthy, check the newest vdev label
for each vdev's status.  Booting from a degraded vdev should now be
more robust.

Submitted by:	Matt Reimer <mattjreimer at gmail.com>
Sponsored by:	VPOP Technologies, Inc.
2010-01-20 01:13:52 +00:00
Xin LI
d958a7e440 MFC r202129:
Report ZFS filesystem version instead of the zpool version when we say it.

Reported by:	Yuri Pankov (on -fs@)
Submitted by:	delphij
2010-01-18 05:03:40 +00:00
Xin LI
0801667fe3 MFC r201143:
Apply OpenSolaris revision 8021:b8fe9660eb2d which brings our zpool
to version 14, making it possible for zpools created on OpenSolaris
2009.06 be used on FreeBSD.

PR:		kern/141800
Submitted by:	mm
Reviewed by:	pjd, trasz
Obtained from:	OpenSolaris onnv-gate
2010-01-11 02:31:00 +00:00
Xin LI
f0c3103b9a MFC r201756:
Re-apply onnv-gate revisions 7994 and 8986 (corresponds to FreeBSD
revision 200726 and 200727).

Reviewed by:  mm@
2010-01-10 07:08:11 +00:00
Alexander Leidinger
84e096effd MFC r197816:
---snip---
    Prevent paging pressure from draining arc too much
    - always drain arc if above arc_c_max - never drain arc if arc is below
      arc_c_max
---snip---
2010-01-08 09:59:13 +00:00
Xin LI
faeec9b29a MFC r200727:
Apply fix for Solaris bug 6764159: restore_object() makes a call
that can block while having a tx open but not yet committed
(onnv revision 7994)

Submitted by:	mm
Approved by:	pjd
Obtained from:	OpenSolaris
2010-01-03 03:10:28 +00:00
Xin LI
617528f9ab MFC r200726:
Apply fix for Solaris bug 6801979: zfs recv can fail with E2BIG
(onnv revision 8986)

PR:		kern/141355
Requested by:	mm
Submitted by:	pjd
Obtained from:	OpenSolaris
2010-01-03 03:05:30 +00:00
Xin LI
66be3718aa MFC r200724:
Apply fix for Solaris bug  6462803 zfs snapshot -r failed because
filesystem was busy.

PR:		kern/141387
Submitted by:	mm
Approved by:	pjd
Obtained from:	OpenSolaris (onnv 8989:cfce31f4eebf)
2010-01-03 02:58:05 +00:00
Konstantin Belousov
9562e7e533 MFC r200162:
Change VOP_FSYNC for zfs vnode from VOP_PANIC to zfs_freebsd_fsync().
2009-12-12 14:44:04 +00:00
Pawel Jakub Dawidek
f142f57c04 MFC r200124,r200126,
r200124:

Avoid using additional variable for storing an error if we are not going
to do anything with it.

r200126:

Fix deadlock when ZVOLs are present and we are replacing dead component or
calling scrub when pool is in a degraded state. It will try to taste ZVOLs,
which will lead to deadlock, as ZVOL will try to acquire the same locks as
replace/scrub is holding already.

We can't simply skip provider based on their GEOM class, because ZVOL can have
providers build on top of it and we need to skip those as well.

We do it by asking for ZFS::iszvol attribute. Any ZVOL-based provider will give
us positive answer and we have to skip those providers.

This way we remove possibility to create ZFS pools on top of ZVOLs, but it is
not very useful anyway.

I believe deadlock is still possible in some very complex situations like when
we have MD provider on top of UFS file on top of ZVOL. When we try to replace
dead component in the pool mentioned ZVOL is based on, there might be a
deadlock when ZFS will try to taste MD provider. There is no easy way to detect
that, but it isn't very common.

r200125,r200158:

Fix order of looking for providers.

Before r200125 the order of looking for providers was wrong. It was:
1. Find provider by name.
2. Find provider by guid.
3. Find provider by name and guid.

Where it should have been:
1. Find provider by name and guid.
2. Find provider by guid.
3. Find provider by name.
2009-12-10 18:38:40 +00:00
Robert Noland
262b2ce076 MFC 198420
Correct some issues with zfs boot.

   - Teach it to read gang blocks. (essentially untested)
     If you see "ZFS: gang block detected!", please let
     me know, so we can either remove the printf if it
     works, or fix it if it doesn't.

   - If multiple partitions exist on a disk, probe them all.
     We also need to reset dsk->start to 0 to read the right
     sector here.

   - With GPT, we can have 128 partitions.

   - If the bootfs property has ever been set on a pool
     it seems that it never goes away.  zpool won't allow
     you to add to the pool with the bootfs property set.
     However, if you clear the property back to default
     we end up getting 0 for the object number and read
     a bogus block pointer and fail to boot.

   - Fix some error printfs. The printf in the loader is
     only capable of c,s and u formats.

   - Teach printf how to display %llu
2009-11-21 15:02:35 +00:00
Robert Noland
1de2992578 MFC r199241
This patch addresses an overflow in the the zfs boot code and allows
users to boot from zfs raidz volumes.  This has been tested by a number
of users and does not impact those which are not booting from zfs raidz
volumes.

Submitted by:	Matt Reimer <mattjreimer@gmail.com>
2009-11-14 16:14:51 +00:00
Pawel Jakub Dawidek
5f99f88975 MFC r198703,r199156,r199157:
r198703:

- zfs_zaccess() can handle VAPPEND too, so map V_APPEND to VAPPEND and call
  zfs_access() instead of vaccess() in this case as well.
- If VADMIN is specified with another V* flag (unlikely) call both
  zfs_access() and vaccess() after spliting V* flags.

This fixes "dirtying snapshot!" panic.

PR:	kern/139806
Reported by:	Carl Chave <carl@chave.us>
In co-operation with:	jh

r199156:

Avoid passing invalid mountpoint to getnewvnode().

Reported by:	rwatson
Tested by:	rwatson

r199157:

Be careful which vattr fields are set during setattr replay.
Without this fix strange things can appear after unclean shutdown like
files with mode set to 07777.

Reported by:	des
2009-11-14 11:59:59 +00:00
Edward Tomasz Napierala
1c7deea437 MFC r196863:
Improve wording.

MFC r196941:

Prevent the line from wrapping.

Approved by:	re (kib)
2009-10-22 16:26:38 +00:00
Pawel Jakub Dawidek
d1c95b4a34 MFC r197831,r197842,r197843,r197860,r197861:
r197831:

Fix situation where Mac OS X NFS client creates a file and when it tries
to set ownership and mode in the same setattr operation, the mode was
overwritten by secpolicy_vnode_setattr().

PR:	kern/118320
Submitted by:	Mark Thompson <info-gentoo@mark.thompson.bz>

r197842:

Fix white-spaces.

r197843:

On FreeBSD it is enough to report provider removal when orphan event is
received, we don't have to do it on every ENXIO error in I/O path.
Solaris has no GEOM so they have to handle it in a less clean way.

r197860:

File system owner is when uid matches and jail matches.

r197861:

Allow file system owner to modify system flags if securelevel permits.

Approved by:	re (kib)
2009-10-12 20:36:55 +00:00
Xin LI
9516a85cc3 MFC revision 197683:
Return EOPNOTSUPP instead of EINVAL when doing chflags(2) over an old
format ZFS, as defined in the manual page.

Submitted by:	pjd (response of my original patch but bugs are mine)
Approved by:	re (kib)
2009-10-04 09:07:29 +00:00
Pawel Jakub Dawidek
01985d6884 MFC r197287, r197289, r197351, r197426, r197458, r197459, r197497, r197498,
r197512, r197513, r197514, r197515, r197525:

r197287:

Purge namecache for the file system being rolled back, so it doesn't point at
invalid vnodes after the rollback resulting in EIO errors when trying to access
files which are in the namecache.

Reported by:	des

r197289:

Purge file system namecache when receiving incremental stream and rolling back
to it.

r197351:

Purge namecache in the same place OpenSolaris does.

r197426:

Restore BSD behaviour - when creating new directory entry use parent directory
gid to set group ownership and not process gid.

This was overlooked during v6 -> v13 switch.

PR:	kern/139076
Reported by:	Sean Winn <sean@gothic.net.au>

r197458:

Close race in zfs_zget(). We have to increase usecount first and then
check for VI_DOOMED flag. Before this change vnode could be reclaimed
between checking for the flag and increasing usecount.

r197459:

Before calling vflush(FORCECLOSE) mark file system as unmounted so the
following vnops will fail. This is very important, because without this change
vnode could be reclaimed at any point, even if we increased usecount. The only
way to ensure that vnode won't be reclaimed was to lock it, which would be very
hard to do in ZFS without changing a lot of code. With this change simply
increasing usecount is enough to be sure vnode won't be reclaimed from under
us. To be precise it can still be reclaimed but we won't be able to see it,
because every try to enter ZFS through VFS will result in EIO.

The only function that cannot return EIO, because it is needed for vflush() is
zfs_root(). Introduce ZFS_ENTER_NOERROR() macro that only locks
z_teardown_lock and never returns EIO.

r197497:

Switch to fletcher4 as the default checksum algorithm. Fletcher2 was proven to
be a bit weak and OpenSolaris also switched to fletcher4.

r197498:	head/cddl/contrib/opensolaris

Fletcher4 is not the default checksum algorithm.

r197512:

- Don't depend on value returned by gfs_*_inactive(), it doesn't work
  well with forced unmounts when GFS vnodes are referenced.
- Make other preparations to GFS for forced unmounts.

PR:	kern/139062
Reported by:	trasz

r197513:

Use traverse() function to find and return mount point's vnode instead of
covered vnode when snapshot is already mounted.

r197514:

On lookup error VFS expects *vpp to be set to NULL, be sure to do that.

r197515:

Handle cases where virtual (GFS) vnodes are referenced when doing forced
unmount. In that case we cannot depend on the proper order of invalidating
vnodes, so we have to free resources when we have a chance.

PR:	kern/139062
Reported by:	trasz

r197525:

Ensure that tv_sec is between INT32_MIN and INT32_MAX, so ZFS won't object.
This completes the fix from r185586.

PR:	kern/139059
Reported by:	Daniel Braniss <danny@cs.huji.ac.il>
Submitted by:	Jaakko Heinonen <jh@saunalahti.fi>
Tested by:	Daniel Braniss <danny@cs.huji.ac.il>

Approved by:	re (kib)
2009-09-29 10:53:06 +00:00
Pawel Jakub Dawidek
9c0bf68299 MFC r197218:
We believe ZFS is ready for production use. Remove a warning about it being
experimental. :)

Approved by:	re (kib)
2009-09-15 12:21:06 +00:00
Pawel Jakub Dawidek
26e71a6c1b MFC r197219:
Forced unmounts work just fine in my tests under heavy load. There might
still be a problem, but it isn't worth a warning.

Approved by:	re (kib)
2009-09-15 12:19:34 +00:00
Pawel Jakub Dawidek
18713ab672 MFC r196456,r196457,r196458,r196662,r196702,r196703,r196919,r196927,r196928,
r196943,r196944,r196947,r196950,r196953,r196954,r196965,r196978,r196979,
r196980,r196982,r196985,r196992,r197131,r197133,r197150,r197151,r197152,
r197153,r197167,r197172,r197177,r197200,r197201:

r196456:
- Give minclsyspri and maxclsyspri real values (consulted with kmacy).
- Honour 'pri' argument for thread_create().

r196457:
Set priority of vdev_geom threads and zvol threads to PRIBIO.

r196458:
- Hide ZFS kernel threads under zfskern process.
- Use better (shorter) threads names:
	'zvol:worker zvol/tank/vol00' -> 'zvol tank/vol00'
	'vdev:worker da0' -> 'vdev da0'

r196662:
Add missing mountpoint vnode locking.
This fixes panic on assertion with DEBUG_VFS_LOCKS and vfs.usermount=1 when
regular user tries to mount dataset owned by him.

r196702:
Remove empty directory.

r196703:
Backport the 'dirtying dbuf' panic fix from newer ZFS version.

Reported by:	Thomas Backman <serenity@exscape.org>

r196919:
bzero() on-stack argument, so mutex_init() won't misinterpret that the
lock is already initialized if we have some garbage on the stack.

PR:	kern/135480
Reported by:	Emil Mikulic <emikulic@gmail.com>

r196927:
Changing provider size is not really supported by GEOM, but doing so when
provider is closed should be ok.
When administrator requests to change ZVOL size do it immediately if ZVOL
is closed or do it on last ZVOL close.

PR:	kern/136942
Requested by:	Bernard Buri <bsd@ask-us.at>

r196928:
Teach zdb(8) how to obtain GEOM provider size.

PR:	kern/133134
Reported by:	Philipp Wuensche <cryx-freebsd@h3q.com>

r196943:
- Avoid holding mutex around M_WAITOK allocations.
- Add locking for mnt_opt field.

r196944:
Don't recheck ownership on update mount. This will eliminate LOR between
vfs_busy() and mount mutex. We check ownership in vfs_domount() anyway.

Noticed by:	kib
Reviewed by:	kib

r196947:
Defer thread start until we set priority.

Reviewed by:	kib

r196950:
Fix detection of file system being shared. Now zfs unshare/destroy/rename
command will properly remove exported file systems.

r196953:
When snapshot mount point is busy (for example we are still in it)
we will fail to unmount it, but it won't be removed from the tree,
so in that case there is no need to reinsert it.

Reported by:	trasz

r196954:
If we have to use avl_find(), optimize a bit and use avl_insert() instead of
avl_add() (the latter is actually a wrapper around avl_find() + avl_insert()).
Fix similar case in the code that is currently commented out.

r196965:
Fix reference count leak for a case where snapshot's mount point is updated.

r196978:
Call ZFS_EXIT() after locking the vnode.

r196979:
On FreeBSD we don't have to look for snapshot's mount point,
because fhtovp method is already called with proper mount point.

r196980:
When we automatically mount snapshot we want to return vnode of the mount point
from the lookup and not covered vnode. This is one of the fixes for using .zfs/
over NFS.

r196982:
We don't export individual snapshots, so mnt_export field in snapshot's
mount point is NULL. That's why when we try to access snapshots over NFS
use mnt_export field from the parent file system.

r196985:
Only log successful commands! Without this fix we log even unsuccessful
commands executed by unprivileged users. Action is not really taken, but it is
logged to pool history, which might be confusing.

Reported by:	Denis Ahrens <denis@h3q.com>

r196992:
Implement __assert() for Solaris-specific code. Until now Solaris code was
using Solaris prototype for __assert(), but FreeBSD's implementation.
Both take different arguments, so we were either core-dumping in assert()
or printing garbage.

Reported by:	avg

r197131:
Tighten up the check for race in zfs_zget() - ZTOV(zp) can not only contain
NULL, but also can point to dead vnode, take that into account.

PR:	kern/132068
Reported by:	Edward Fisk <7ogcg7g02@sneakemail.com>, kris
Fix based on patch from:	Jaakko Heinonen <jh@saunalahti.fi>

r197133:
- Protect reclaim with z_teardown_inactive_lock.
- Be prepared for dbuf to disappear in zfs_reclaim_complete() and check if
  z_dbuf field is NULL - this might happen in case of rollback or forced
  unmount between zfs_freebsd_reclaim() and zfs_reclaim_complete().
- On forced unmount wait for all znodes to be destroyed - destruction can be
  done asynchronously via zfs_reclaim_complete().

r197150:
There is a bug where mze_insert() can trigger an assert() of inserting
the same entry twice. This bug is not fixed yet, but leads to situation
where when try to access corrupted directory the kernel will panic.
Until the bug is properly fixed, try to recover from it and log that it
happened.

Reported by:	marck
OpenSolaris bug:	6709336

r197151:
Be sure not to overflow struct fid.

r197152:
Extend scope of the z_teardown_lock lock for consistency and "just in case".

r197153:
When zfs.ko is compiled with debug, make sure that znode and vnode point at
each other.

r197167:
Work-around READDIRPLUS problem with .zfs/ and .zfs/snapshot/ directories
by just returning EOPNOTSUPP. This will allow NFS server to fall back to
regular READDIR.
Note that converting inode number to snapshot's vnode is expensive operation.
Snapshots are stored in AVL tree, but based on their names, not inode numbers,
so to convert inode to snapshot vnode we have to interate over all snalshots.
This is not a problem in OpenSolaris, because in their READDIRPLUS
implementation they use VOP_LOOKUP() on d_name, instead of VFS_VGET() on
d_fileno as we do.

PR:	kern/125149
Reported by:	Weldon Godfrey <wgodfrey@ena.com>
Analysis by:	Jaakko Heinonen <jh@saunalahti.fi>

r197172:
Add missing \n.

Reported by:	marck

r197177:
Support both case: when snapshot is already mounted and when it is not yet
mounted.

r197200:
Modify mount(8) to skip MNT_IGNORE file systems by default, just like df(1)
does. This is not POLA violation, because there is no single file system in the
base that use MNT_IGNORE currently, although ZFS snapshots will be mounted with
MNT_IGNORE after next commit.

Reviewed by:	kib

r197201:
- Mount ZFS snapshots with MNT_IGNORE flag, so they are not visible in regular
  df(1) and mount(8) output. This is a bit smilar to OpenSolaris and follows
  ZFS route of not listing snapshots by default with 'zfs list' command.
- Add UPDATING entry to note that ZFS snapshots are no longer visible in
  mount(8) and df(1) output by default.

Reviewed by:	kib

Approved by:	re (bz)
2009-09-15 11:13:40 +00:00
Konstantin Belousov
8f0b752891 MFC r196966:
Lock Giant around vn_open_cred().
Remove innocent unnecessary call to NDFREE().

Approved by:	re (kensmith)
2009-09-11 12:56:13 +00:00
Pawel Jakub Dawidek
8844a10730 MFC r196395:
Our libc doesn't implement control method for XDR (only kernel does) and it
will always return failure. Fix this by bringing userland implementation of
xdrmem_control() back. This allow 'zpool import' to work again.

Reported by:	Thomas Backman <serenity@exscape.org>
Reviewed by:	kmacy
Approved by:	re (kib)
2009-08-20 00:08:58 +00:00
Pawel Jakub Dawidek
1aefcd39f0 MFC r196309:
getcwd() (when __getcwd() fails) works by stating current directory, going up
(..), calling readdir and looking for previous directory inode.  In case of
.zfs/ directory this doesn't work, because .zfs/ is hidden by default, so it
won't be visible in readdir output.

Fix this by implementing VPTOCNP for snapshot directories, so __getcwd()
doesn't fail and getcwd() doesn't have to use readdir method.

This fixes /bin/pwd from within .zfs/snapshot/<name>/.

Suggested by:	kib
Approved by:	re (rwatson)
2009-08-17 10:02:31 +00:00
Pawel Jakub Dawidek
5adfc444cf MFC r196307:
Manage asynchronous vnode release just like Solaris.

Discussed with:	kmacy
Approved by:	re (kib)
2009-08-17 09:55:58 +00:00
Pawel Jakub Dawidek
be7e2e42e1 MFC r196303:
- Reduce z_teardown_lock lock scope a bit.
- The error variable is int, not bool.
- Convert spaces to tabs where needed.

Approved by:	re (kib)
2009-08-17 09:30:31 +00:00
Pawel Jakub Dawidek
a64b735739 MFC r196301:
If z_buf is NULL, we should free znode immediately.

Noticed by:	avg
Approved by:	re (kib)
2009-08-17 09:27:10 +00:00
Pawel Jakub Dawidek
93be9449e4 MFC r196299:
- We need to recycle vnode instead of freeing znode.

Submitted by:	avg

- Add missing vnode interlock unlock.
- Remove redundant znode locking.

Approved by:	re (kib)
2009-08-17 09:23:27 +00:00
Pawel Jakub Dawidek
f0fb1d62c7 MFC r196297:
Fix panic in zfs recv code. The last vnode (mountpoint's vnode) can have
0 usecount.

Reported by:	Thomas Backman <serenity@exscape.org>
Approved by:	re (kib)
2009-08-17 09:14:58 +00:00
Pawel Jakub Dawidek
e43f173602 MFC r196295:
Remove OpenSolaris taskq port (it performs very poorly in our kernel) and
replace it with wrappers around our taskqueue(9).
To make it possible implement taskqueue_member() function which returns 1
if the given thread was created by the given taskqueue.

Approved by:	re (kib)
2009-08-17 09:03:47 +00:00
Pawel Jakub Dawidek
4aae820003 MFC r196291:
- Fix a race where /dev/zfs control device is created before ZFS is fully
  initialized. Also destroy /dev/zfs before doing other deinitializations.
- Initialization through taskq is no longer needed and there is a race
  where one of the zpool/zfs command loads zfs.ko and tries to do some work
  immediately, but /dev/zfs is not there yet.

Reported by:	pav
Approved by:	re (kib)
2009-08-17 08:38:41 +00:00
Pawel Jakub Dawidek
9e813d55e3 MFC r196289:
Remove files that are no longer used.

Discussed with:	kmacy
Approved by:	re (kib)
2009-08-17 08:09:46 +00:00
Marcel Moolenaar
7ea9d2977c MFC revision 196269:
Fix misalignment in nvpair_native_embedded() caused by the compiler
replacing the bzero().

Approved by:	re (kensmith)
2009-08-16 02:21:24 +00:00
Edward Tomasz Napierala
dba91be244 InstaMFC 196179: Remove CDDL warning.
Approved by:	re (kib), core
2009-08-13 13:56:05 +00:00
Pawel Jakub Dawidek
abd8353f5d We don't support ephemeral IDs in FreeBSD and without this fix ZFS can
panic when in zfs_fuid_create_cred() when userid is negative. It is
converted to unsigned value which makes IS_EPHEMERAL() macro to
incorrectly report that this is ephemeral ID. The most reasonable
solution for now is to always report that the given ID is not ephemeral.

PR:		kern/132337
Submitted by:	Matthew West <freebsd@r.zeeb.org>
Tested by:	Thomas Backman <serenity@exscape.org>, Michael Reifenberger <mike@reifenberger.com>
Approved by:	re (kib)
MFC after:	2 weeks
2009-07-27 14:52:34 +00:00
Edward Tomasz Napierala
d2ceff236a Fix extattr_list_file(2) on ZFS in case the attribute directory
doesn't exist and user doesn't have write access to the file.
Without this fix, it returns bogus value instead of 0.  For some
reason this didn't manifest on my kernel compiled with -O0.

PR:		kern/136601
Submitted by:	Jaakko Heinonen <jh at saunalahti dot fi>
Approved by:	re (kib)
2009-07-22 15:15:58 +00:00
Edward Tomasz Napierala
65588fd503 Fix permission handling for extended attributes in ZFS. Without
this change, ZFS uses SunOS Alternate Data Streams semantics - each
EA has its own permissions, which are set at EA creation time
and - unlike SunOS - invisible to the user and impossible to change.
From the user point of view, it's just broken: sometimes access
is granted when it shouldn't be, sometimes it's denied when
it shouldn't be.

This patch makes it behave just like UFS, i.e. depend on current
file permissions.  Also, it fixes returned error codes (ENOATTR
instead of ENOENT) and makes listextattr(2) return 0 instead
of EPERM where there is no EA directory (i.e. the file never had
any EA).

Reviewed by:	pjd (idea, not actual code)
Approved by:	re (kib)
2009-07-20 19:16:42 +00:00
Andriy Gapon
b064b6d1cd dtrace_gethrtime: improve scaling of TSC ticks to nanoseconds
Currently dtrace_gethrtime uses formula similar to the following for
converting TSC ticks to nanoseconds:
rdtsc() * 10^9 / tsc_freq
The dividend overflows 64-bit type and wraps-around every 2^64/10^9 =
18446744073 ticks which is just a few seconds on modern machines.

Now we instead use precalculated scaling factor of
10^9*2^N/tsc_freq < 2^32 and perform TSC value multiplication separately
for each 32-bit half.  This allows to avoid overflow of the dividend
described above.
The idea is taken from OpenSolaris.
This has an added feature of always scaling TSC with invariant value
regardless of TSC frequency changes. Thus the timestamps will not be
accurate if TSC actually changes, but they are always proportional to
TSC ticks and thus monotonic. This should be much better than current
formula which produces wildly different non-monotonic results on when
tsc_freq changes.

Also drop write-only 'cp' variable from amd64 dtrace_gethrtime_init()
to make it identical to the i386 twin.

PR:		kern/127441
Tested by:	Thomas Backman <serenity@exscape.org>
Reviewed by:	jhb
Discussed with:	current@, bde, gnn
Silence from:	jb
Approved by:	re (gnn)
MFC after:	1 week
2009-07-15 17:07:39 +00:00
Konstantin Belousov
f33a947b56 Add new msleep(9) flag PBDY that shall be specified together with
PCATCH, to indicate that thread shall not be stopped upon receipt of
SIGSTOP until it reaches the kernel->usermode boundary.

Also change thread_single(SINGLE_NO_EXIT) to only stop threads at
the user boundary unconditionally.

Tested by:	pho
Reviewed by:	jhb
Approved by:	re (kensmith)
2009-07-14 22:52:46 +00:00
Marcel Moolenaar
dd8a461e83 In nvpair_native_embedded_array(), meaningless pointers are zeroed.
The programmer was aware that alignment was not guaranteed in the
packed structure and used bzero() to NULL out the pointers.
However, on ia64, the compiler is quite agressive in finding ILP
and calls to bzero() are often replaced by simple assignments (i.e.
stores). Especially when the width or size in question corresponds
with a store instruction (i.e. st1, st2, st4 or st8).

The problem here is not a compiler bug. The address of the memory
to zero-out was given by '&packed->nvl_priv' and given the type of
the 'packed' pointer the compiler could assume proper alignment for
the replacement of bzero() with an 8-byte wide store to be valid.
The problem is with the programmer. The programmer knew that the
address did not have the alignment guarantees needed for a regular
assignment, but failed to inform the compiler of that fact. In
fact, the programmer told the compiler the opposite: alignment is
guaranteed.

The fix is to avoid using a pointer of type "nvlist_t *" and
instead use a "char *" pointer as the basis for calculating the
address. This tells the compiler that only 1-byte alignment can
be assumed and the compiler will either keep the bzero() call
or instead replace it with a sequence of byte-wise stores. Both
are valid.

Approved by:	re (kib)
2009-07-11 22:43:20 +00:00
Andriy Gapon
f340e9fe71 dtrace/amd64: fix virtual address checks
On amd64 KERNBASE/kernbase does not mean start of kernel memory.
This should fix a KASSERT panic in dtrace_copycheck when copyin*()
is used in D program.
Also make checks for user memory a bit stricter.

Reported by:	Thomas Backman <serenity@exscape.org>
Submitted by:	wxs (kaddr part)
Tested by:	Thomas Backman (prototype), wxs
Reviewed by:	alc (concept), jhb, current@
Aprroved by:	jb (concept)
MFC after:	2 weeks
PR:		kern/134408
2009-06-24 16:03:57 +00:00
Konstantin Belousov
a18a95db4a O_NOFOLLOW shall be in flags, not in cmode.
Noted by:	bde
2009-06-22 10:08:48 +00:00
Konstantin Belousov
e0c161b89c Add another flags argument to vn_open_cred. Use it to specify that some
vn_open_cred invocations shall not audit namei path.

In particular, specify VN_OPEN_NOAUDIT for dotdot lookup performed by
default implementation of vop_vptocnp, and for the open done for core
file. vn_fullpath is called from the audit code, and vn_open there need
to disable audit to avoid infinite recursion. Core file is created on
return to user mode, that, in particular, happens during syscall return.
The creation of the core file is audited by direct calls, and we do not
want to overwrite audit information for syscall.

Reported, reviewed and tested by: rwatson
2009-06-21 13:41:32 +00:00
Jamie Gritton
c1f192193d Rename the host-related prison fields to be the same as the host.*
parameters they represent, and the variables they replaced, instead of
abbreviated versions of them.

Approved by:	bz (mentor)
2009-06-13 15:39:12 +00:00
Kip Macy
f0c6b798a3 pjd has requested that I keep the tunable as zfs_prefetch_disable to minimize gratuitous
differences with Opensolaris' ZFS

Sorry for the churn
2009-06-11 22:24:08 +00:00
Kip Macy
e4e5e663e0 check against prefetch_enable 2009-06-11 09:51:21 +00:00