104 Commits

Author SHA1 Message Date
Martin Matuska
5c92680fa9 Recommit r218169, enclosing with #ifdef _KERNEL
This change is sufficient for the ZFS kernel module.

Discussed with:	pjd
MFC after:	1 week
2011-02-01 23:12:13 +00:00
Alexander Kabaev
a9c28a203d Revert r218169 until it can be tested and fixed properly.
2011-02-01 21:15:35 +00:00
Martin Matuska
4530e5f790 For ZFS, change the type of clock_t to int64_t.
The clock_t type in OpenSolaris is long (int64_t on amd64).
On FreeBSD clock_t is int32_t. The clock_t type is used in several places
in the ZFS code to store system uptime in milliseconds ("seconds * hz").

With hz=1000 we have a 32-bit integer overflow in 24 days, 20 hours,
31 minutes and 23.648 seconds. This has a user-reported negative impact
on l2arc_feed_thread() and may cause unexpected results from other functions
using clock_t.
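
The overflow time quoted above can be checked with a short sketch. The
names below (struct uptime, TICKS_AT_WRAP, ticks_to_uptime, fits_in_int32)
are hypothetical helpers written for this illustration and are not part of
the ZFS or FreeBSD sources; the arithmetic is done in int64_t so the sketch
itself cannot overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical breakdown of a tick count into human-readable units. */
struct uptime {
	int64_t days, hours, minutes, seconds, millis;
};

/* First tick count that no longer fits in a signed 32-bit counter: 2^31. */
#define TICKS_AT_WRAP ((int64_t)INT32_MAX + 1)

/* Convert a tick count to an uptime breakdown; hz is ticks per second. */
static struct uptime
ticks_to_uptime(int64_t ticks, int64_t hz)
{
	struct uptime u;
	int64_t secs;

	assert(hz > 0);
	secs = ticks / hz;
	u.millis = (ticks % hz) * 1000 / hz;
	u.days = secs / 86400;
	u.hours = secs % 86400 / 3600;
	u.minutes = secs % 3600 / 60;
	u.seconds = secs % 60;
	return (u);
}

/* Nonzero if the tick count can be stored in a 32-bit clock_t. */
static int
fits_in_int32(int64_t ticks)
{
	return (ticks >= INT32_MIN && ticks <= INT32_MAX);
}
```

Calling ticks_to_uptime(TICKS_AT_WRAP, 1000) gives the point at which a
32-bit counter wraps with hz=1000; the same tick count is far inside the
range of an int64_t.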

Reported by:	Artem Belevich <fbsdlist@src.cx> on freebsd-fs@
MFC after:	1 week
2011-02-01 14:28:50 +00:00
Jayachandran C.
baa8c35cb4 CDDL fixes for MIPS n32.
Provide 64 bit atomic ops, and use 32 bit pointer.
2011-01-28 06:12:59 +00:00
Matthew D Fleming
cbc134ad03 Introduce signed and unsigned version of CTLTYPE_QUAD, renaming
existing uses.  Rename sysctl_handle_quad() to sysctl_handle_64().
2011-01-19 23:00:25 +00:00
Andriy Gapon
58f61ce4eb opensolaris cyclic: fix deadlock and make a little bit closer to upstream
The deadlock was caused in the following way:
- thread T1 on CPU C1 holds a spin mutex, IPIs CPU C2 and waits for the
  IPI to be handled
- C2 executes timer interrupt filter, thus has interrupts disabled, and
  gets blocked on the spin mutex held by T1
The problem seems to have been introduced by simplifications made to
OpenSolaris code during porting.
The problem is fixed by reorganizing the code to more closely resemble
the upstream version.  Interrupt filter (cyclic_fire) now doesn't
acquire any locks, all per-CPU data accesses are performed on a
target CPU with preemption and interrupts disabled thus precluding
concurrent access to the data.
cyp_mtx spin mutex is used to disable preemption and interrupts; it's not
used for classical mutual exclusion, because xcall already serializes
calls to a CPU.  It's an emulation of OpenSolaris
cyb_set_level(CY_HIGH_LEVEL) call, the spin mutexes could probably be
reduced to just a spinlock_enter()/_exit() pair.

Diff with upstream version is now reduced by ~500 lines, however it still
remains quite large - many things that are not needed (at the moment) or
are irrelevant on FreeBSD were simply ripped out during porting.
Examples of such things:
- support for CPU onlining/offlining
- support for suspend/resume
- support for running callouts at soft interrupt levels
- support for callout rebinding from CPU to CPU
- support for CPU partitions

Tested by:	Artem Belevich <fbsdlist@src.cx>
MFC after:	3 weeks
X-MFC with:	r216252
2010-12-07 12:25:26 +00:00
Edward Tomasz Napierala
de2a57325d Don't panic when we read an empty ACL from ZFS. Apparently this may happen
with filesystems created under the MacOS X ZFS port.  This is a kind of filesystem
corruption (we don't allow for setting empty ACLs), so make acl_get_file(3)
and related syscalls fail with EINVAL in that case.  In theory, we could
return empty ACL to userland, but I'm afraid this would break some code.

MFC after:	3 days
2010-11-30 21:04:05 +00:00
Rui Paulo
910a5e18ba Pass a format string to panic() and to taskqueue_start_threads().
Found with:	clang
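
A minimal sketch of the general pattern behind this kind of fix, assuming
the usual printf-style contract: text that is data should be passed as an
argument to a constant "%s" format rather than used as the format itself.
fake_panic() and report() are hypothetical stand-ins that write into a
buffer; they are not the kernel's panic() or taskqueue_start_threads().

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for a printf-style reporting function.  It formats
 * into buf instead of halting the system.
 */
static int
fake_panic(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(buf != NULL && len > 0);
	va_start(ap, fmt);
	n = vsnprintf(buf, len, fmt, ap);
	va_end(ap);
	return (n);
}

/*
 * Safe pattern: msg is treated as data.  Any '%' it contains is copied
 * verbatim instead of being interpreted as a conversion specifier.
 */
static int
report(char *buf, size_t len, const char *msg)
{
	return (fake_panic(buf, len, "%s", msg));
}
```

With this pattern a message such as "disk 100%d full" is reproduced
unchanged; passing the same string as the format would make the function
read a nonexistent int argument.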
2010-10-13 17:13:43 +00:00
Andriy Gapon
6c6aca1203 opensolaris_kmem kmem_size(): report lesser of vm_kmem_size and available
physical memory

This is needed to correctly autotune ZFS ARC size when vm_kmem_size is
set to a value larger than available physical memory.
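
The clamping described above amounts to taking a minimum. The sketch below
is an illustration only: kmem_size_sketch() and its parameters are
hypothetical, and the real kmem_size() reads kernel state rather than
taking arguments.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical illustration: report the lesser of the configured kmem size
 * and the physical memory size (given as a page count and a page size).
 */
static uint64_t
kmem_size_sketch(uint64_t vm_kmem_size, uint64_t phys_pages,
    uint64_t page_size)
{
	uint64_t phys_bytes;

	assert(page_size > 0);
	phys_bytes = phys_pages * page_size;
	return (vm_kmem_size < phys_bytes ? vm_kmem_size : phys_bytes);
}
```

For example, with 1048576 pages of 4096 bytes (4 GiB) and a configured
size of 8 GiB, the sketch reports 4 GiB, so anything autotuned from it
stays within physical memory.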

MFC after:	2 weeks
2010-10-07 18:16:14 +00:00
Martin Matuska
d1ee63f836 Fix kernel panic when moving a file to .zfs/shares
Fix possible loss of correct error return code in ZFS mount

OpenSolaris revisions and Bug IDs:

11824:53128e5db7cf
6863610	ZFS mount can lose correct error return

12079:13822b941977
6939941	problem with moving files in zfs (142901-12)

Approved by:	delphij (mentor)
Obtained from:	OpenSolaris (Bug ID 6863610, 6939941)
MFC after:	3 days
2010-09-15 19:55:26 +00:00
Andriy Gapon
8a3883cfb7 zfs vn_has_cached_data: take into account v_object->cache != NULL
This mirrors code in tmpfs.
This change shouldn't affect the read path much; it may cause unnecessary
vm_page_lookup calls in the case where v_object has no active or inactive
pages but has some cache pages.  I believe this situation to be non-essential.

In write path this change should allow us to properly detect the above
case and free a cache page when we write to a range that corresponds to it.
If this situation is undetected then we could have a discrepancy between
data in page cache and in ARC or on disk.

This change allows us to re-enable vn_has_cached_data() check in zfs_write.

NOTE: strictly speaking resident_page_count and cache fields of v_object
should be examined under VM_OBJECT_LOCK, but for this particular usage
we may get away with it.

Discussed with:	alc, kib
Approved by:	pjd
Tested with:	tools/regression/fsx
MFC after:	3 weeks
2010-09-15 11:05:41 +00:00
Martin Matuska
8d87b396f8 Import changes from OpenSolaris that provide
- better ACL caching and speedup of ACL permission checks
- faster handling of stat()
- lowered mutex contention in the read/writer lock (rrwlock)
- several related bugfixes

Detailed information (OpenSolaris onnv changesets and Bug IDs):

9749:105f407a2680
6802734	Support for Access Based Enumeration (not used on FreeBSD)
6844861	inconsistent xattr readdir behavior with too-small buffer

9866:ddc5f1d8eb4e
6848431	zfs with rstchown=0 or file_chown_self privilege allows user to "take" ownership

9981:b4907297e740
6775100	stat() performance on files on zfs should be improved
6827779	rrwlock is overly protective of its counters

10143:d2d432dfe597
6857433	memory leaks found at: zfs_acl_alloc/zfs_acl_node_alloc
6860318	truncate() on zfsroot succeeds when file has a component of its path set without access permission

10232:f37b85f7e03e
6865875	zfs sometimes incorrectly giving search access to a dir

10250:b179ceb34b62
6867395	zpool_upgrade_007_pos testcase panic'd with BAD TRAP: type=e (#pf Page fault)

10269:2788675568fd
6868276	zfs_rezget() can be hazardous when znode has a cached ACL

10295:f7a18a1e9610
6870564	panic in zfs_getsecattr

Approved by:	delphij (mentor)
Obtained from:	OpenSolaris (multiple Bug IDs)
MFC after:	2 weeks
2010-08-28 09:24:11 +00:00
Rui Paulo
cd306d6fa1 Add a sysname char * to struct opensolaris_utsname.
Sponsored by:	The FreeBSD Foundation
2010-08-21 14:09:24 +00:00
Rui Paulo
e0be1c75f0 Add sysname to struct opensolaris_utsname. This is needed by one DTrace
test.

Sponsored by:	The FreeBSD Foundation
2010-08-21 11:41:32 +00:00
Martin Matuska
8fc257994d Merge ZFS version 15 and almost all OpenSolaris bugfixes referenced
in Solaris 10 updates 141445-09 and 142901-14.

Detailed information:
(OpenSolaris revisions and Bug IDs, Solaris 10 patch numbers)

7844:effed23820ae
6755435	zfs_open() and zfs_close() needs to use ZFS_ENTER/ZFS_VERIFY_ZP (141445-01)

7897:e520d8258820
6748436	inconsistent zpool.cache in boot_archive could panic a zfs root filesystem upon boot-up (141445-01)

7965:b795da521357
6740164	zpool attach can create an illegal root pool (141909-02)

8084:b811cc60d650
6769612	zpool_import() will continue to write to cachefile even if altroot is set (N/A)

8121:7fd09d4ebd9c
6757430	want an option for zdb to disable space map loading and leak tracking (141445-01)

8129:e4f45a0bfbb0
6542860	ASSERT: reason != VDEV_LABEL_REMOVE||vdev_inuse(vd, crtxg, reason, 0) (141445-01)

8188:fd00c0a81e80
6761100	want zdb option to select older uberblocks (141445-01)

8190:6eeea43ced42
6774886	zfs_setattr() won't allow ndmp to restore SUNWattr_rw (141445-01)

8225:59a9961c2aeb
6737463	panic while trying to write out config file if root pool import fails (141445-01)

8227:f7d7be9b1f56
6765294	Refactor replay (141445-01)

8228:51e9ca9ee3a5
6572357	libzfs should do more to avoid mnttab lookups (141909-01)
6572376	zfs_iter_filesystems and zfs_iter_snapshots get objset stats twice (141909-01)

8241:5a60f16123ba
6328632	zpool offline is a bit too conservative (141445-01)
6739487	ASSERT: txg <= spa_final_txg due to scrub/export race (141445-01)
6767129	ASSERT: cvd->vdev_isspare, in spa_vdev_detach() (141445-01)
6747698	checksum failures after offline -t / export / import / scrub (141445-01)
6745863	ZFS writes to disk after it has been offlined (141445-01)
6722540	50% slowdown on scrub/resilver with certain vdev configurations (141445-01)
6759999	resilver logic rewrites ditto blocks on both source and destination (141445-01)
6758107	I/O should never suspend during spa_load() (141445-01)
6776548	codereview(1) runs off the page when faced with multi-line comments (N/A)
6761406	AMD errata 91 workaround doesn't work on 64-bit systems (141445-01)

8242:e46e4b2f0a03
6770866	GRUB/ZFS should require physical path or devid, but not both (141445-01)

8269:03a7e9050cfd
6674216	"zfs share" doesn't work, but "zfs set sharenfs=on" does (141445-01)
6621164	$SRC/cmd/zfs/zfs_main.c seems to have a syntax error in the translation note (141445-01)
6635482	i18n problems in libzfs_dataset.c and zfs_main.c (141445-01)
6595194	"zfs get" VALUE column is as wide as NAME (141445-01)
6722991	vdev_disk.c: error checking for ddi_pathname_to_dev_t() must test for NODEV (141445-01)
6396518	ASSERT strings shouldn't be pre-processed (141445-01)

8274:846b39508aff
6713916	scrub/resilver needlessly decompress data (141445-01)

8343:655db2375fed
6739553	libzfs_status msgid table is out of sync (141445-01)
6784104	libzfs unfairly rejects numerical values greater than 2^63 (141445-01)
6784108	zfs_realloc() should not free original memory on failure (141445-01)

8525:e0e0e525d0f8
6788830	set large value to reservation cause core dump (141445-01)
6791064	want sysevents for ZFS scrub (141445-01)
6791066	need to be able to set cachefile on faulted pools (141445-01)
6791071	zpool_do_import() should not enable datasets on faulted pools (141445-01)
6792134	getting multiple properties on a faulted pool leads to confusion (141445-01)

8547:bcc7b46e5ff7
6792884	Vista clients cannot access .zfs (141445-01)

8632:36ef517870a3
6798384	It can take a village to raise a zio (141445-01)

8636:7e4ce9158df3
6551866	deadlock between zfs_write(), zfs_freesp(), and zfs_putapage() (141909-01)
6504953	zfs_getpage() misunderstands VOP_GETPAGE() interface (141909-01)
6702206	ZFS read/writer lock contention throttles sendfile() benchmark (141445-01)
6780491	Zone on a ZFS filesystem has poor fork/exec performance (141445-01)
6747596	assertion failed: DVA_EQUAL(BP_IDENTITY(&zio->io_bp_orig), BP_IDENTITY(zio->io_bp))); (141445-01)

8692:692d4668b40d
6801507	ZFS read aggregation should not mind the gap (141445-01)

8697:e62d2612c14d
6633095	creating a filesystem with many properties set is slow (141445-01)

8768:dfecfdbb27ed
6775697	oracle crashes when overwriting after hitting quota on zfs (141909-01)

8811:f8deccf701cf
6790687	libzfs mnttab caching ignores external changes (141445-01)
6791101	memory leak from libzfs_mnttab_init (141445-01)

8845:91af0d9c0790
6800942	smb_session_create() incorrectly stores IP addresses (N/A)
6582163	Access Control List (ACL) for shares (141445-01)
6804954	smb_search - shortname field should be space padded following the NULL terminator (N/A)
6800184	Panic at smb_oplock_conflict+0x35() (N/A)

8876:59d2e67b4b65
6803822	Reboot after replacement of system disk in a ZFS mirror drops to grub> prompt (141445-01)

8924:5af812f84759
6789318	coredump when issue zdb -uuuu poolname/ (141445-01)
6790345 zdb -dddd -e poolname coredump (141445-01)
6797109 zdb: 'zdb -dddddd pool_name/fs_name inode' coredump if the file with inode was deleted (141445-01)
6797118 zdb: 'zdb -dddddd poolname inum' coredump if I miss the fs name (141445-01)
6803343 shareiscsi=on failed, iscsitgtd failed request to share (141445-01)

9030:243fd360d81f
6815893	hang mounting a dataset after booting into a new boot environment (141445-01)

9056:826e1858a846
6809691	'zpool create -f' no longer overwrites ufs infomation (141445-01)

9179:d8fbd96b79b3
6790064	zfs needs to determine uid and gid earlier in create process (141445-01)

9214:8d350e5d04aa
6604992	forced unmount + being in .zfs/snapshot/<snap1> = not happy (141909-01)
6810367	assertion failed: dvp->v_flag & VROOT, file: ../../common/fs/gfs.c, line: 426 (141909-01)

9229:e3f8b41e5db4
6807765	ztest_dsl_dataset_promote_busy needs to clean up after ENOSPC (141445-01)

9230:e4561e3eb1ef
6821169	offlining a device results in checksum errors (141445-01)
6821170	ZFS should not increment error stats for unavailable devices (141445-01)
6824006	need to increase issue and interrupt taskqs threads in zfs (141445-01)

9234:bffdc4fc05c4
6792139	recovering from a suspended pool needs some work (141445-01)
6794830	reboot command hangs on a failed zfs pool (141445-01)

9246:67c03c93c071
6824062	System panicked in zfs_mount due to NULL pointer dereference when running btts and svvs tests (141909-01)

9276:a8a7fc849933
6816124	System crash running zpool destroy on broken zpool (141445-03)

9355:09928982c591
6818183	zfs snapshot -r is slow due to set_snap_props() doing txg_wait_synced() for each new snapshot (141445-03)

9391:413d0661ef33
6710376	log device can show incorrect status when other parts of pool are degraded (141445-03)

9396:f41cf682d0d3 (part already merged)
6501037	want user/group quotas on ZFS (141445-03)
6827260	assertion failed in arc_read(): hdr == pbuf->b_hdr (141445-03)
6815592	panic: No such hold X on refcount Y from zfs_znode_move (141445-03)
6759986	zfs list shows temporary %clone when doing online zfs recv (141445-03)

9404:319573cd93f8
6774713	zfs ignores canmount=noauto when sharenfs property != off (141445-03)

9412:4aefd8704ce0
6717022	ZFS DMU needs zero-copy support (141445-03)

9425:e7ffacaec3a8
6799895	spa_add_spares() needs to be protected by config lock (141445-03)
6826466	want to post sysevents on hot spare activation (141445-03)
6826468	spa 'allowfaulted' needs some work (141445-03)
6826469	kernel support for storing vdev FRU information (141445-03)
6826470	skip posting checksum errors from DTL regions of leaf vdevs (141445-03)
6826471	I/O errors after device remove probe can confuse FMA (141445-03)
6826472	spares should enjoy some of the benefits of cache devices (141445-03)

9443:2a96d8478e95
6833711	gang leaders shouldn't have to be logical (141445-03)

9463:d0bd231c7518
6764124	want zdb to be able to checksum metadata blocks only (141445-03)

9465:8372081b8019
6830237	zfs panic in zfs_groupmember() (141445-03)

9466:1fdfd1fed9c4
6833162	phantom log device in zpool status (141445-03)

9469:4f68f041ddcd
6824968	add ZFS userquota support to rquotad (141445-03)

9470:6d827468d7b5
6834217	godfather I/O should reexecute (141445-03)

9480:fcff33da767f
6596237	Stop looking and start ganging (141909-02)

9493:9933d599bc93
6623978	lwb->lwb_buf != NULL, file ../../../uts/common/fs/zfs/zil.c, line 787, function zil_lwb_commit (141445-06)

9512:64cafcbcc337
6801810	Commit of aligned streaming rewrites to ZIL device causes unwanted disk reads (N/A)

9515:d3b739d9d043
6586537	async zio taskqs can block out userland commands (142901-09)

9554:787363635b6a
6836768	zfs_userspace() callback has no way to indicate failure (N/A)

9574:1eb6a6ab2c57
6838062	zfs panics when an error is encountered in space_map_load() (141909-02)

9583:b0696cd037cc
6794136	Panic BAD TRAP: type=e when importing degraded zraid pool. (141909-03)

9630:e25a03f552e0
6776104	"zfs import" deadlock between spa_unload() and spa_async_thread() (141445-06)

9653:a70048a304d1
6664765	Unable to remove files when using fat-zap and quota exceeded on ZFS filesystem (141445-06)

9688:127be1845343
6841321	zfs userspace / zfs get userused@ doesn't work on mounted snapshot (N/A)
6843069	zfs get userused@S-1-... doesn't work (N/A)

9873:8ddc892eca6e
6847229	assertion failed: refcount_count(&tx->tx_space_written) + delta <= tx->tx_space_towrite in dmu_tx.c (141445-06)

9904:d260bd3fd47c
6838344	kernel heap corruption detected on zil while stress testing (141445-06)

9951:a4895b3dd543
6844900	zfs_ioc_userspace_upgrade leaks (N/A)

10040:38b25aeeaf7a
6857012	zfs panics on zpool import (141445-06)

10000:241a51d8720c
6848242	zdb -e no longer works as expected (N/A)

10100:4a6965f6bef8
6856634	snv_117 not booting: zfs_parse_bootfs: error2 (141445-07)

10160:a45b03783d44
6861983	zfs should use new name <-> SID interfaces (N/A)
6862984	userquota commands can hang (141445-06)

10299:80845694147f
6696858	zfs receive of incremental replication stream can dereference NULL pointer and crash (N/A)

10302:a9e3d1987706
6696858	zfs receive of incremental replication stream can dereference NULL pointer and crash (fix lint) (N/A)

10575:2a8816c5173b (partial merge)
6882227 spa_async_remove() shouldn't do a full clear (142901-14)

10800:469478b180d9
6880764	fsync on zfs is broken if writes are greater than 32kb on a hard crash and no log attached (142901-09)
6793430 zdb -ivvvv assertion failure: bp->blk_cksum.zc_word[2] == dmu_objset_id(zilog->zl_os) (N/A)

10801:e0bf032e8673 (partial merge)
6822816 assertion failed: zap_remove_int(ds_next_clones_obj) returns ENOENT (142901-09)

10810:b6b161a6ae4a
6892298 buf->b_hdr->b_state != arc_anon, file: ../../common/fs/zfs/arc.c, line: 2849 (142901-09)

10890:499786962772
6807339	spurious checksum errors when replacing a vdev (142901-13)

11249:6c30f7dfc97b
6906110 bad trap panic in zil_replay_log_record (142901-13)
6906946 zfs replay isn't handling uid/gid correctly (142901-13)

11454:6e69bacc1a5a
6898245 suspended zpool should not cause rest of the zfs/zpool commands to hang (142901-10)

11546:42ea6be8961b (partial merge)
6833999 3-way deadlock in dsl_dataset_hold_ref() and dsl_sync_task_group_sync() (142901-09)

Discussed with:	pjd
Approved by:	delphij (mentor)
Obtained from:	OpenSolaris (multiple Bug IDs)
MFC after:	2 months
2010-07-12 23:49:04 +00:00
Pawel Jakub Dawidek
cfb3e98d37 Add task structure to zio and use it instead of allocating one.
This eliminates the only place where we can sleep when calling zio_interrupt().
As a side-effect this can actually improve performance a little as we
allocate one less thing for every I/O.

Prodded by:	kib
MFC after:	1 week
2010-05-16 15:12:34 +00:00
Martin Matuska
c43d127a9a Import OpenSolaris revision 7837:001de5627df3
It includes the following changes:
- parallel reads in traversal code (Bug ID 6333409)
- faster traversal for zfs send (Bug ID 6418042)
- traversal code cleanup (Bug ID 6725675)
- fix for two scrub related bugs (Bug ID 6729696, 6730101)
- fix assertion in dbuf_verify (Bug ID 6752226)
- fix panic during zfs send with i/o errors (Bug ID 6577985)
- replace P2CROSS with P2BOUNDARY (Bug ID 6725680)

List of OpenSolaris Bug IDs:
6333409, 6418042, 6757112, 6725668, 6725675, 6725680,
6725698, 6729696, 6730101, 6752226, 6577985, 6755042

Approved by:	pjd, delphij (mentor)
Obtained from:	OpenSolaris (multiple Bug IDs)
MFC after:	1 week
2010-05-13 20:32:56 +00:00
Pawel Jakub Dawidek
c60c36a745 I added vfs_lowvnodes event, but it was only used for a short while and now
it is totally unused. Remove it.

MFC after:	3 days
2010-05-11 22:46:36 +00:00
Xin LI
0e568ab25c Partially MFp4 #176265 by pjd@:
- Properly initialize and destroy system_taskq.
- Add a dummy implementation of taskq_create_proc().

Note: We do not currently use system_taskq in ZFS so this is mostly a
no-op at this time.  Proper system_taskq initialization is required
by newer ZFS code.

Ok'ed by:	pjd
MFC after:	2 weeks
2010-04-19 09:03:36 +00:00
Xin LI
63243c5c71 On FreeBSD, time_t is 64-bit for all platforms except i386 and powerpc,
where the type is 32-bit.  ZFS can handle 64-bit timestamps internally,
but zfs_setattr() checks whether the time value fits, so change the
checking macros to accept 64-bit timestamps if the platform supports them.

This change has a downside: although ZFS pools can still be imported on
32-bit platforms, timestamps outside the 32-bit range will overflow there.

This fixes the Y2.038K issue on platforms using 64-bit timestamps.
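
A minimal sketch of a width-dependent range check, assuming a timestamp is
rejected only when it cannot be represented in the platform's time type.
timestamp_overflows() and its time_t_bytes parameter are hypothetical and
are not the macros used by zfs_setattr().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical check: nonzero if secs cannot be stored in a signed time
 * type that is time_t_bytes wide (4 or 8).
 */
static int
timestamp_overflows(int64_t secs, unsigned time_t_bytes)
{
	assert(time_t_bytes == 4 || time_t_bytes == 8);
	if (time_t_bytes >= sizeof(int64_t))
		return (0);	/* every int64_t value fits */
	return (secs < INT32_MIN || secs > INT32_MAX);
}
```

The first second past the 32-bit limit (2147483648) is reported as an
overflow for a 4-byte time type and accepted for an 8-byte one.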

Reviewed by:	pjd
MFC after:	1 month
2010-01-25 07:52:54 +00:00
Pawel Jakub Dawidek
fd66267ffb - zfs_zaccess() can handle VAPPEND too, so map V_APPEND to VAPPEND and call
zfs_access() instead of vaccess() in this case as well.
- If VADMIN is specified with another V* flag (unlikely) call both
  zfs_access() and vaccess() after splitting V* flags.

This fixes "dirtying snapshot!" panic.

PR:		kern/139806
Reported by:	Carl Chave <carl@chave.us>
In co-operation with:	jh
MFC after:	3 days
2009-10-30 23:33:06 +00:00
Pawel Jakub Dawidek
c217b20ef6 Allow file system owner to modify system flags if securelevel permits.
MFC after:	3 days
2009-10-08 16:05:17 +00:00
Pawel Jakub Dawidek
68c53ef849 File system owner is when uid matches and jail matches.
MFC after:	3 days
2009-10-08 16:03:19 +00:00
Pawel Jakub Dawidek
63e1d3df27 - Mount ZFS snapshots with MNT_IGNORE flag, so they are not visible in regular
  df(1) and mount(8) output. This is a bit similar to OpenSolaris and follows
  ZFS route of not listing snapshots by default with 'zfs list' command.
- Add UPDATING entry to note that ZFS snapshots are no longer visible in
  mount(8) and df(1) output by default.

Reviewed by:	kib
MFC after:	3 days
2009-09-14 21:10:40 +00:00
Konstantin Belousov
211ddddce7 Lock Giant around vn_open_cred().
Remove an innocent but unnecessary call to NDFREE().

Reported by:	marcel
Reviewed and tested by:	pjd
MFC after:	3 days
2009-09-08 09:17:34 +00:00
Pawel Jakub Dawidek
08780916dd Defer thread start until we set priority.
Reviewed by:	kib
MFC after:	3 days
2009-09-07 19:22:44 +00:00
Pawel Jakub Dawidek
2ff6f0f89a - Avoid holding mutex around M_WAITOK allocations.
- Add locking for mnt_opt field.

MFC after:	1 week
2009-09-07 18:23:26 +00:00
Pawel Jakub Dawidek
5d5535163a - Hide ZFS kernel threads under zfskern process.
- Use better (shorter) threads names:
	'zvol:worker zvol/tank/vol00' -> 'zvol tank/vol00'
	'vdev:worker da0' -> 'vdev da0'
2009-08-23 11:33:46 +00:00
Pawel Jakub Dawidek
1869987e42 - Give minclsyspri and maxclsyspri real values (consulted with kmacy).
- Honour 'pri' argument for thread_create().
2009-08-23 11:22:46 +00:00
Pawel Jakub Dawidek
35ae9291c2 Our libc doesn't implement control method for XDR (only kernel does) and it
will always return failure. Fix this by bringing userland implementation of
xdrmem_control() back. This allows 'zpool import' to work again.

Reported by:	Thomas Backman <serenity@exscape.org>
Reviewed by:	kmacy
Approved by:	re (kib)
2009-08-20 00:05:29 +00:00
Pawel Jakub Dawidek
8461b0f043 Manage asynchronous vnode release just like Solaris.
Discussed with:	kmacy
Approved by:	re (kib)
2009-08-17 09:48:34 +00:00
Pawel Jakub Dawidek
159ef108e1 Remove OpenSolaris taskq port (it performs very poorly in our kernel) and
replace it with wrappers around our taskqueue(9).
To make this possible, implement the taskqueue_member() function, which returns 1
if the given thread was created by the given taskqueue.
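
The membership test described above can be pictured with a userland
sketch. Everything here is hypothetical (struct sketch_taskqueue, integer
thread ids, the fixed-size array); the real taskqueue(9) structure and the
real taskqueue_member() are not shown in this log.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_THREADS 8

/* Hypothetical queue that remembers the ids of the threads it created. */
struct sketch_taskqueue {
	int	tq_threads[SKETCH_MAX_THREADS];
	size_t	tq_count;
};

/* Return 1 if tid was created by tq, 0 otherwise. */
static int
sketch_taskqueue_member(const struct sketch_taskqueue *tq, int tid)
{
	size_t i;

	assert(tq != NULL && tq->tq_count <= SKETCH_MAX_THREADS);
	for (i = 0; i < tq->tq_count; i++) {
		if (tq->tq_threads[i] == tid)
			return (1);
	}
	return (0);
}
```

A wrapper layer can use such a predicate to ask whether the current thread
is one of a queue's worker threads.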

Approved by:	re (kib)
2009-08-17 09:01:20 +00:00
Pawel Jakub Dawidek
830940567b Remove files that are no longer used.
Discussed with:	kmacy
Approved by:	re (kib)
2009-08-17 08:03:02 +00:00
Edward Tomasz Napierala
abd370a36b Remove CDDL warning.
Approved by:	re (kib), core
2009-08-13 12:28:30 +00:00
Konstantin Belousov
f33a947b56 Add new msleep(9) flag PBDY that shall be specified together with
PCATCH, to indicate that thread shall not be stopped upon receipt of
SIGSTOP until it reaches the kernel->usermode boundary.

Also change thread_single(SINGLE_NO_EXIT) to only stop threads at
the user boundary unconditionally.

Tested by:	pho
Reviewed by:	jhb
Approved by:	re (kensmith)
2009-07-14 22:52:46 +00:00
Konstantin Belousov
a18a95db4a O_NOFOLLOW shall be in flags, not in cmode.
Noted by:	bde
2009-06-22 10:08:48 +00:00
Konstantin Belousov
e0c161b89c Add another flags argument to vn_open_cred. Use it to specify that some
vn_open_cred invocations shall not audit namei path.

In particular, specify VN_OPEN_NOAUDIT for dotdot lookup performed by
default implementation of vop_vptocnp, and for the open done for core
file. vn_fullpath is called from the audit code, and vn_open there needs
to disable audit to avoid infinite recursion. The core file is created on
return to user mode, which, in particular, happens during syscall return.
The creation of the core file is audited by direct calls, and we do not
want to overwrite audit information for syscall.

Reported, reviewed and tested by: rwatson
2009-06-21 13:41:32 +00:00
Jamie Gritton
c1f192193d Rename the host-related prison fields to be the same as the host.*
parameters they represent, and the variables they replaced, instead of
abbreviated versions of them.

Approved by:	bz (mentor)
2009-06-13 15:39:12 +00:00
Jamie Gritton
76ca6f88da Place hostnames and similar information fully under the prison system.
The system hostname is now stored in prison0, and the global variable
"hostname" has been removed, as has the hostname_mtx mutex.  Jails may
have their own host information, or they may inherit it from the
parent/system.  The proper way to read the hostname is via
getcredhostname(), which will copy either the hostname associated with
the passed cred, or the system hostname if you pass NULL.  The system
hostname can still be accessed directly (and without locking) at
prison0.pr_host, but that should be avoided where possible.

The "similar information" referred to is domainname, hostid, and
hostuuid, which have also become prison parameters and had their
associated global variables removed.

Approved by:	bz (mentor)
2009-05-29 21:27:12 +00:00
Attilio Rao
1ae1c2a3bd Reverse the logic for ADAPTIVE_SX option and enable it by default.
Introduce for this operation the reverse NO_ADAPTIVE_SX option.
The flag SX_ADAPTIVESPIN to be passed to sx_init_flags(9) gets suppressed
and the new flag, offering the reversed logic, SX_NOADAPTIVE is added.

Additionally, implement adaptive spinning for sx locks held in shared mode.
The spinning limit can be tuned through sysctls until the code reaches
a release, after which time the sysctls should probably be dropped.

This change has been made necessary by recent benchmarks, where it
improves concurrency of workloads in the presence of high contention
(e.g. ZFS).

KPI breakage is documented by __FreeBSD_version bumping, manpage and
UPDATING updates.

Requested by:	jeff, kmacy
Reviewed by:	jeff
Tested by:	pho
2009-05-29 01:49:27 +00:00
Edward Tomasz Napierala
b7014134a7 Change license to more bori^Wadul^Wcanonical.
Submitted by:	rwatson@
2009-05-26 11:42:06 +00:00
Edward Tomasz Napierala
0970b4bae0 MFp4 changes necessary for NFSv4 ACLs support in ZFS. This is mostly
about removing a few #ifdefs and providing compatibility wrappers and
VOP implementations to get and set an ACL; ZFS does ACL enforcement all
by itself.

Note that the VOPs are ifdefed out for now, so this change should be
a no-op.

Reviewed by:	pjd
2009-05-26 08:21:59 +00:00
Edward Tomasz Napierala
4076aa37dc Don't allow non-owner to set SUID bit on a file. It doesn't make
any difference now, but in NFSv4 ACLs, there is write_acl permission,
which also affects mode changes.

Reviewed by:	pjd
2009-05-24 19:21:49 +00:00
Kip Macy
2e9c90d55b enable adaptive spinning on zfs locks
2009-05-16 23:56:45 +00:00
Attilio Rao
dfd233edd5 Remove the thread argument from the FSD (File-System Dependent) parts of
the VFS.  Now all the VFS_* functions and related parts no longer take the
context, as it always refers to curthread.

In some points, in particular when dealing with VOPs and functions living
in the same namespace (eg. vflush) which still need to be converted,
pass curthread explicitly in order to retain the old behaviour.
Such loose ends will be fixed ASAP.

While here fix a bug: now, UFS_EXTATTR can be compiled alone without the
UFS_EXTATTR_AUTOSTART option.

The VFS KPI is heavily changed by this commit, so third-party modules need
to be recompiled.  Bump __FreeBSD_version in order to signal this
situation.
2009-05-11 15:33:26 +00:00
Marko Zec
29b02909eb Introduce a new virtualization container, provisionally named vprocg, to hold
virtualized instances of hostname and domainname, as well as a new top-level
virtualization struct vimage, which holds pointers to struct vnet and struct
vprocg.  Struct vprocg is likely to become replaced in the near future with
a new jail management API import.

As a consequence of this change, change struct ucred to point to a struct
vimage, instead of directly pointing to a vnet.

Merge vnet / vimage / ucred refcounting infrastructure from p4 / vimage
branch.

Permit kldload / kldunload operations to be executed only from the default
vimage context.

This change should have no functional impact on nooptions VIMAGE kernel
builds.

Reviewed by:	bz
Approved by:	julian (mentor)
2009-05-08 14:11:06 +00:00
Kip Macy
c20fd07777 move VN_RELE_ASYNC to the compatibility layer with the rest of the VN_* defines
2009-05-07 23:02:15 +00:00
Jamie Gritton
b38ff370e4 Introduce the extensible jail framework, using the same "name=value"
interface as nmount(2).  Three new system calls are added:
* jail_set, to create jails and change the parameters of existing jails.
  This replaces jail(2).
* jail_get, to read the parameters of existing jails.  This replaces the
  security.jail.list sysctl.
* jail_remove to kill off a jail's processes and remove the jail.
Most jail parameters may now be changed after creation, and jails may be
set to exist without any attached processes.  The current jail(2) system
call still exists, though it is now a stub to jail_set(2).

Approved by:	bz (mentor)
2009-04-29 21:14:15 +00:00
Robert Watson
885868cd8f Remove VOP_LEASE and supporting functions. This hasn't been used since
the removal of NQNFS, but was left in in case it was required for NFSv4.
Since our new NFSv4 client and server can't use it for their
requirements, GC the old mechanism, as well as other unused lease-
related code and interfaces.

Due to its impact on kernel programming and binary interfaces, this
change should not be MFC'd.

Proposed by:    jeff
Reviewed by:    jeff
Discussed with: rmacklem, zach loafman @ isilon
2009-04-10 10:52:19 +00:00
Jamie Gritton
f86bce5ed0 Extend the "vfsopt" mount options for more general use. Make struct
vfsopt and the vfs_buildopts function public, and add some new fields
to struct vfsopt (pos and seen), and new functions vfs_getopt_pos and
vfs_opterror.

Further extend the interface to allow reading options from the kernel
in addition to sending them to the kernel, with vfs_setopt and related
functions.

While this allows the "name=value" option interface to be used for more
than just FS mounts (planned use is for jails), it retains the current
"vfsopt" name and <sys/mount.h> requirement.

Approved by:	bz (mentor)
2009-03-02 23:26:30 +00:00