and CIFS file attributes as BSD stat(2) flags.
This work is intended to be compatible with ZFS, the Solaris CIFS
server's interaction with ZFS, somewhat compatible with MacOS X,
and of course compatible with Windows.
The Windows attributes that are implemented were chosen based on
the attributes that ZFS already supports.
The summary of the flags is as follows:
UF_SYSTEM: Command line name: "system" or "usystem"
ZFS name: XAT_SYSTEM, ZFS_SYSTEM
Windows: FILE_ATTRIBUTE_SYSTEM
This flag means that the file is used by the
operating system. FreeBSD does not enforce any
special handling when this flag is set.
UF_SPARSE: Command line name: "sparse" or "usparse"
ZFS name: XAT_SPARSE, ZFS_SPARSE
Windows: FILE_ATTRIBUTE_SPARSE_FILE
This flag means that the file is sparse. Although
ZFS may modify this in some situations, there is
not generally any special handling for this flag.
UF_OFFLINE: Command line name: "offline" or "uoffline"
ZFS name: XAT_OFFLINE, ZFS_OFFLINE
Windows: FILE_ATTRIBUTE_OFFLINE
This flag means that the file has been moved to
offline storage. FreeBSD does not have any special
handling for this flag.
UF_REPARSE: Command line name: "reparse" or "ureparse"
ZFS name: XAT_REPARSE, ZFS_REPARSE
Windows: FILE_ATTRIBUTE_REPARSE_POINT
This flag means that the file is a Windows reparse
point. ZFS has special handling code for reparse
points, but we don't currently have the other
supporting infrastructure for them.
UF_HIDDEN: Command line name: "hidden" or "uhidden"
ZFS name: XAT_HIDDEN, ZFS_HIDDEN
Windows: FILE_ATTRIBUTE_HIDDEN
This flag means that the file may be excluded from
a directory listing if the application honors it.
FreeBSD has no special handling for this flag.
The name and bit definition for UF_HIDDEN are
identical to the definition in MacOS X.
UF_READONLY: Command line name: "urdonly", "rdonly", "readonly"
ZFS name: XAT_READONLY, ZFS_READONLY
Windows: FILE_ATTRIBUTE_READONLY
This flag means that the file may not written or
appended, but its attributes may be changed.
ZFS currently enforces this flag, but Illumos
developers have discussed disabling enforcement.
The behavior of this flag is different than MacOS X.
MacOS X uses UF_IMMUTABLE to represent the DOS
readonly permission, but that flag has a stronger
meaning than the semantics of DOS readonly permissions.
UF_ARCHIVE: Command line name: "uarch", "uarchive"
ZFS_NAME: XAT_ARCHIVE, ZFS_ARCHIVE
Windows name: FILE_ATTRIBUTE_ARCHIVE
The UF_ARCHIVED flag means that the file has changed and
needs to be archived. The meaning is same as
the Windows FILE_ATTRIBUTE_ARCHIVE attribute, and
the ZFS XAT_ARCHIVE and ZFS_ARCHIVE attribute.
msdosfs and ZFS have special handling for this flag.
i.e. they will set it when the file changes.
sys/param.h: Bump __FreeBSD_version to 1000047 for the
addition of new stat(2) flags.
chflags.1: Document the new command line flag names
(e.g. "system", "hidden") available to the
user.
ls.1: Reference chflags(1) for a list of file flags
and their meanings.
strtofflags.c: Implement the mapping between the new
command line flag names and new stat(2)
flags.
chflags.2: Document all of the new stat(2) flags, and
explain the intended behavior in a little
more detail. Explain how they map to
Windows file attributes.
Different filesystems behave differently
with respect to flags, so warn the
application developer to take care when
using them.
zfs_vnops.c: Add support for getting and setting the
UF_ARCHIVE, UF_READONLY, UF_SYSTEM, UF_HIDDEN,
UF_REPARSE, UF_OFFLINE, and UF_SPARSE flags.
All of these flags are implemented using
attributes that ZFS already supports, so
the on-disk format has not changed.
ZFS currently doesn't allow setting the
UF_REPARSE flag, and we don't really have
the other infrastructure to support reparse
points.
msdosfs_denode.c,
msdosfs_vnops.c: Add support for getting and setting
UF_HIDDEN, UF_SYSTEM and UF_READONLY
in MSDOSFS.
It supported SF_ARCHIVED, but this has been
changed to be UF_ARCHIVE, which has the same
semantics as the DOS archive attribute instead
of inverse semantics like SF_ARCHIVED.
After discussion with Bruce Evans, change
several things in the msdosfs behavior:
Use UF_READONLY to indicate whether a file
is writeable instead of file permissions, but
don't actually enforce it.
Refuse to change attributes on the root
directory, because it is special in FAT
filesystems, but allow most other attribute
changes on directories.
Don't set the archive attribute on a directory
when its modification time is updated.
Windows and DOS don't set the archive attribute
in that scenario, so we are now bug-for-bug
compatible.
smbfs_node.c,
smbfs_vnops.c: Add support for UF_HIDDEN, UF_SYSTEM,
UF_READONLY and UF_ARCHIVE in SMBFS.
This is similar to changes that Apple has
made in their version of SMBFS (as of
smb-583.8, posted on opensource.apple.com),
but not quite the same.
We map SMB_FA_READONLY to UF_READONLY,
because UF_READONLY is intended to match
the semantics of the DOS readonly flag.
The MacOS X code maps both UF_IMMUTABLE
and SF_IMMUTABLE to SMB_FA_READONLY, but
the immutable flags have stronger meaning
than the DOS readonly bit.
stat.h: Add definitions for UF_SYSTEM, UF_SPARSE,
UF_OFFLINE, UF_REPARSE, UF_ARCHIVE, UF_READONLY
and UF_HIDDEN.
The definition of UF_HIDDEN is the same as
the MacOS X definition.
Add commented-out definitions of
UF_COMPRESSED and UF_TRACKED. They are
defined in MacOS X (as of 10.8.2), but we
do not implement them (yet).
ufs_vnops.c: Add support for getting and setting
UF_ARCHIVE, UF_HIDDEN, UF_OFFLINE, UF_READONLY,
UF_REPARSE, UF_SPARSE, and UF_SYSTEM in UFS.
Alphabetize the flags that are supported.
These new flags are only stored, UFS does
not take any action if the flag is set.
Sponsored by: Spectra Logic
Reviewed by: bde (earlier version)
system crash which happen after successfull fsync() return, the data
is accessible. For msdosfs, this means that FAT entries for the file
must be written.
Since we do not track the FAT blocks containing entries for the
current file, just do a sloppy sync of the devvp vnode for the mount,
which buffers, among other things, contain FAT blocks.
Simultaneously, for deupdat():
- optimize by clearing the modified flags before short-circuiting a
return, if the mount is read-only;
- only ignore the rest of the function for denode with DE_MODIFIED
flag clear when the waitfor argument is false. The directory buffer
for the entry might be of delayed write;
- microoptimize by comparing the updated directory entry with the
current block content;
- try to cluster the write, fall back to bawrite() if low on
resources.
Based on the submission by: bde
MFC after: 2 weeks
cluster_write() and cluster_wbuild() functions. The flags to be
allowed are a subset of the GB_* flags for getblk().
Sponsored by: The FreeBSD Foundation
Tested by: pho
a directory to a subdir of the root directory from somewhere else.
For all directory moves that change the parent directory, the dotdot
entry must be fixed up. For msdosfs, the root directory is magic for
non-FAT32. It is less magic for FAT32, but needs the same magic for
the dotdot fixup. It didn't have it.
Both chkdsk and fsck_msdosfs fix the corrupt directory entries with no
problems.
The fix is to use the same magic for dotdot in msdosfs_rename() as in
msdosfs_mkdir().
For msdosfs_mkdir(), document the magic. When writing the dotdot entry
in mkdir, use explicitly set pcl variable instead on relying on the
start cluster of the root directory typically has a value < 65536.
Submitted by: bde
MFC after: 1 week
Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the
sysctl to zero allows to perform the SSIZE_MAX-sized i/o requests from
the usermode.
Discussed with: bde, das (previous versions)
MFC after: 1 month
a single directory entry. As a consequnce, name cache purge done by lookup
for fvp when DELETE op for namei is specified, might be not enough to
expunge all namecache entries that were installed for this direntry.
Explicitely call cache_purge(fvp) when msdosfs_rename() succeeded.
PR: kern/93634
MFC after: 1 week
- Update bpb structs with reserved fields.
- In direntry struct join deName with deExtension. Although a
fix was attempted in the past, these fields were being overflowed,
Now this is consistent with the spec, and we can now share the
WinChksum code with NetBSD.
Submitted by: Pedro F. Giffuni <giffunip tutopia com>
Mostly obtained from: NetBSD
Reviewed by: bde
MFC after: 2 weeks
anything other than 0. Make it so. This fixes
"panic: VOP_STRATEGY failed bp=0xc320dd90 vp=0xc3b9f648",
encountered when writing to an orphaned filesystem. Reason
for the panic was the following assert:
KASSERT(i == 0, ("VOP_STRATEGY failed bp=%p vp=%p", bp, bp->b_vp));
at vfs_bio:bufstrategy().
Reviewed by: scottl, phk
Approved by: rwatson (mentor)
Sponsored by: FreeBSD Foundation
to add more V* constants, and the variables changed by this patch were often
being assigned to mode_t variables, which is 16 bit.
Approved by: rwatson (mentor)
NODEV is more appropriate when va_rdev doesn't have a meaningful value.
Submitted by: Jaakko Heinonen <jh saunalahti fi>
Suggested by: bde
Discussed on: freebsd-fs
MFC after: 1 month
filesystem-specific vnode data to the struct vnode. Provide the
default implementation for the vop_advlock and vop_advlockasync.
Purge the locks on the vnode reclaim by using the lf_purgelocks().
The default implementation is augmented for the nfs and smbfs.
In the nfs_advlock, push the Giant inside the nfs_dolock.
Before the change, the vop_advlock and vop_advlockasync have taken the
unlocked vnode and dereferenced the fs-private inode data, racing with
with the vnode reclamation due to forced unmount. Now, the vop_getattr
under the shared vnode lock is used to obtain the inode size, and
later, in the lf_advlockasync, after locking the vnode interlock, the
VI_DOOMED flag is checked to prevent an operation on the doomed vnode.
The implementation of the lf_purgelocks() is submitted by dfr.
Reported by: kris
Tested by: kris, pho
Discussed with: jeff, dfr
MFC after: 2 weeks
user-mode lock manager, build a kernel with the NFSLOCKD option and
add '-k' to 'rpc_lockd_flags' in rc.conf.
Highlights include:
* Thread-safe kernel RPC client - many threads can use the same RPC
client handle safely with replies being de-multiplexed at the socket
upcall (typically driven directly by the NIC interrupt) and handed
off to whichever thread matches the reply. For UDP sockets, many RPC
clients can share the same socket. This allows the use of a single
privileged UDP port number to talk to an arbitrary number of remote
hosts.
* Single-threaded kernel RPC server. Adding support for multi-threaded
server would be relatively straightforward and would follow
approximately the Solaris KPI. A single thread should be sufficient
for the NLM since it should rarely block in normal operation.
* Kernel mode NLM server supporting cancel requests and granted
callbacks. I've tested the NLM server reasonably extensively - it
passes both my own tests and the NFS Connectathon locking tests
running on Solaris, Mac OS X and Ubuntu Linux.
* Userland NLM client supported. While the NLM server doesn't have
support for the local NFS client's locking needs, it does have to
field async replies and granted callbacks from remote NLMs that the
local client has contacted. We relay these replies to the userland
rpc.lockd over a local domain RPC socket.
* Robust deadlock detection for the local lock manager. In particular
it will detect deadlocks caused by a lock request that covers more
than one blocking request. As required by the NLM protocol, all
deadlock detection happens synchronously - a user is guaranteed that
if a lock request isn't rejected immediately, the lock will
eventually be granted. The old system allowed for a 'deferred
deadlock' condition where a blocked lock request could wake up and
find that some other deadlock-causing lock owner had beaten them to
the lock.
* Since both local and remote locks are managed by the same kernel
locking code, local and remote processes can safely use file locks
for mutual exclusion. Local processes have no fairness advantage
compared to remote processes when contending to lock a region that
has just been unlocked - the local lock manager enforces a strict
first-come first-served model for both local and remote lockers.
Sponsored by: Isilon Systems
PR: 95247 107555 115524 116679
MFC after: 2 weeks
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.
KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.
Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.
Manpage and FreeBSD_version will be updated through further commits.
As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.
Tested by: Diego Sardina <siarodx at gmail dot com>,
Andrea Di Pasquale <whyx dot it at gmail dot com>
This is much simpler than for ffs since there are many fewer places
where we need to choose between a delayed write and a sync write --
just 5 in msdosfs and more than 30 in ffs.
This is more complete and correct than in ffs. Several places in ffs
are are still missing the choice. ffs_update() has a layering violation
that breaks callers which want to force a sync update (mainly fsync(2)
and O_SYNC write(2)).
However, fsync(2) and O_SYNC write(2) are still more broken than in
ffs, since they are broken for default (non-sync non-async) mounts
too. Both fail to sync the FAT in all cases, and both fail to sync
the directory entry in some cases after losing a race. Async everything
is probably safer than the half-baked sync of metadata given by default
mounts.
(except indirectly for the size pseudo-attribute). If anything deserves
a sync update, then it is ids and immutable flags, since these are
related to security, but ffs never synced these and msdosfs doesn't
support them. (ufs_setattr() only does an update in one case where
it is least needed (for timestamps); it did pessimal sync updates for
timestamps until 1998/03/08 but was changed for unlogged reasons related
to soft updates.)
Now msdosfs calls deupdat() with waitfor == 0, which normally gives a
delayed update to disk but always gives a sync update of timestamps
in core, while for ffs everything is delayed until the syncer daemon
or other activity causes an update (except for timestamps).
This gives a large optimization mainly for things like cp -p, where
attribute adjustment could easily triple the number of physical I/O's
if it is done synchronously (but cp -p to msdosfs is not as bad as
that, since msdosfs doesn't support many attributes so null adjustments
are more common, and msdosfs doesn't support ctimes so even if cp
doesn't weed out null adjustments they don't become non-null after
clobbering the ctime).
can easily block in bread(), and then there was nothing to prevent the
static buffer (nambuf_{ptr,len,last_id}) being clobbered by another
thread.
The effects of the bug seem to have been limited to failed lookups and
mangled names in readdir(), since Giant locking provides enough
serialization to prevent concurrent calls to the functions that access
the buffer. They were very obvious for multiple concurrent tree walks,
especially with a small cluster size.
The bug was introduced in msdosfs_conv.c 1.34 and associated changes,
and is in all releases starting with 5.2.
The fix is to allocate the buffer as a local variable and pass around
pointers to it like "_r" functions in libc do. Stack use from this
is large but not too large. This also fixes a memory leak on module
unload.
Reviewed by: kib
Approved by: re (kensmith)
(uio_offset < 0) since this can't happen. If this happens, then the
general code handles the problem safely (better than before for reading,
returning 0 (EOF) instead of the bogus errno EINVAL, and the same as
before for writing, returning EFBIG).
In msdosfs_read(), don't check for (uio_resid < 0). msdosfs_write()
already didn't check.
In msdosfs_read(), document in a comment our assumptions that the caller
passed a valid uio_offset and uio_resid. ffs checks using KASSERT(),
and that is enough sanity checking. In the same comment, partly document
there is no need to check for the EOVERFLOW case, unlike in ffs where this
case can happen at least in theory.
In msdosfs_write(), add a comment about why the checking of
(uio_resid == 0) is explicit, unlike in ffs.
In msdosfs_write(), check for impossibly large final offsets before
checking if the file size rlimit would be exceeded, so that we don't
have an overflow bug in the rlimit check and are consistent with ffs.
We now return EFBIG instead of EFBIG plus a SIGXFSZ signal if the final
offset would be impossibly large but not so large as to cause overflow.
Overflow normally gave the benign behaviour of no signal.
Approved by: re (kensmith) (blanket)
This gives a very large speedup for small block sizes (in my tests,
about 5 times for write and 3 times for read with a block size of 512,
if clustering is possible) and a moderate speedup for the moderatatly
large block sizes that should be used on non-small media (4K is the
best size in most cases, and the speedup for that is about 1.3 times
for write and 1.2 times for read). mmap() should benefit from clustering
like read()/write(), but the current implementation of vm only supports
clustering (at least for getpages) if the fs block size is >= PAGE SIZE.
msdosfs is now only slightly slower than ffs with soft updates for
writing and slightly faster for reading when both use their best block
sizes. Writing is slower for msdosfs because of more sync writes.
Reading is faster for msdosfs because indirect blocks interfere with
clustering in ffs.
The changes in msdosfs_read() and msdosfs_write() are simpler merges
of corresponding code in ffs (after fixing some style bugs in ffs).
msdosfs_bmap() needs fs-specific code. This implementation loops
calling a lower level bmap function to do the hard parts. This is a
bit inefficient, but is efficient enough since msdsfs_bmap() is only
called when there is physical i/o to do.
Approved by: re (hrs)
In msdosfs_read(), mainly reorder the main loop to the same order as in
ffs_read().
In msdosfs_write() and extendfile(), use vfs_bio_clrbuf() instead of
clrbuf(). I think this just just a bogus optimization, but ffs always
does it and msdosfs already did it in one place, and it is what I've
tested.
In msdosfs_write(), merge good bits from a comment in ffs_write(), and
fix 1 style bug.
In the main comment for msdosfs_pcbmap(), improve wording and catch
up with 13 years of changes in the function. This comment belongs in
VOP_BMAP.9 but that doesn't exist.
In msdosfs_bmap(), return EFBIG if the requested cluster number is out
of bounds instead of blindly truncating it, and fix many style bugs.
Approved by: re (hrs)
some cases, move to priv_check() if it was an operation on a thread and
no other flags were present.
Eliminate caller-side jail exception checking (also now-unused); jail
privilege exception code now goes solely in kern_jail.c.
We can't yet eliminate suser() due to some cases in the KAME code where
a privilege check is performed and then used in many different deferred
paths. Do, however, move those prototypes to priv.h.
Reviewed by: csjp
Obtained from: TrustedBSD Project
This way we may support multiple structures in v_data vnode field within
one file system without using black magic.
Vnode-to-file-handle should be VOP in the first place, but was made VFS
operation to keep interface as compatible as possible with SUN's VFS.
BTW. Now Solaris also implements vnode-to-file-handle as VOP operation.
VFS_VPTOFH() was left for API backward compatibility, but is marked for
removal before 8.0-RELEASE.
Approved by: mckusick
Discussed with: many (on IRC)
Tested with: ufs, msdosfs, cd9660, nullfs and zfs
#ifdef MSDOSFS_LARGE to run-time checks to see if "-o large" was specified.
Test case provided by Oliver Fromme:
truncate -s 200G test.img
mdconfig -a -t vnode -f test.img -u 9
newfs_msdos -s 419430400 -n 1 /dev/md9 zip250
mount -t msdosfs /dev/md9 /mnt # should fail
mount -t msdosfs -o large /dev/md9 /mnt # should succeed
PR: 105964
Requested by: Oliver Fromme <olli lurza secnetix de>
Tested by: trhodes
MFC after: 2 weeks
set birthtime to FAT CTime (creation time) and in the other cases
set birthtime to -1.
o Set ctime to mtime instead of FAT CTime which has completely
different meaning.
PR: kern/106018
Submitted by: Oliver Fromme
MFC after: 1 month
specific privilege names to a broad range of privileges. These may
require some future tweaking.
Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Discussed on: arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
Alex Lyashkov <umka at sevcity dot net>,
Skip Ford <skip dot ford at verizon dot net>,
Antoine Brodin <antoine dot brodin at laposte dot net>
directory. vrele() may lock the passed vnode, which in these cases would
give an invalid lock order of child -> parent. These situations are
deadlock prone although do not typically deadlock because the vrele
is typically not releasing the last reference to the vnode. Users of
vrele must consider it as a call to vn_lock() and order it appropriately.
MFC After: 1 week
Sponsored by: Isilon Systems, Inc.
Tested by: kkenn