Commit Graph

1012 Commits

Author SHA1 Message Date
Jeff Roberson
8823f1b6db - Don't use the interlock to protect v_writecount. 2002-09-25 02:44:55 +00:00
Poul-Henning Kamp
cf09d67418 We don't need to #include <sys/disklabel.h>.
We don't need to #include <sys/disklabel.h> second time either.

Sponsored by:	DARPA & NAI Labs.
2002-09-20 16:42:33 +00:00
Don Lewis
fa288043e2 VOP_FSYNC() requires that it's vnode argument be locked, which nfs_link()
wasn't doing.  Rather than just lock and unlock the vnode around the call
to VOP_FSYNC(), implement rwatson's suggestion to lock the file vnode
in kern_link() before calling VOP_LINK(), since the other filesystems
also locked the file vnode right away in their link methods.  Remove the
locking and and unlocking from the leaf filesystem link methods.

Reviewed by:	rwatson, bde  (except for the unionfs_link() changes)
2002-09-19 13:32:45 +00:00
David E. O'Brien
47a561263d intmax_t is printed with %jd, not %lld. 2002-09-19 03:55:30 +00:00
Nate Lawson
86ed6d45ac Remove any VOP_PRINT that redundantly prints the tag.
Move lockmgr_printinfo() into vprint() for everyone's benefit.

Suggested by: bde
2002-09-18 20:42:04 +00:00
Nate Lawson
06be2aaa83 Remove all use of vnode->v_tag, replacing with appropriate substitutes.
v_tag is now const char * and should only be used for debugging.

Additionally:
1. All users of VT_NTS now check vfsconf->vf_type VFCF_NETWORK
2. The user of VT_PROCFS now checks for the new flag VV_PROCDEP, which
is propagated by pseudofs to all child vnodes if the fs sets PFS_PROCDEP.

Suggested by:   phk
Reviewed by:    bde, rwatson (earlier version)
2002-09-14 09:02:28 +00:00
Bruce Evans
d3a7b5e70e vfs_syscalls.c:
Changed rename(2) to follow the letter of the POSIX spec.  POSIX
requires rename() to have no effect if its args "resolve to the same
existing file".  I think "file" can only reasonably be read as referring
to the inode, although the rationale and "resolve" seem to say that
sameness is at the level of (resolved) directory entries.

ext2fs_vnops.c, ufs_vnops.c:
Replaced code that gave the historical BSD behaviour of removing one
link name by checks that this code is now unreachable.  This fixes
some races.  All vnodes needed to be unlocked for the removal, and
locking at another level using something like IN_RENAME was not even
attempted, so it was possible for rename(x, y) to return with both x
and y removed even without any unlink(2) syscalls (one process can
remove x using rename(x, y) and another process can remove y using
rename(y, x)).

Prodded by:	alfred
MFC after:	8 weeks
PR:		42617
2002-09-10 11:09:13 +00:00
Poul-Henning Kamp
0e168822b2 Implement the VOP_OPENEXTATTR() and VOP_CLOSEEXTATTR() methods.
Use extattr_check_cred() to check access to EAs.

This is still a WIP.

Sponsored by:   DARPA & NAI Labs.
2002-09-05 20:59:42 +00:00
Poul-Henning Kamp
190a4963d0 Use canonical extattr_check_cred() instead of private implementation of the
same policy.

Sponsored by:	DARPA & NAI Labs.
2002-09-05 20:39:36 +00:00
Poul-Henning Kamp
04205dc4be Fix credentials check: do not leak ENOATTR until we know if they're
supposed to know.

Sponsored by:	DARPA & NAI Labs.
2002-09-05 20:28:24 +00:00
Bruce Evans
8f767abf71 Include <sys/malloc.h> instead of depending on namespace pollution 2
layers deep in <sys/proc.h> or <sys/vnode.h>.

Include <sys/vmmeter.h> instead of depending on namespace pollution in
<sys/pcpu.h>.

Sorted includes as much as possible.
2002-09-05 09:43:24 +00:00
Robert Watson
2fc6567e9a Since we have vp and td cached in local variables, use those instead
of derefencing the VOP arguments again when calling the UFS code.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-09-01 16:06:40 +00:00
Poul-Henning Kamp
d0e9b8dbc4 Correctly handle setting, getting and deleting EA's with zero length content.
Sponsored by:	DARPA & NAI Labs.
2002-08-30 08:57:09 +00:00
Philippe Charnier
93b0017f88 Replace various spelling with FALLTHROUGH which is lint()able 2002-08-25 13:23:09 +00:00
Alan Cox
fff6062ab6 o Retire vm_page_zero_fill() and vm_page_zero_fill_area(). Ever since
pmap_zero_page() and pmap_zero_page_area() were modified to accept
   a struct vm_page * instead of a physical address, vm_page_zero_fill()
   and vm_page_zero_fill_area() have served no purpose.
2002-08-25 00:22:31 +00:00
Poul-Henning Kamp
7428de69d2 Implement list of EA return functionality.
Correctly delete EA's when the content length is set to zero.

Sponsored by:	DARPA & NAI Labs.
2002-08-20 11:34:58 +00:00
Poul-Henning Kamp
0176455bc8 First snapshot of UFS2 EA support.
Sponsored by: DARPA & NAI Labs.
2002-08-19 07:01:55 +00:00
Robert Watson
9ca435893b In order to better support flexible and extensible access control,
make a series of modifications to the credential arguments relating
to file read and write operations to cliarfy which credential is
used for what:

- Change fo_read() and fo_write() to accept "active_cred" instead of
  "cred", and change the semantics of consumers of fo_read() and
  fo_write() to pass the active credential of the thread requesting
  an operation rather than the cached file cred.  The cached file
  cred is still available in fo_read() and fo_write() consumers
  via fp->f_cred.  These changes largely in sys_generic.c.

For each implementation of fo_read() and fo_write(), update cred
usage to reflect this change and maintain current semantics:

- badfo_readwrite() unchanged
- kqueue_read/write() unchanged
  pipe_read/write() now authorize MAC using active_cred rather
  than td->td_ucred
- soo_read/write() unchanged
- vn_read/write() now authorize MAC using active_cred but
  VOP_READ/WRITE() with fp->f_cred

Modify vn_rdwr() to accept two credential arguments instead of a
single credential: active_cred and file_cred.  Use active_cred
for MAC authorization, and select a credential for use in
VOP_READ/WRITE() based on whether file_cred is NULL or not.  If
file_cred is provided, authorize the VOP using that cred,
otherwise the active credential, matching current semantics.

Modify current vn_rdwr() consumers to pass a file_cred if used
in the context of a struct file, and to always pass active_cred.
When vn_rdwr() is used without a file_cred, pass NOCRED.

These changes should maintain current semantics for read/write,
but avoid a redundant passing of fp->f_cred, as well as making
it more clear what the origin of each credential is in file
descriptor read/write operations.

Follow-up commits will make similar changes to other file descriptor
operations, and modify the MAC framework to pass both credentials
to MAC policy modules so they can implement either semantic for
revocation.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-15 20:55:08 +00:00
Poul-Henning Kamp
18280bc653 Expand the arguments to ffs_ext{read,write}() to their component
parts rather than use vop_{read,write}_args.  Access to these
functions will ultimately not be available through the
"vop_{read,write}+IO_EXT" API but this functionality is retained
for debugging purposes for now.

Sponsored by: DARPA & NAI Labs.
2002-08-13 11:33:01 +00:00
Poul-Henning Kamp
d6fe88e475 Unravel the UFS_EXTATTR incest between FFS and UFS: UFS_EXTATTR is an
UFS only thing, and FFS should in principle not know if it is enabled
or not.

This commit cleans ffs_vnops.c for such knowledge, but not ffs_vfsops.c

Sponsored by: DARPA and NAI Labs.
2002-08-13 10:33:57 +00:00
Poul-Henning Kamp
9bf1a75697 Introduce typedefs for the member functions of struct vfsops and employ
these in the main filesystems.  This does not change the resulting code
but makes the source a little bit more grepable.

Sponsored by:	DARPA and NAI Labs.
2002-08-13 10:05:50 +00:00
Robert Watson
c08b677fb5 Pass IO_NOMACCHECK to vn_rdwr() in the following checks to prevent
enforcement of MAC policy on the read or write operations:

- In ext2fs, don't enforce MAC on loop-back reads and writes supporting
  directory read operations in lookup(), directory modifications in
  rename(), directory write operations in mkdir(), symlink write
  operations in symlink().

- In the NFS client locking code, perform vn_rdwr() on the NFS locking
  socket without enforcing MAC, since the write is done on behalf of
  the kernel NFS implementation rather than the user process.

- In UFS, don't enforce MAC on loop-back reads and writes supporting
  directory read operations in lookup(), and symlink write operations
  in symlink().

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-12 16:43:04 +00:00
Poul-Henning Kamp
e179b40f14 Stop pretending that the FFS file ufs_readwrite.c is a UFS file.
Instead of #including it, pull it into ffs_vnops.c and name things
correctly.

Sponsored by:	DARPA & NAI Labs.
2002-08-12 10:32:56 +00:00
Poul-Henning Kamp
851da5d6cf Fix a comment. 2002-08-12 09:22:11 +00:00
Ian Dowse
98caa2e4e9 Don't call softdep_slowdown() if soft updates are not active on the
filesystem. This causes a panic for kernels compiled without
softupdates.

Reported by:	luigi
2002-08-05 17:59:20 +00:00
Jeff Roberson
e6e370a7fe - Replace v_flag with v_iflag and v_vflag
- v_vflag is protected by the vnode lock and is used when synchronization
   with VOP calls is needed.
 - v_iflag is protected by interlock and is used for dealing with vnode
   management issues.  These flags include X/O LOCK, FREE, DOOMED, etc.
 - All accesses to v_iflag and v_vflag have either been locked or marked with
   mp_fixme's.
 - Many ASSERT_VOP_LOCKED calls have been added where the locking was not
   clear.
 - Many functions in vfs_subr.c were restructured to provide for stronger
   locking.

Idea stolen from:	BSD/OS
2002-08-04 10:29:36 +00:00
Robert Watson
af05e056ec Introduce support for Mandatory Access Control and extensible
kernel access control.

Instrument UFS to support per-inode MAC labels.  In particular,
invoke MAC framework entry points for generically supporting the
backing of MAC labels into extended attributes.  This ends up
introducing new vnode operation vector entries point at the MAC
framework entry points, as well as some explicit entry point
invocations for file and directory creation events so that the
MAC framework can push labels to disk before the directory names
become persistent (this will work better once EAs in UFS2 are
hooked into soft updates).  The generic EA MAC entry points
support executing with the file system in either single label
or multilabel operation, and will fall back to the mount label
if multilabel is not specified at mount-time.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 16:05:30 +00:00
Poul-Henning Kamp
c3a0d1d4e1 I forgot this bit of uglyness in the fsck_ffs cleanup. 2002-07-31 07:01:18 +00:00
Poul-Henning Kamp
9fbc6a330d Fix braino in last commit. 2002-07-30 12:02:41 +00:00
Poul-Henning Kamp
17b1994bbe Move ffs_isfreeblock() to ffs_alloc.c and make it static.
Sponsored by: DARPA & NAI Labs.
2002-07-30 11:54:48 +00:00
Alan Cox
1e8fabc097 Lock page queue accesses by vm_page_free(). 2002-07-28 08:01:48 +00:00
Benno Rice
683eac8dbb Add a missing argument to the stub for softdep_setup_freeblocks.
Forgotten by:	mckusick
2002-07-20 04:07:15 +00:00
Peter Wemm
382f95d332 Fix a warning:
ffs_softdep.c:1630: warning: int format, different type arg (arg 2)
2002-07-20 01:09:35 +00:00
Kirk McKusick
7aca6291e3 Add support to UFS2 to provide storage for extended attributes.
As this code is not actually used by any of the existing
interfaces, it seems unlikely to break anything (famous
last words).

The internal kernel interface to manipulate these attributes
is invoked using two new IO_ flags: IO_NORMAL and IO_EXT.
These flags may be specified in the ioflags word of VOP_READ,
VOP_WRITE, and VOP_TRUNCATE. Specifying IO_NORMAL means that
you want to do I/O to the normal data part of the file and
IO_EXT means that you want to do I/O to the extended attributes
part of the file. IO_NORMAL and IO_EXT are mutually exclusive
for VOP_READ and VOP_WRITE, but may be specified individually
or together in the case of VOP_TRUNCATE. For example, when
removing a file, VOP_TRUNCATE is called with both IO_NORMAL
and IO_EXT set. For backward compatibility, if neither IO_NORMAL
nor IO_EXT is set, then IO_NORMAL is assumed.

Note that the BA_ and IO_ flags have been `merged' so that they
may both be used in the same flags word. This merger is possible
by assigning the IO_ flags to the low sixteen bits and the BA_
flags the high sixteen bits. This works because the high sixteen
bits of the IO_ word is reserved for read-ahead and help with
write clustering so will never be used for flags. This merge
lets us get away from code of the form:

        if (ioflags & IO_SYNC)
                flags |= BA_SYNC;

For the future, I have considered adding a new field to the
vattr structure, va_extsize. This addition could then be
exported through the stat structure to allow applications to
find out the size of the extended attribute storage and also
would provide a more standard interface for truncating them
(via VOP_SETATTR rather than VOP_TRUNCATE).

I am also contemplating adding a pathconf parameter (for
concreteness, lets call it _PC_MAX_EXTSIZE) which would
let an application determine the maximum size of the extended
atribute storage.

Sponsored by:	DARPA & NAI Labs.
2002-07-19 07:29:39 +00:00
Kirk McKusick
fb36a3d847 Change utimes to set the file creation time (for filesystems that
support creation times such as UFS2) to the value of the
modification time if the value of the modification time is older
than the current creation time. See utimes(2) for further details.

Sponsored by:	DARPA & NAI Labs.
2002-07-17 02:03:19 +00:00
Kirk McKusick
faab4e2722 Change the name of st_createtime to st_birthtime. This change is
made to reduce confusion between st_ctime and st_createtime.

Submitted by:	Eric Allman <eric@sendmail.org>
Sponsored by:	DARPA & NAI Labs.
2002-07-16 22:36:00 +00:00
Tom Rhodes
ae76f60046 Fix a type: s/your are/you are/ 2002-07-12 19:56:31 +00:00
Bruce Evans
2daf9dc825 Fixed some printf format errors (4 new ones reported by gcc and 5 nearby
old ones not reported by gcc).  This helps unbreak LINT.
2002-07-08 12:42:29 +00:00
Ian Dowse
6bd521df93 Use indirect function pointer hooks instead of #ifdef SOFTUPDATES
direct calls for the two places where the kernel calls into soft
updates code. Set up the hooks in softdep_initialize() and NULL
them out in softdep_uninitialize(). This change allows soft updates
to function correctly when ufs is loaded as a module.

Reviewed by:	mckusick
2002-07-01 17:59:40 +00:00
Ian Dowse
5346934fe7 Add the ffs bits necessary to support unloading of the ufs kernel
module. This adds an ffs_uninit() function that calls ufs_uninit()
and also calls a new softdep_uninitialize() function. Add a stub
for softdep_uninitialize() to cover the non-SOFTUPDATES case.

Reviewed by:	mckusick
2002-07-01 11:00:47 +00:00
Ian Dowse
3423b21c09 Remove the bogus SYSINIT from ufs_dirhash.c and instead add a call
to ufsdirhash_init() from ufs_init(). Add uninit() functions
corresponding the ufs, dirhash, quota and ihash init() functions.
2002-06-30 02:49:39 +00:00
Ian Dowse
8f42fb8fc9 Remove the kernel file-size limit for UFS2, so that only the limit
imposed by the filesystem structure itself remains. With 16k blocks,
the maximum file size is now just over 128TB.

For now, the UFS1 file size limit is left unchanged so as to remain
consistent with RELENG_4, but it too could be removed in the future.

Reviewed by:	mckusick
2002-06-26 18:34:51 +00:00
Kenneth D. Merry
98cb733c67 At long last, commit the zero copy sockets code.
MAKEDEV:	Add MAKEDEV glue for the ti(4) device nodes.

ti.4:		Update the ti(4) man page to include information on the
		TI_JUMBO_HDRSPLIT and TI_PRIVATE_JUMBOS kernel options,
		and also include information about the new character
		device interface and the associated ioctls.

man9/Makefile:	Add jumbo.9 and zero_copy.9 man pages and associated
		links.

jumbo.9:	New man page describing the jumbo buffer allocator
		interface and operation.

zero_copy.9:	New man page describing the general characteristics of
		the zero copy send and receive code, and what an
		application author should do to take advantage of the
		zero copy functionality.

NOTES:		Add entries for ZERO_COPY_SOCKETS, TI_PRIVATE_JUMBOS,
		TI_JUMBO_HDRSPLIT, MSIZE, and MCLSHIFT.

conf/files:	Add uipc_jumbo.c and uipc_cow.c.

conf/options:	Add the 5 options mentioned above.

kern_subr.c:	Receive side zero copy implementation.  This takes
		"disposable" pages attached to an mbuf, gives them to
		a user process, and then recycles the user's page.
		This is only active when ZERO_COPY_SOCKETS is turned on
		and the kern.ipc.zero_copy.receive sysctl variable is
		set to 1.

uipc_cow.c:	Send side zero copy functions.  Takes a page written
		by the user and maps it copy on write and assigns it
		kernel virtual address space.  Removes copy on write
		mapping once the buffer has been freed by the network
		stack.

uipc_jumbo.c:	Jumbo disposable page allocator code.  This allocates
		(optionally) disposable pages for network drivers that
		want to give the user the option of doing zero copy
		receive.

uipc_socket.c:	Add kern.ipc.zero_copy.{send,receive} sysctls that are
		enabled if ZERO_COPY_SOCKETS is turned on.

		Add zero copy send support to sosend() -- pages get
		mapped into the kernel instead of getting copied if
		they meet size and alignment restrictions.

uipc_syscalls.c:Un-staticize some of the sf* functions so that they
		can be used elsewhere.  (uipc_cow.c)

if_media.c:	In the SIOCGIFMEDIA ioctl in ifmedia_ioctl(), avoid
		calling malloc() with M_WAITOK.  Return an error if
		the M_NOWAIT malloc fails.

		The ti(4) driver and the wi(4) driver, at least, call
		this with a mutex held.  This causes witness warnings
		for 'ifconfig -a' with a wi(4) or ti(4) board in the
		system.  (I've only verified for ti(4)).

ip_output.c:	Fragment large datagrams so that each segment contains
		a multiple of PAGE_SIZE amount of data plus headers.
		This allows the receiver to potentially do page
		flipping on receives.

if_ti.c:	Add zero copy receive support to the ti(4) driver.  If
		TI_PRIVATE_JUMBOS is not defined, it now uses the
		jumbo(9) buffer allocator for jumbo receive buffers.

		Add a new character device interface for the ti(4)
		driver for the new debugging interface.  This allows
		(a patched version of) gdb to talk to the Tigon board
		and debug the firmware.  There are also a few additional
		debugging ioctls available through this interface.

		Add header splitting support to the ti(4) driver.

		Tweak some of the default interrupt coalescing
		parameters to more useful defaults.

		Add hooks for supporting transmit flow control, but
		leave it turned off with a comment describing why it
		is turned off.

if_tireg.h:	Change the firmware rev to 12.4.11, since we're really
		at 12.4.11 plus fixes from 12.4.13.

		Add defines needed for debugging.

		Remove the ti_stats structure, it is now defined in
		sys/tiio.h.

ti_fw.h:	12.4.11 firmware.

ti_fw2.h:	12.4.11 firmware, plus selected fixes from 12.4.13,
		and my header splitting patches.  Revision 12.4.13
		doesn't handle 10/100 negotiation properly.  (This
		firmware is the same as what was in the tree previously,
		with the addition of header splitting support.)

sys/jumbo.h:	Jumbo buffer allocator interface.

sys/mbuf.h:	Add a new external mbuf type, EXT_DISPOSABLE, to
		indicate that the payload buffer can be thrown away /
		flipped to a userland process.

socketvar.h:	Add prototype for socow_setup.

tiio.h:		ioctl interface to the character portion of the ti(4)
		driver, plus associated structure/type definitions.

uio.h:		Change prototype for uiomoveco() so that we'll know
		whether the source page is disposable.

ufs_readwrite.c:Update for new prototype of uiomoveco().

vm_fault.c:	In vm_fault(), check to see whether we need to do a page
		based copy on write fault.

vm_object.c:	Add a new function, vm_object_allocate_wait().  This
		does the same thing that vm_object allocate does, except
		that it gives the caller the opportunity to specify whether
		it should wait on the uma_zalloc() of the object structre.

		This allows vm objects to be allocated while holding a
		mutex.  (Without generating WITNESS warnings.)

		vm_object_allocate() is implemented as a call to
		vm_object_allocate_wait() with the malloc flag set to
		M_WAITOK.

vm_object.h:	Add prototype for vm_object_allocate_wait().

vm_page.c:	Add page-based copy on write setup, clear and fault
		routines.

vm_page.h:	Add page based COW function prototypes and variable in
		the vm_page structure.

Many thanks to Drew Gallatin, who wrote the zero copy send and receive
code, and to all the other folks who have tested and reviewed this code
over the years.
2002-06-26 03:37:47 +00:00
Kirk McKusick
a7d50c22a6 Force the quota update to be done when an inode is released in
ufs_inactive. This avoid a panic when checking a NULL credential
in suser_cred().
2002-06-25 01:02:28 +00:00
Jonathan Lemon
c86c4abf99 Prototype fixes (long newinum --> ino_t newinum). 2002-06-24 17:20:19 +00:00
Maxime Henrion
cfbf0a4678 Warning fixes for 64 bits platforms. This eliminates all the
warnings I have had in the FFS code on sparc64.

Reviewed by:	mckusick
2002-06-23 18:17:27 +00:00
Matthew Dillon
10cfbc1978 Rename the BALLOC flags from B_* to BA_* to avoid confusion with the
struct buf B_ flags.

Approved by:	mckusick
2002-06-23 06:12:22 +00:00
Kirk McKusick
5006e77609 This patch fixes a problem whereby filesystems that ran
out of inodes in a cylinder group would fail to check for
free inodes in other cylinder groups. This bug was introduced
in the UFS2 code merge two days ago.

An inode is allocated by calling ffs_valloc which calls
ffs_hashalloc to do the filesystem scan. Ffs_hashalloc
walks around the cylinder groups calling its passed allocator
(ffs_nodealloccg in this case) until the allocator returns a
non-zero result. The bug is that ffs_hashalloc expects the
passed allocator function to return a 64-bit ufs2_daddr_t.
When allocating inodes, it calls ffs_nodealloccg which was
returning a 32-bit ino_t. The ffs_hashalloc code checked
a 64-bit return value and usually found random non-zero bits in
the high 32-bits so decided that the allocation had succeeded
(in this case in the only cylinder group that it checked).
When the result was passed back to ffs_valloc it looked at
only the bottom 32-bits, saw zero and declared the system
out of inodes. But ffs_hashalloc had really only checked
one cylinder group.

The fix is to change ffs_nodealloccg to return 64-bit results.

Sponsored by:	DARPA & NAI Labs.
Submitted by:	Poul-Henning Kamp <phk@critter.freebsd.dk>
Reviewed by:	Maxime Henrion <mux@freebsd.org>
2002-06-22 21:24:58 +00:00
Kirk McKusick
1c85e6a35d This commit adds basic support for the UFS2 filesystem. The UFS2
filesystem expands the inode to 256 bytes to make space for 64-bit
block pointers. It also adds a file-creation time field, an ability
to use jumbo blocks per inode to allow extent like pointer density,
and space for extended attributes (up to twice the filesystem block
size worth of attributes, e.g., on a 16K filesystem, there is space
for 32K of attributes). UFS2 fully supports and runs existing UFS1
filesystems. New filesystems built using newfs can be built in either
UFS1 or UFS2 format using the -O option. In this commit UFS1 is
the default format, so if you want to build UFS2 format filesystems,
you must specify -O 2. This default will be changed to UFS2 when
UFS2 proves itself to be stable. In this commit the boot code for
reading UFS2 filesystems is not compiled (see /sys/boot/common/ufsread.c)
as there is insufficient space in the boot block. Once the size of the
boot block is increased, this code can be defined.

Things to note: the definition of SBSIZE has changed to SBLOCKSIZE.
The header file <ufs/ufs/dinode.h> must be included before
<ufs/ffs/fs.h> so as to get the definitions of ufs2_daddr_t and
ufs_lbn_t.

Still TODO:
Verify that the first level bootstraps work for all the architectures.
Convert the utility ffsinfo to understand UFS2 and test growfs.
Add support for the extended attribute storage. Update soft updates
to ensure integrity of extended attribute storage. Switch the
current extended attribute interfaces to use the extended attribute
storage. Add the extent like functionality (framework is there,
but is currently never used).

Sponsored by: DARPA & NAI Labs.
Reviewed by:	Poul-Henning Kamp <phk@freebsd.org>
2002-06-21 06:18:05 +00:00
Matthew Dillon
a37313d234 In rev 1.72 a situation related to write/mmap was fixed which could result
in a user process gaining visibility into the 'old' contents of a filesystem
block.  There were two cases:  (1) when uiomove() fails (user process issues
illegal write), and (2) when uiomove() overlaps a mmap() of the same file at
the same offset (fault -> recursive buffer I/O reads contents of old block).

Unfortunately 1.72 also had the unintended effect of forcing the filesystem
to do a read-before-write in the case of a full-block-write (non append case),
e.g. 'dd if=/dev/zero of=test.dat bs=1m count=256 conv=notrunc'.  This
destroys performance.. not only is a read forced for every write, but
clustering breaks as well.

The solution is to clear the buffer manually in the full-block case rather
then asking BALLOC to do it (BALLOC issues the read-before-write).  In the
partial-block case we want BALLOC to do it because the read-before-write
is necessary.  This patch should greatly improve database and news-feed
server performance.

Found by: MKI <mki@mozone.net>
MFC after:	3 days
2002-06-19 09:39:41 +00:00
Semen Ustimenko
13866b3fd2 Fix a typo in my recently added comment: s/beleived/believed/
Submitted by:	keramida
2002-06-06 20:43:03 +00:00
Alfred Perlstein
ba5a4d6c02 Backout/modify previous revision:
"empty default cases shouldn't be removed, they should have a break;
  statement added to them."

Requested by: billf
2002-06-01 20:54:21 +00:00
Alfred Perlstein
37e1dd483d Silence warnings, remove some empty 'default' switch cases. 2002-06-01 20:40:42 +00:00
Semen Ustimenko
f576a00d1b Remove lock from ffs_vget introduced by v1.24. Instead of locking the
vnode creation globaly, we allow processes to create vnodes concurently.
In case of concurent creation of vnode for the one ino, we allow processes
to race and then check who wins.

Assuming that concurent creation of vnode for same ino is really rare case,
this is belived to be an improvement, as it just allows concurent creation
of vnodes.

Idea by:	bp
Reviewed by:	dillon
MFC after:	1 month
2002-05-30 22:04:17 +00:00
Robert Watson
2bab796d96 Remove IFS from 5.0-CURRENT. This facilitates introducing UFS2 as
IFS had its fingers deep in the belly of the UFS/FFS split.  IFS
will be reimplemented by the maintainer at a later date.

Requested by:	adrian (maintainer)
2002-05-19 00:11:08 +00:00
Ian Dowse
ed6ca8732c Fix two casts to "daddr_t *" that should have been "ufs_daddr_t *". 2002-05-18 19:03:00 +00:00
Ian Dowse
e116910b8d Fix a typo where sizeof(daddr_t) was specified instead of sizeof(doff_t).
Now that daddr_t is 64-bit, this caused hash blocks to be allocated
twice as large as they need to be.
2002-05-18 18:58:27 +00:00
Ian Dowse
00b162d018 Remove um_i_effnlink_valid, i_spare[] and the ufsmount_u and inode_u
unions, since these were only necessary when ext2fs used ufs code.

Reviewed by:	mckusick
2002-05-18 18:51:14 +00:00
Poul-Henning Kamp
8fdbc99b69 Fix ufs_daddr_t/daddr_t type problems.
Sponsored by:	DARPA & NAI labs.
2002-05-17 18:59:53 +00:00
Poul-Henning Kamp
c7ffbdd995 Call ufs_bmaparray() with right parameter type.
Sponsored by: DARPA & NAI Labs.
2002-05-17 18:53:29 +00:00
Tom Rhodes
d394511de3 More s/file system/filesystem/g 2002-05-16 21:28:32 +00:00
Poul-Henning Kamp
98b0c78978 Make daddr_t and u_daddr_t 64bits wide.
Retire daddr64_t and use daddr_t instead.

Sponsored by:	DARPA & NAI Labs.
2002-05-14 11:09:43 +00:00
Poul-Henning Kamp
05f4ff5da1 Remove register keyword.
Sponsored by:	DARPA & NAI Labs.
Submitted by:	mckusick
2002-05-13 09:22:31 +00:00
Poul-Henning Kamp
2b2df79fad Remove two "register" and a blank line.
Submitted by:	mckusick
Sponsored by:	DARPA & NAI Labs.
2002-05-12 22:54:48 +00:00
Poul-Henning Kamp
7110af7577 ARGH! SBLOCK is not unused. Try to get this right.
BBSIZE belongs in <sys/disklabel.h> (but shouldn't be a constant).

Define SBLOCK again, using the right math.

Sponsored by: DARPA & NAI Labs.
2002-05-12 20:21:40 +00:00
Poul-Henning Kamp
7cb71b749c Remove #define for BBOFF, it is assumed == 0 so many places that we might
as well forget about it.  In fact the only thing which used it was the
SBOFF macro.

Sponsored by: DARPA & NAI Labs.
2002-05-12 20:00:21 +00:00
Poul-Henning Kamp
16910634dd Remove unused BBLOCK and SBLOCK #defines.
Sponsored by: DARPA & NAI Labs.
2002-05-12 19:56:31 +00:00
Alan Cox
c0b6bbb80b o Condition the compilation and use of vm_freeze_copyopts()
on ENABLE_VFS_IOOPT.
2002-05-06 05:45:57 +00:00
Poul-Henning Kamp
d08961bec3 Move some UFS related stuff home where it belongs. 2002-05-05 20:04:33 +00:00
Jeff Roberson
5df148630f Include systm.h so panic(9) is defined when doing DEBUG_ALL_VFS_LOCKS. 2002-05-04 02:40:37 +00:00
Poul-Henning Kamp
afe564a200 Name ufs_vop_[gs]etextattr() consistently with the rest of our VOPs and
put then in the ufs_vnops where they belong, rather than in the ffs_vnops.

Ok'ed by:	rwatson
Sponsored by:	DARPA & NAI Labs.
2002-05-03 08:40:33 +00:00
Poul-Henning Kamp
d65b3c73d7 Use vop_panic() instead of our home-rolled version. 2002-05-02 19:15:52 +00:00
Alfred Perlstein
5a6ce14c42 Remove support for using soon to be retired "special" poll(2) ops.
Replace with kevent(2) ops.

This is untested, but the code would rot even further if this wasn't
applied.  I've chosen to apply this to prompt some cleanup.

Submitted by: bde
2002-04-18 14:52:28 +00:00
Jeff Roberson
5dacf95488 Don't peak into the malloc_type structure for limits. The desired vnodes
check should be sufficient.  This is required for the pending removal of
malloc_type limits.
2002-04-15 03:35:35 +00:00
Poul-Henning Kamp
2dd527b3ac Move generic disk ioctls from <sys/disklabel.h> to <sys/disk.h>.
Sponsored by:	DARPA & NAI Labs
2002-04-08 09:20:07 +00:00
John Baldwin
6008862bc2 Change callers of mtx_init() to pass in an appropriate lock type name. In
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.

Tested on:	i386, alpha, sparc64
2002-04-04 21:03:38 +00:00
Poul-Henning Kamp
a463023d6d Move the FFS parameter MAXFRAG from <sys/param.h> to <ufs/ffs/fs.h>
Sponsored by:	DARPA & NAI Labs.
2002-04-03 20:39:27 +00:00
Poul-Henning Kamp
46a67eaced Use DIOCGSECTORSIZE instead of the bogus DIOCGPART ioctl. 2002-04-02 11:23:14 +00:00
John Baldwin
44731cab3b Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API.  The entire API now consists of two functions
similar to the pre-KSE API.  The suser() function takes a thread pointer
as its only argument.  The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0.  The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on:	smp@
2002-04-01 21:31:13 +00:00
Bruce Evans
0508986cce In ffs_mountffs(), set mnt_iosize_max to si_iosize_max unconditionally
provided the latter is nonzero.  At this point, the former is a fairly
arbitrary default value (DFTPHYS), so changing it to any reasonable
value specified by the device driver is safe.  Using the maximum of
these limits broke ffs clustered i/o for devices whose si_iosize_max
is < DFLTPHYS.  Using the minimum would break device drivers' ability
to increase the active limit from DFTLPHYS up to MAXPHYS.

Copied the code for this and the associated (unnecessary?) fixup of
mp_iosize_max to all other filesystems that use clustering (ext2fs and
msdosfs).  It was completely missing.

PR:		36309
MFC-after:	1 week
2002-03-30 15:12:57 +00:00
David Malone
527f5ce021 Two minor changes to dirhash, which result in some marginal benchmark
improvements.

1) If deleting an entry results in a chain of deleted slots ending in an
   empty slot, then we can be a bit more aggressive about marking slots as
   empty.

2) The last stage of the FNV hash is to xor the last byte of data
   into the hash. This means that filenames which differ only in
   the last byte will be placed close to one another in the hash
   table, which forms longer chains. To work around this common
   case, we also hash in the address of the dirhash structure.

     news/cancel = news/articles/control/cancel for a tradspool inn server
     squid2 = squid level 2 directory (dirs called 00->FF)
     squid3 = squid level 3 directory (files called 00001F00->00001FFF)

                             mean #probes for
                  home dir  mh inbox  news/cancel  tmp    squid2  squid3
old   successful  1.02      3.19      4.07         1.10    7.85   2.06
new   successful  1.04      1.32      1.27         1.04    1.93   1.17

old unsuccessful  1.08      4.50      5.37         1.17   10.76   2.69
new unsuccessful  1.08      1.73      1.64         1.17    2.89   1.37

Reviewed by:	iedowse
MFC after:	2 weeks
2002-03-20 17:58:02 +00:00
Jeff Roberson
e2f8f8a6b6 Remove references to vm_zone.h and switch over to the new uma API. 2002-03-20 08:48:07 +00:00
Alfred Perlstein
6f1e855112 Remove __P. 2002-03-19 22:40:48 +00:00
Bruce Evans
367b50a28f Fixed some printf format errors (hopefully all of the remaining daddr64_t
ones for GENERIC, and all others on the same line as those).  Reformat
the printfs if necessary to avoid new long lones or old format printf
errors.
2002-03-19 04:09:21 +00:00
Kirk McKusick
a0595d0249 Add a flags parameter to VFS_VGET to pass through the desired
locking flags when acquiring a vnode. The immediate purpose is
to allow polling lock requests (LK_NOWAIT) needed by soft updates
to avoid deadlock when enlisting other processes to help with
the background cleanup. For the future it will allow the use of
shared locks for read access to vnodes. This change touches a
lot of files as it affects most filesystems within the system.
It has been well tested on FFS, loopback, and CD-ROM filesystems.
only lightly on the others, so if you find a problem there, please
let me (mckusick@mckusick.com) know.
2002-03-17 01:25:47 +00:00
Kirk McKusick
0d2af52141 Introduce the new 64-bit size disk block, daddr64_t. Change
the bio and buffer structures to have daddr64_t bio_pblkno,
b_blkno, and b_lblkno fields which allows access to disks
larger than a Terabyte in size. This change also requires
that the VOP_BMAP vnode operation accept and return daddr64_t
blocks. This delta should not affect system operation in
any way. It merely sets up the necessary interfaces to allow
the development of disk drivers that work with these larger
disk block addresses. It also allows for the development of
UFS2 which will use 64-bit block addresses.
2002-03-15 18:49:47 +00:00
David E. O'Brien
f0c8652ed4 Quiet a warning on the Alpha. 2002-03-15 04:06:10 +00:00
Kirk McKusick
9721068f95 This corrects the first of two known deadlock conditions that
come from the presence of a snapshot file.
2002-03-14 01:21:13 +00:00
Ian Dowse
23bd68a426 Fix a bug in ufsdirhash_adjfree() that caused it to incorrectly
update the free-space statistics in some cases. The problem affected
directory blocks when the free space dropped below the size of the
maximum allowed entry size. When this happened, the free-space
summary information could claim that there are no further blocks
that can fit a maximum-size entry, even if there are.

The effect of this bug is that the directory may be enlarged even
though there is space within the directory for the new entry. This
wastes disk space and has a negative impact on performance.

Fix it by correctly computing the dh_firstfree array index, adding
a helper macro for clarity. Put an extra sanity check into
ufsdirhash_checkblock() to detect the situation in future.

Found by:	dwmalone
Reviewed by:	dwmalone
MFC after:	1 week
2002-03-11 19:13:22 +00:00
Poul-Henning Kamp
063f776327 I missed one VOP_CLOSE in the previous commit.
Pointed out by:	bde
2002-03-11 16:27:04 +00:00
Poul-Henning Kamp
3dbceccb78 As a XXX bandaid open the mounted device READ/WRITE even if we only mount
read-only.

The trouble here is that we don't reopen the device in read/write mode
when we remount in read/write mode resulting in a filesystem sending
write requests to a device which was only opened read/only.

I'm not quite sure how such a reopen would best be done and defer
the problem to more agile hackers.
2002-03-11 13:53:00 +00:00
Robert Watson
409b188022 Update DBA for NAI. We have several. We used the wrong one. :-) 2002-03-07 17:49:06 +00:00
Brian Feldman
9d9737ecb2 Add new errno ``ENOATTR''. 2002-03-07 15:13:44 +00:00
Matthew Dillon
2cfaf1e315 cleanup readability syntax prior to ongoing b_resid work commits.
MFC after:	1 day
2002-03-06 00:44:30 +00:00
John Baldwin
fdcc1cc09f Use thread0.td_ucred instead of proc0.p_ucred. This change is cosmetic
and isn't strictly required.  However, it lowers the number of false
positives found when grep'ing the kernel sources for p_ucred to ensure
proper locking.
2002-02-27 19:18:10 +00:00
John Baldwin
a854ed9893 Simple p_ucred -> td_ucred changes to start using the per-thread ucred
reference.
2002-02-27 18:32:23 +00:00
Poul-Henning Kamp
986066d065 Replace bowrite() with BUF_WRITE in ufs.
Remove bowrite(), it is now unused.

This is the first step in getting entirely rid of BIO_ORDERED which is
a generally accepted evil thing.

Approved by:	mckusick
2002-02-22 09:03:00 +00:00
Robert Watson
15b27e726e o Minor style fix on #endif, missing '_' in comment. 2002-02-20 15:44:43 +00:00
Poul-Henning Kamp
68edc1b939 Make v_addpollinfo() visible and non-inline.
Have callers only call it as needed.
Add necessary call in ufs_kqfilter().

Test-case found by:	Andrew Gallatin <gallatin@cs.duke.edu>
2002-02-18 16:18:02 +00:00
Poul-Henning Kamp
4b55dbe36b Move the stuff related to select and poll out of struct vnode.
The use of the zone allocator may or may not be overkill.
There is an XXX: over in ufs/ufs/ufs_vnops.c that jlemon may need
to revisit.

This shaves about 60 bytes of struct vnode which on my laptop means
600k less RAM used for vnodes.
2002-02-17 21:15:36 +00:00