freebsd-skq

Author	SHA1	Message	Date
mckusick	b44cb5787c	Add support to UFS2 to provide storage for extended attributes. As this code is not actually used by any of the existing interfaces, it seems unlikely to break anything (famous last words). The internal kernel interface to manipulate these attributes is invoked using two new IO_ flags: IO_NORMAL and IO_EXT. These flags may be specified in the ioflags word of VOP_READ, VOP_WRITE, and VOP_TRUNCATE. Specifying IO_NORMAL means that you want to do I/O to the normal data part of the file and IO_EXT means that you want to do I/O to the extended attributes part of the file. IO_NORMAL and IO_EXT are mutually exclusive for VOP_READ and VOP_WRITE, but may be specified individually or together in the case of VOP_TRUNCATE. For example, when removing a file, VOP_TRUNCATE is called with both IO_NORMAL and IO_EXT set. For backward compatibility, if neither IO_NORMAL nor IO_EXT is set, then IO_NORMAL is assumed. Note that the BA_ and IO_ flags have been `merged' so that they may both be used in the same flags word. This merger is possible by assigning the IO_ flags to the low sixteen bits and the BA_ flags the high sixteen bits. This works because the high sixteen bits of the IO_ word is reserved for read-ahead and help with write clustering so will never be used for flags. This merge lets us get away from code of the form: if (ioflags & IO_SYNC) flags \|= BA_SYNC; For the future, I have considered adding a new field to the vattr structure, va_extsize. This addition could then be exported through the stat structure to allow applications to find out the size of the extended attribute storage and also would provide a more standard interface for truncating them (via VOP_SETATTR rather than VOP_TRUNCATE). I am also contemplating adding a pathconf parameter (for concreteness, lets call it _PC_MAX_EXTSIZE) which would let an application determine the maximum size of the extended atribute storage. Sponsored by: DARPA & NAI Labs.	2002-07-19 07:29:39 +00:00
mckusick	3abb526f86	Change utimes to set the file creation time (for filesystems that support creation times such as UFS2) to the value of the modification time if the value of the modification time is older than the current creation time. See utimes(2) for further details. Sponsored by: DARPA & NAI Labs.	2002-07-17 02:03:19 +00:00
mckusick	a3ff90696f	Change the name of st_createtime to st_birthtime. This change is made to reduce confusion between st_ctime and st_createtime. Submitted by: Eric Allman <eric@sendmail.org> Sponsored by: DARPA & NAI Labs.	2002-07-16 22:36:00 +00:00
trhodes	931ad6a4ff	Fix a type: s/your are/you are/	2002-07-12 19:56:31 +00:00
bde	ec6a3fc0c3	Fixed some printf format errors (4 new ones reported by gcc and 5 nearby old ones not reported by gcc). This helps unbreak LINT.	2002-07-08 12:42:29 +00:00
iedowse	4416f82706	Use indirect function pointer hooks instead of #ifdef SOFTUPDATES direct calls for the two places where the kernel calls into soft updates code. Set up the hooks in softdep_initialize() and NULL them out in softdep_uninitialize(). This change allows soft updates to function correctly when ufs is loaded as a module. Reviewed by: mckusick	2002-07-01 17:59:40 +00:00
iedowse	603a270887	Add the ffs bits necessary to support unloading of the ufs kernel module. This adds an ffs_uninit() function that calls ufs_uninit() and also calls a new softdep_uninitialize() function. Add a stub for softdep_uninitialize() to cover the non-SOFTUPDATES case. Reviewed by: mckusick	2002-07-01 11:00:47 +00:00
iedowse	948e368072	Remove the bogus SYSINIT from ufs_dirhash.c and instead add a call to ufsdirhash_init() from ufs_init(). Add uninit() functions corresponding the ufs, dirhash, quota and ihash init() functions.	2002-06-30 02:49:39 +00:00
iedowse	c0323a07fb	Remove the kernel file-size limit for UFS2, so that only the limit imposed by the filesystem structure itself remains. With 16k blocks, the maximum file size is now just over 128TB. For now, the UFS1 file size limit is left unchanged so as to remain consistent with RELENG_4, but it too could be removed in the future. Reviewed by: mckusick	2002-06-26 18:34:51 +00:00
ken	0d3a835f3f	At long last, commit the zero copy sockets code. MAKEDEV: Add MAKEDEV glue for the ti(4) device nodes. ti.4: Update the ti(4) man page to include information on the TI_JUMBO_HDRSPLIT and TI_PRIVATE_JUMBOS kernel options, and also include information about the new character device interface and the associated ioctls. man9/Makefile: Add jumbo.9 and zero_copy.9 man pages and associated links. jumbo.9: New man page describing the jumbo buffer allocator interface and operation. zero_copy.9: New man page describing the general characteristics of the zero copy send and receive code, and what an application author should do to take advantage of the zero copy functionality. NOTES: Add entries for ZERO_COPY_SOCKETS, TI_PRIVATE_JUMBOS, TI_JUMBO_HDRSPLIT, MSIZE, and MCLSHIFT. conf/files: Add uipc_jumbo.c and uipc_cow.c. conf/options: Add the 5 options mentioned above. kern_subr.c: Receive side zero copy implementation. This takes "disposable" pages attached to an mbuf, gives them to a user process, and then recycles the user's page. This is only active when ZERO_COPY_SOCKETS is turned on and the kern.ipc.zero_copy.receive sysctl variable is set to 1. uipc_cow.c: Send side zero copy functions. Takes a page written by the user and maps it copy on write and assigns it kernel virtual address space. Removes copy on write mapping once the buffer has been freed by the network stack. uipc_jumbo.c: Jumbo disposable page allocator code. This allocates (optionally) disposable pages for network drivers that want to give the user the option of doing zero copy receive. uipc_socket.c: Add kern.ipc.zero_copy.{send,receive} sysctls that are enabled if ZERO_COPY_SOCKETS is turned on. Add zero copy send support to sosend() -- pages get mapped into the kernel instead of getting copied if they meet size and alignment restrictions. uipc_syscalls.c:Un-staticize some of the sf* functions so that they can be used elsewhere. (uipc_cow.c) if_media.c: In the SIOCGIFMEDIA ioctl in ifmedia_ioctl(), avoid calling malloc() with M_WAITOK. Return an error if the M_NOWAIT malloc fails. The ti(4) driver and the wi(4) driver, at least, call this with a mutex held. This causes witness warnings for 'ifconfig -a' with a wi(4) or ti(4) board in the system. (I've only verified for ti(4)). ip_output.c: Fragment large datagrams so that each segment contains a multiple of PAGE_SIZE amount of data plus headers. This allows the receiver to potentially do page flipping on receives. if_ti.c: Add zero copy receive support to the ti(4) driver. If TI_PRIVATE_JUMBOS is not defined, it now uses the jumbo(9) buffer allocator for jumbo receive buffers. Add a new character device interface for the ti(4) driver for the new debugging interface. This allows (a patched version of) gdb to talk to the Tigon board and debug the firmware. There are also a few additional debugging ioctls available through this interface. Add header splitting support to the ti(4) driver. Tweak some of the default interrupt coalescing parameters to more useful defaults. Add hooks for supporting transmit flow control, but leave it turned off with a comment describing why it is turned off. if_tireg.h: Change the firmware rev to 12.4.11, since we're really at 12.4.11 plus fixes from 12.4.13. Add defines needed for debugging. Remove the ti_stats structure, it is now defined in sys/tiio.h. ti_fw.h: 12.4.11 firmware. ti_fw2.h: 12.4.11 firmware, plus selected fixes from 12.4.13, and my header splitting patches. Revision 12.4.13 doesn't handle 10/100 negotiation properly. (This firmware is the same as what was in the tree previously, with the addition of header splitting support.) sys/jumbo.h: Jumbo buffer allocator interface. sys/mbuf.h: Add a new external mbuf type, EXT_DISPOSABLE, to indicate that the payload buffer can be thrown away / flipped to a userland process. socketvar.h: Add prototype for socow_setup. tiio.h: ioctl interface to the character portion of the ti(4) driver, plus associated structure/type definitions. uio.h: Change prototype for uiomoveco() so that we'll know whether the source page is disposable. ufs_readwrite.c:Update for new prototype of uiomoveco(). vm_fault.c: In vm_fault(), check to see whether we need to do a page based copy on write fault. vm_object.c: Add a new function, vm_object_allocate_wait(). This does the same thing that vm_object allocate does, except that it gives the caller the opportunity to specify whether it should wait on the uma_zalloc() of the object structre. This allows vm objects to be allocated while holding a mutex. (Without generating WITNESS warnings.) vm_object_allocate() is implemented as a call to vm_object_allocate_wait() with the malloc flag set to M_WAITOK. vm_object.h: Add prototype for vm_object_allocate_wait(). vm_page.c: Add page-based copy on write setup, clear and fault routines. vm_page.h: Add page based COW function prototypes and variable in the vm_page structure. Many thanks to Drew Gallatin, who wrote the zero copy send and receive code, and to all the other folks who have tested and reviewed this code over the years.	2002-06-26 03:37:47 +00:00
mckusick	ea2f279985	Force the quota update to be done when an inode is released in ufs_inactive. This avoid a panic when checking a NULL credential in suser_cred().	2002-06-25 01:02:28 +00:00
jlemon	fe48a4f5d5	Prototype fixes (long newinum --> ino_t newinum).	2002-06-24 17:20:19 +00:00
mux	0a55bc4167	Warning fixes for 64 bits platforms. This eliminates all the warnings I have had in the FFS code on sparc64. Reviewed by: mckusick	2002-06-23 18:17:27 +00:00
dillon	b0b3177408	Rename the BALLOC flags from B_* to BA_* to avoid confusion with the struct buf B_ flags. Approved by: mckusick	2002-06-23 06:12:22 +00:00
mckusick	2730f1975d	This patch fixes a problem whereby filesystems that ran out of inodes in a cylinder group would fail to check for free inodes in other cylinder groups. This bug was introduced in the UFS2 code merge two days ago. An inode is allocated by calling ffs_valloc which calls ffs_hashalloc to do the filesystem scan. Ffs_hashalloc walks around the cylinder groups calling its passed allocator (ffs_nodealloccg in this case) until the allocator returns a non-zero result. The bug is that ffs_hashalloc expects the passed allocator function to return a 64-bit ufs2_daddr_t. When allocating inodes, it calls ffs_nodealloccg which was returning a 32-bit ino_t. The ffs_hashalloc code checked a 64-bit return value and usually found random non-zero bits in the high 32-bits so decided that the allocation had succeeded (in this case in the only cylinder group that it checked). When the result was passed back to ffs_valloc it looked at only the bottom 32-bits, saw zero and declared the system out of inodes. But ffs_hashalloc had really only checked one cylinder group. The fix is to change ffs_nodealloccg to return 64-bit results. Sponsored by: DARPA & NAI Labs. Submitted by: Poul-Henning Kamp <phk@critter.freebsd.dk> Reviewed by: Maxime Henrion <mux@freebsd.org>	2002-06-22 21:24:58 +00:00
mckusick	88d85c15ef	This commit adds basic support for the UFS2 filesystem. The UFS2 filesystem expands the inode to 256 bytes to make space for 64-bit block pointers. It also adds a file-creation time field, an ability to use jumbo blocks per inode to allow extent like pointer density, and space for extended attributes (up to twice the filesystem block size worth of attributes, e.g., on a 16K filesystem, there is space for 32K of attributes). UFS2 fully supports and runs existing UFS1 filesystems. New filesystems built using newfs can be built in either UFS1 or UFS2 format using the -O option. In this commit UFS1 is the default format, so if you want to build UFS2 format filesystems, you must specify -O 2. This default will be changed to UFS2 when UFS2 proves itself to be stable. In this commit the boot code for reading UFS2 filesystems is not compiled (see /sys/boot/common/ufsread.c) as there is insufficient space in the boot block. Once the size of the boot block is increased, this code can be defined. Things to note: the definition of SBSIZE has changed to SBLOCKSIZE. The header file <ufs/ufs/dinode.h> must be included before <ufs/ffs/fs.h> so as to get the definitions of ufs2_daddr_t and ufs_lbn_t. Still TODO: Verify that the first level bootstraps work for all the architectures. Convert the utility ffsinfo to understand UFS2 and test growfs. Add support for the extended attribute storage. Update soft updates to ensure integrity of extended attribute storage. Switch the current extended attribute interfaces to use the extended attribute storage. Add the extent like functionality (framework is there, but is currently never used). Sponsored by: DARPA & NAI Labs. Reviewed by: Poul-Henning Kamp <phk@freebsd.org>	2002-06-21 06:18:05 +00:00
dillon	7aa1bb9d05	In rev 1.72 a situation related to write/mmap was fixed which could result in a user process gaining visibility into the 'old' contents of a filesystem block. There were two cases: (1) when uiomove() fails (user process issues illegal write), and (2) when uiomove() overlaps a mmap() of the same file at the same offset (fault -> recursive buffer I/O reads contents of old block). Unfortunately 1.72 also had the unintended effect of forcing the filesystem to do a read-before-write in the case of a full-block-write (non append case), e.g. 'dd if=/dev/zero of=test.dat bs=1m count=256 conv=notrunc'. This destroys performance.. not only is a read forced for every write, but clustering breaks as well. The solution is to clear the buffer manually in the full-block case rather then asking BALLOC to do it (BALLOC issues the read-before-write). In the partial-block case we want BALLOC to do it because the read-before-write is necessary. This patch should greatly improve database and news-feed server performance. Found by: MKI <mki@mozone.net> MFC after: 3 days	2002-06-19 09:39:41 +00:00
semenu	f48bc32790	Fix a typo in my recently added comment: s/beleived/believed/ Submitted by: keramida	2002-06-06 20:43:03 +00:00
alfred	28ce0e9ae5	Backout/modify previous revision: "empty default cases shouldn't be removed, they should have a break; statement added to them." Requested by: billf	2002-06-01 20:54:21 +00:00
alfred	04d357d9e8	Silence warnings, remove some empty 'default' switch cases.	2002-06-01 20:40:42 +00:00
semenu	b9960bece1	Remove lock from ffs_vget introduced by v1.24. Instead of locking the vnode creation globaly, we allow processes to create vnodes concurently. In case of concurent creation of vnode for the one ino, we allow processes to race and then check who wins. Assuming that concurent creation of vnode for same ino is really rare case, this is belived to be an improvement, as it just allows concurent creation of vnodes. Idea by: bp Reviewed by: dillon MFC after: 1 month	2002-05-30 22:04:17 +00:00
rwatson	930f7599ed	Remove IFS from 5.0-CURRENT. This facilitates introducing UFS2 as IFS had its fingers deep in the belly of the UFS/FFS split. IFS will be reimplemented by the maintainer at a later date. Requested by: adrian (maintainer)	2002-05-19 00:11:08 +00:00
iedowse	caf2c5a16f	Fix two casts to "daddr_t " that should have been "ufs_daddr_t ".	2002-05-18 19:03:00 +00:00
iedowse	cc9d214396	Fix a typo where sizeof(daddr_t) was specified instead of sizeof(doff_t). Now that daddr_t is 64-bit, this caused hash blocks to be allocated twice as large as they need to be.	2002-05-18 18:58:27 +00:00
iedowse	794dddf9bf	Remove um_i_effnlink_valid, i_spare[] and the ufsmount_u and inode_u unions, since these were only necessary when ext2fs used ufs code. Reviewed by: mckusick	2002-05-18 18:51:14 +00:00
phk	e127001407	Fix ufs_daddr_t/daddr_t type problems. Sponsored by: DARPA & NAI labs.	2002-05-17 18:59:53 +00:00
phk	152c7cfbca	Call ufs_bmaparray() with right parameter type. Sponsored by: DARPA & NAI Labs.	2002-05-17 18:53:29 +00:00
trhodes	28d42899b7	More s/file system/filesystem/g	2002-05-16 21:28:32 +00:00
phk	8536ea3cdb	Make daddr_t and u_daddr_t 64bits wide. Retire daddr64_t and use daddr_t instead. Sponsored by: DARPA & NAI Labs.	2002-05-14 11:09:43 +00:00
phk	b591373208	Remove register keyword. Sponsored by: DARPA & NAI Labs. Submitted by: mckusick	2002-05-13 09:22:31 +00:00
phk	e76663bf87	Remove two "register" and a blank line. Submitted by: mckusick Sponsored by: DARPA & NAI Labs.	2002-05-12 22:54:48 +00:00
phk	e27fed0950	ARGH! SBLOCK is not unused. Try to get this right. BBSIZE belongs in <sys/disklabel.h> (but shouldn't be a constant). Define SBLOCK again, using the right math. Sponsored by: DARPA & NAI Labs.	2002-05-12 20:21:40 +00:00
phk	4aa59982cb	Remove #define for BBOFF, it is assumed == 0 so many places that we might as well forget about it. In fact the only thing which used it was the SBOFF macro. Sponsored by: DARPA & NAI Labs.	2002-05-12 20:00:21 +00:00
phk	57e4ad9a8b	Remove unused BBLOCK and SBLOCK #defines. Sponsored by: DARPA & NAI Labs.	2002-05-12 19:56:31 +00:00
alc	d60a525036	o Condition the compilation and use of vm_freeze_copyopts() on ENABLE_VFS_IOOPT.	2002-05-06 05:45:57 +00:00
phk	f075f5e39a	Move some UFS related stuff home where it belongs.	2002-05-05 20:04:33 +00:00
jeff	86e123b512	Include systm.h so panic(9) is defined when doing DEBUG_ALL_VFS_LOCKS.	2002-05-04 02:40:37 +00:00
phk	a1f99d3509	Name ufs_vop_[gs]etextattr() consistently with the rest of our VOPs and put then in the ufs_vnops where they belong, rather than in the ffs_vnops. Ok'ed by: rwatson Sponsored by: DARPA & NAI Labs.	2002-05-03 08:40:33 +00:00
phk	860ddc59bc	Use vop_panic() instead of our home-rolled version.	2002-05-02 19:15:52 +00:00
alfred	d1526107c3	Remove support for using soon to be retired "special" poll(2) ops. Replace with kevent(2) ops. This is untested, but the code would rot even further if this wasn't applied. I've chosen to apply this to prompt some cleanup. Submitted by: bde	2002-04-18 14:52:28 +00:00
jeff	b972756970	Don't peak into the malloc_type structure for limits. The desired vnodes check should be sufficient. This is required for the pending removal of malloc_type limits.	2002-04-15 03:35:35 +00:00
phk	33405073ec	Move generic disk ioctls from <sys/disklabel.h> to <sys/disk.h>. Sponsored by: DARPA & NAI Labs	2002-04-08 09:20:07 +00:00
jhb	db9aa81e23	Change callers of mtx_init() to pass in an appropriate lock type name. In most cases NULL is passed, but in some cases such as network driver locks (which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used. Tested on: i386, alpha, sparc64	2002-04-04 21:03:38 +00:00
phk	65c4a4cb9d	Move the FFS parameter MAXFRAG from <sys/param.h> to <ufs/ffs/fs.h> Sponsored by: DARPA & NAI Labs.	2002-04-03 20:39:27 +00:00
phk	79a894e10e	Use DIOCGSECTORSIZE instead of the bogus DIOCGPART ioctl.	2002-04-02 11:23:14 +00:00
jhb	dc2e474f79	Change the suser() API to take advantage of td_ucred as well as do a general cleanup of the API. The entire API now consists of two functions similar to the pre-KSE API. The suser() function takes a thread pointer as its only argument. The td_ucred member of this thread must be valid so the only valid thread pointers are curthread and a few kernel threads such as thread0. The suser_cred() function takes a pointer to a struct ucred as its first argument and an integer flag as its second argument. The flag is currently only used for the PRISON_ROOT flag. Discussed on: smp@	2002-04-01 21:31:13 +00:00
bde	a9b7c63b29	In ffs_mountffs(), set mnt_iosize_max to si_iosize_max unconditionally provided the latter is nonzero. At this point, the former is a fairly arbitrary default value (DFTPHYS), so changing it to any reasonable value specified by the device driver is safe. Using the maximum of these limits broke ffs clustered i/o for devices whose si_iosize_max is < DFLTPHYS. Using the minimum would break device drivers' ability to increase the active limit from DFTLPHYS up to MAXPHYS. Copied the code for this and the associated (unnecessary?) fixup of mp_iosize_max to all other filesystems that use clustering (ext2fs and msdosfs). It was completely missing. PR: 36309 MFC-after: 1 week	2002-03-30 15:12:57 +00:00
dwmalone	246869b676	Two minor changes to dirhash, which result in some marginal benchmark improvements. 1) If deleting an entry results in a chain of deleted slots ending in an empty slot, then we can be a bit more aggressive about marking slots as empty. 2) The last stage of the FNV hash is to xor the last byte of data into the hash. This means that filenames which differ only in the last byte will be placed close to one another in the hash table, which forms longer chains. To work around this common case, we also hash in the address of the dirhash structure. news/cancel = news/articles/control/cancel for a tradspool inn server squid2 = squid level 2 directory (dirs called 00->FF) squid3 = squid level 3 directory (files called 00001F00->00001FFF) mean #probes for home dir mh inbox news/cancel tmp squid2 squid3 old successful 1.02 3.19 4.07 1.10 7.85 2.06 new successful 1.04 1.32 1.27 1.04 1.93 1.17 old unsuccessful 1.08 4.50 5.37 1.17 10.76 2.69 new unsuccessful 1.08 1.73 1.64 1.17 2.89 1.37 Reviewed by: iedowse MFC after: 2 weeks	2002-03-20 17:58:02 +00:00
jeff	7bd172afc2	Remove references to vm_zone.h and switch over to the new uma API.	2002-03-20 08:48:07 +00:00
alfred	79061a9306	Remove __P.	2002-03-19 22:40:48 +00:00

1 2 3 4 5 ...

929 Commits