freebsd-skq

Author	SHA1	Message	Date
jkim	bc7e5e240b	- Detect Bochs BIOS variants and use HZ_VM as well. - Free kernel environment variable after its use. - Fix style(9) nits.	2008-12-08 18:39:59 +00:00
kib	8324189f53	Do drop vm map lock earlier in the sysctl_kern_proc_vmmap(), to avoid locking a vnode while having vm map locked. Reported and tested by: pho MFC after: 1 week	2008-12-08 12:29:30 +00:00
kmacy	598b522b42	- convert radix node head lock from mutex to rwlock - make radix node head lock not recursive - fix LOR in rtexpunge - fix LOR in rtredirect Reviewed by: sam	2008-12-07 21:15:43 +00:00
kib	ccad2ebfb2	Several threads in a process may do vfork() simultaneously. Then, all parent threads sleep on the parent' struct proc until corresponding child releases the vmspace. Each sleep is interlocked with proc mutex of the child, that triggers assertion in the sleepq_add(). The assertion requires that at any time, all simultaneous sleepers for the channel use the same interlock. Silent the assertion by using conditional variable allocated in the child. Broadcast the variable event on exec() and exit(). Since struct proc * sleep wait channel is overloaded for several unrelated events, I was unable to remove wakeups from the places where cv_broadcast() is added, except exec(). Reported and tested by: ganbold Suggested and reviewed by: jhb MFC after: 2 week	2008-12-05 20:50:24 +00:00
jhb	0f1a8aa011	When the SYSINIT() to load a module invokes the MOD_LOAD event successfully, move that module to the head of the associated linker file's list of modules. The end result is that once all the modules are loaded, they are sorted in the reverse of their load order. This causes the kernel linker to invoke the MOD_QUIESCE and MOD_UNLOAD events in the reverse of the order that MOD_LOAD was invoked. This means that the ordering of MOD_LOAD events that is set by the SI_* paramters to DECLARE_MODULE() are now honored in the same order they would be for SYSUNINIT() for the MOD_QUIESCE and MOD_UNLOAD events. MFC after: 1 month	2008-12-05 16:47:30 +00:00
jhb	0ac14961cc	- Invoke MOD_QUIESCE on all modules in a linker file (kld) before unloading any modules. As a result, if any module veto's an unload request via MOD_QUIESCE, the entire set of modules for that linker file will remain loaded and active now rather than leaving the kld in a weird state where some modules are loaded and some are unloaded. - This also moves the logic for handling the "forced" unload flag out of kern_module.c and into kern_linker.c which is a bit cleaner. - Add a module_name() routine that returns the name of a module and use that instead of printing pointer values in debug messages when a module fails MOD_QUIESCE or MOD_UNLOAD. MFC after: 1 month	2008-12-05 13:40:25 +00:00
bz	965b4db5a5	Fix a credential reference leak. [1] Close subtle but relatively unlikely race conditions when propagating the vnode write error to other active sessions tracing to the same vnode, without holding a reference on the vnode anymore. [2] PR: kern/126368 [1] Submitted by: rwatson [2] Reviewed by: kib, rwatson MFC after: 4 weeks	2008-12-03 15:54:35 +00:00
bz	604d89458a	Rather than using hidden includes (with cicular dependencies), directly include only the header files needed. This reduces the unneeded spamming of various headers into lots of files. For now, this leaves us with very few modules including vnet.h and thus needing to depend on opt_route.h. Reviewed by: brooks, gnn, des, zec, imp Sponsored by: The FreeBSD Foundation	2008-12-02 21:37:28 +00:00
kib	ade687809e	Shared lookup makes it possible to create several negative cache entries for one name. Then, creating inode with that name would remove one entry, leaving others dormant. Reclaiming the vnode would uncover negative entries, causing false return of ENOENT from the calls like stat, that do not create inode. Prevent creation of the duplicated negative entries. Reported and debugged with: pho Reviewed by: jhb X-MFC: after shared lookup changes	2008-12-02 11:14:16 +00:00
peter	76037b082e	Merge user/peter/kinfo branch as of r185547 into head. This changes struct kinfo_filedesc and kinfo_vmentry such that they are same on both 32 and 64 bit platforms like i386/amd64 and won't require sysctl wrapping. Two new OIDs are assigned. The old ones are available under COMPAT_FREEBSD7 - but it isn't that simple. The superceded interface was never actually released on 7.x. The other main change is to pack the data passed to userland via the sysctl. kf_structsize and kve_structsize are reduced for the copyout. If you have a process with 100,000+ sockets open, the unpacked records require a 132MB+ copyout. With packing, it is "only" ~35MB. (Still seriously unpleasant, but not quite as devastating). A similar problem exists for the vmentry structure - have lots and lots of shared libraries and small mmaps and its copyout gets expensive too. My immediate problem is valgrind. It traditionally achieves this functionality by parsing procfs output, in a packed format. Secondly, when tracing 32 bit binaries on amd64 under valgrind, it uses a cross compiled 32 bit binary which ran directly into the differing data structures in 32 vs 64 bit mode. (valgrind uses this to track file descriptor operations and this therefore affected every single 32 bit binary) I've added two utility functions to libutil to unpack the structures into a fixed record length and to make it a little more convenient to use.	2008-12-02 06:50:26 +00:00
peter	0cd59a18e3	Prune some whining.	2008-12-02 02:32:13 +00:00
kan	6f76d481f0	Shared memory objects that have size which is not necessarily equal to exact multiple of system page size should still be allowed to be mapped in their entirety to match the regular vnode backed file behavior. Reported by: ed Reviewed by: jhb	2008-12-01 22:33:50 +00:00
kensmith	b8203d2b22	Catch up with the disappearance of sys/dev/hfa.	2008-12-01 14:34:42 +00:00
attilio	2f606e171c	Fix an inverted check introduced in r184554. Submitted by: tegge Pointy hat to: me	2008-12-01 03:00:26 +00:00
peter	cd7b78c33f	Duplicate another few hundred lines of code in order to be compatible with unreleased binaries.	2008-12-01 02:13:32 +00:00
davidxu	25ef38d002	Revision 184199 had not been fully reverted, add missing piece. Reported by: phk	2008-12-01 01:54:55 +00:00
peter	2b1f03929a	Properly wrap this giant block of duplicate code inside COMPAT_FREEBSD7	2008-11-30 21:04:53 +00:00
peter	343bde9706	Implement copyout packing more along the lines of what I had in mind. Create a temporary duplicate implementation of old filedesc struct for pre-7.1 libgtop package. Todo: specific fd or addr request	2008-11-30 00:18:21 +00:00
peter	83dc2280ce	WIP kinfo_file/kinfo_vmmentry tweaks. The idea: 1) to get the 32 and 64 bit versions in sync so that no shims are needed, Valgrind in particular excercises this. and: 2) reduce the size of the copyout. On large processes this turns out to be a huge problem. Valgrind also suffers from this since it needs to do this in a context that can't malloc. I want to pack the records. 3) Add new types.. 'tell me about fd N' and 'tell me about addr N'.	2008-11-29 20:55:11 +00:00
bz	817305efd5	Unbreak the no-networks (no INET/6) build that I broke with the commit in r185435. Pointyhat: no, but I could need a ski cap for the winter	2008-11-29 16:17:39 +00:00
bz	d2730d5b27	MFp4: Bring in updated jail support from bz_jail branch. This enhances the current jail implementation to permit multiple addresses per jail. In addtion to IPv4, IPv6 is supported as well. Due to updated checks it is even possible to have jails without an IP address at all, which basically gives one a chroot with restricted process view, no networking,.. SCTP support was updated and supports IPv6 in jails as well. Cpuset support permits jails to be bound to specific processor sets after creation. Jails can have an unrestricted (no duplicate protection, etc.) name in addition to the hostname. The jail name cannot be changed from within a jail and is considered to be used for management purposes or as audit-token in the future. DDB 'show jails' command was added to aid debugging. Proper compat support permits 32bit jail binaries to be used on 64bit systems to manage jails. Also backward compatibility was preserved where possible: for jail v1 syscalls, as well as with user space management utilities. Both jail as well as prison version were updated for the new features. A gap was intentionally left as the intermediate versions had been used by various patches floating around the last years. Bump __FreeBSD_version for the afore mentioned and in kernel changes. Special thanks to: - Pawel Jakub Dawidek (pjd) for his multi-IPv4 patches and Olivier Houchard (cognet) for initial single-IPv6 patches. - Jeff Roberson (jeff) and Randall Stewart (rrs) for their help, ideas and review on cpuset and SCTP support. - Robert Watson (rwatson) for lots and lots of help, discussions, suggestions and review of most of the patch at various stages. - John Baldwin (jhb) for his help. - Simon L. Nielsen (simon) as early adopter testing changes on cluster machines as well as all the testers and people who provided feedback the last months on freebsd-jail and other channels. - My employer, CK Software GmbH, for the support so I could work on this. Reviewed by: (see above) MFC after: 3 months (this is just so that I get the mail) X-MFC Before: 7.2-RELEASE if possible	2008-11-29 14:32:14 +00:00
kib	bf74bb2e16	In the nfsrv_fhtovp(), after the vfs_getvfs() function found the pointer to the fs, but before a vnode on the fs is locked, unmount may free fs structures, causing access to destroyed data and freed memory. Introduce a vfs_busymp() function that looks up and busies found fs while mountlist_mtx is held. Use it in nfsrv_fhtovp() and in the implementation of the handle syscalls. Two other uses of the vfs_getvfs() in the vfs_subr.c, namely in sysctl_vfs_ctl and vfs_getnewfsid seems to be ok. In particular, sysctl_vfs_ctl is protected by Giant by being a non-sleeping sysctl handler, that prevents Giant-locked unmount code to interfere with it. Noted by: tegge Reviewed by: dfr Tested by: pho MFC after: 1 month	2008-11-29 13:34:59 +00:00
pjd	881f5f6bef	Improve KASSERT() call a bit: - Print flags in hex. - Note that flags can be fine and panic can be due unexpected error condition. - Remove redundant new line character. Eventhough panic message excess 80 characters keep it in one line so it is easier to grep.	2008-11-29 12:40:14 +00:00
bz	3139d98062	With the permissions of phk@ change the license on kern_jail.c to a 2 clause BSD license.	2008-11-28 19:23:46 +00:00
ed	355c41a8e4	Fix matching of message queues by name. The mqfs_search() routine uses strncmp() to match message queue objects by name. This is because it can be called from environments where the file name is not null terminated (the VFS for example). Unfortunately it doesn't compare the lengths of the message queue names, which means if a system has "Queue12345", the name "Queue" will also match. I noticed this when a student of mine handed in an exercise using message queues with names "Queue2" and "Queue". Reviewed by: rink	2008-11-28 14:53:18 +00:00
kib	c0925f09e7	Explicitely note that destroy_dev() sleeps. Requested by: ed (some time ago), Jaakko Heinonen <jh saunalahti fi>	2008-11-27 16:47:25 +00:00
ganbold	93f46a969a	Remove unused variable. Found with: Coverity Prevent(tm) CID: 3664 Approved by: kib	2008-11-27 04:40:37 +00:00
zec	95a15f5c84	Merge more of currently non-functional (i.e. resolving to whitespace) macros from p4/vimage branch. Do a better job at enclosing all instantiations of globals scheduled for virtualization in #ifdef VIMAGE_GLOBALS blocks. De-virtualize and mark as const saorder_state_alive and saorder_state_any arrays from ipsec code, given that they are never updated at runtime, so virtualizing them would be pointless. Reviewed by: bz, julian Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation	2008-11-26 22:32:07 +00:00
marcus	8b6bc38b7c	Move vn_fullpath1() outside of FILEDESC locking. This is being done in advance of teaching vn_fullpath1() how to query file systems for vnode-to-name mappings when cache lookups fail. Thanks to kib for guidance and patience on this process. Reviewed by: kib Approved by: kib	2008-11-25 15:36:15 +00:00
emaste	abb4c5f368	Correct typo in comment: thier -> their	2008-11-24 19:28:52 +00:00
dwmalone	e3dcd91abe	It's possible that the dump device has gone away after it was configured, change the message to let people know this is a possibility. I've slightly changed the message from the one submitted by Pekka to keep the printf on one line. Submitted by: Pekka Savola <pekkas@netcore.fi>	2008-11-23 21:05:22 +00:00
kib	8fad2283b3	Add sv_flags field to struct sysentvec with intention to provide description of the ABI of the currently executing image. Change some places to test the flags instead of explicit comparing with address of known sysentvec structures to determine ABI features. Discussed with: dchagin, imp, jhb, peter	2008-11-22 12:36:15 +00:00
kmacy	9d3bb599b1	- bump __FreeBSD version to reflect added buf_ring, memory barriers, and ifnet functions - add memory barriers to <machine/atomic.h> - update drivers to only conditionally define their own - add lockless producer / consumer ring buffer - remove ring buffer implementation from cxgb and update its callers - add if_transmit(struct ifnet ifp, struct mbuf m) to ifnet to allow drivers to efficiently manage multiple hardware queues (i.e. not serialize all packets through one ifq) - expose if_qflush to allow drivers to flush any driver managed queues This work was supported by Bitgravity Inc. and Chelsio Inc.	2008-11-22 05:55:56 +00:00
julian	cf07f793f2	Fix a scope problem in the multiple routing table code that stopped the SO_SETFIB socket option from working correctly. Obtained from: Ironport MFC after: 3 days	2008-11-19 19:19:30 +00:00
jhb	c2251260be	Allow device hints to wire the unit numbers of devices. - An "at" hint now reserves a device name. - A new BUS_HINT_DEVICE_UNIT method is added to the bus interface. When determining the unit number of a device, this method is invoked to let the bus driver specify the unit of a device given a specific devclass. This is the only way a device can be given a name reserved via an "at" hint. - Implement BUS_HINT_DEVICE_UNIT() for the acpi(4) and isa(4) bus drivers. Both of these busses implement this by comparing the resources for a given hint device with the resources enumerated by ACPI/PnPBIOS and wire a unit if the hint resources are a subset of the "real" resources. - Use bus_hinted_children() for adding hinted devices on isa(4) busses now instead of doing it by hand. - Remove the unit kludging from sio(4) as it is no longer necessary. Prodding from: peter, imp OK'd by: marcel MFC after: 1 month	2008-11-18 21:01:54 +00:00
jhb	b08d457fbe	When checking to see if another CPU is running its idle thread, examine the thread running on the other CPU instead of the thread being placed on the run queue. Reported by: Ravi Murty @ Intel Reviewed by: jeff	2008-11-18 05:41:34 +00:00
delphij	39ade9204d	Obey signedness flag in %z case. MFC after: 2 months	2008-11-17 23:57:40 +00:00
pjd	bbe899b96e	Update ZFS from version 6 to 13 and bring some FreeBSD-specific changes. This bring huge amount of changes, I'll enumerate only user-visible changes: - Delegated Administration Allows regular users to perform ZFS operations, like file system creation, snapshot creation, etc. - L2ARC Level 2 cache for ZFS - allows to use additional disks for cache. Huge performance improvements mostly for random read of mostly static content. - slog Allow to use additional disks for ZFS Intent Log to speed up operations like fsync(2). - vfs.zfs.super_owner Allows regular users to perform privileged operations on files stored on ZFS file systems owned by him. Very careful with this one. - chflags(2) Not all the flags are supported. This still needs work. - ZFSBoot Support to boot off of ZFS pool. Not finished, AFAIK. Submitted by: dfr - Snapshot properties - New failure modes Before if write requested failed, system paniced. Now one can select from one of three failure modes: - panic - panic on write error - wait - wait for disk to reappear - continue - serve read requests if possible, block write requests - Refquota, refreservation properties Just quota and reservation properties, but don't count space consumed by children file systems, clones and snapshots. - Sparse volumes ZVOLs that don't reserve space in the pool. - External attributes Compatible with extattr(2). - NFSv4-ACLs Not sure about the status, might not be complete yet. Submitted by: trasz - Creation-time properties - Regression tests for zpool(8) command. Obtained from: OpenSolaris	2008-11-17 20:49:29 +00:00
kib	ec4c199459	Revert r184118. There is actually a code in the kernel, for instance in kern_unlinkat(), that expects that vn_start_write() actually fills the mp even when the call failed. As Tor noted, that pattern relies on the the type stability of the mount points, as well as that suspended mount points are never freed and V_XSLEEP is always passed to vn_start_write() when called on a freed mount point. Reported by: stass Reviewed by: tegge PR: 123768	2008-11-16 21:56:29 +00:00
n_hibma	1e9b4258a4	Silence detach messages if the device has marked itself quiet (u3g). MFC after: 3 weeks	2008-11-13 21:46:19 +00:00
ed	1722724523	Don't forget to relock the TTY after uiomove() returns an error. Peter Holm just discovered this funny bug inside the TTY code: if uiomove() in ttydisc_write() returns an error, we forget to relock the TTY before jumping out of ttydisc_write(). Fix it by placing tty_unlock() and tty_lock() around uiomove(). Submitted by: pho	2008-11-12 09:04:44 +00:00
ed	8d12469978	Several cleanups related to pipe(2). - Use `fildes[2]' instead of `*fildes' to make more clear that pipe(2) fills an array with two descriptors. - Remove EFAULT from the manual page. Because of the current calling convention, pipe(2) raises a segmentation fault when an invalid address is passed. - Introduce kern_pipe() to make it easier for binary emulations to implement pipe(2). - Make Linux binary emulation use kern_pipe(), which means we don't have to recover td_retval after calling the FreeBSD system call. Approved by: rdivacky Discussed on: arch	2008-11-11 14:55:59 +00:00
gallatin	5b802b524b	Avoid scheduling firmware taskqs when cold. This prevents a panic which occurs when a driver attempts to load firmware at boot via firmware_get() when the firmware module has not been preloaded. firmware_get() will enqueue a task using a struct taskqueue allocated on the stack, and the machine will crash much later in the firmware taskq thread when taskqs are started and the struct taskqueue is garbage. Not objected to by: sam	2008-11-11 12:25:08 +00:00
ed	7baae41248	Regenerate system call tables for r184789.	2008-11-09 10:48:06 +00:00
ed	9d3703b842	Mark uname(), getdomainname() and setdomainname() with COMPAT_FREEBSD4. Looking at our source code history, it seems the uname(), getdomainname() and setdomainname() system calls got deprecated somewhere after FreeBSD 1.1, but they have never been phased out properly. Because we don't have a COMPAT_FREEBSD1, just use COMPAT_FREEBSD4. Also fix the Linuxolator to build without the setdomainname() routine by just making it call userland_sysctl on kern.domainname. Also replace the setdomainname()'s implementation to use this approach, because we're duplicating code with sysctl_domainname(). I wasn't able to keep these three routines working in our COMPAT_FREEBSD32, because that would require yet another keyword for syscalls.master (COMPAT4+NOPROTO). Because this routine is probably unused already, this won't be a problem in practice. If it turns out to be a problem, we'll just restore this functionality. Reviewed by: rdivacky, kib	2008-11-09 10:45:13 +00:00
kmacy	7f1b49e5c3	make kern.ipc.nmbclusters actually have a useful effect on nmbclusters et al. initialize pkthdr in field order	2008-11-09 01:53:06 +00:00
ed	c3eca8b8cc	Reduce the default baud rate of PTY's to 9600. On RELENG_6 (and probably RELENG_7) we see our syscons windows and pseudo-terminals have the following buffer sizes: \| LINE RAW CAN OUT IHIWT ILOWT OHWT LWT COL STATE SESS PGID DISC \| ttyv0 0 0 0 7680 6720 2052 256 7 OCcl 1146 1146 term \| ttyp0 0 0 0 7680 6720 1296 256 0 OCc 82033 82033 term These buffer sizes make no sense, because we often have much more output than input, but I guess having higher input buffer sizes improves guarantees of the system. On MPSAFE TTY I just sent both the input and output buffer sizes to 7 KB, which is pretty big on a standard FreeBSD install with 8 syscons windows and some PTY's. Reduce the baud rate to 9600 baud, which means we now have the following buffer sizes: \| LINE INQ CAN LIN LOW OUTQ USE LOW COL SESS PGID STATE \| ttyv0 1920 0 0 192 1984 0 199 7 2401 2401 Oil \| pts/0 1920 0 0 192 1984 0 199 5631 1305 2526 Oi This is a lot smaller, but for pseudo-devices this should be good enough. You need to do a lot of punching to fill up a 7.5 KB input buffer. If it turns out things don't work out this way, we'll just switch to 19200 baud.	2008-11-08 20:40:39 +00:00
rodrigc	ca625c199f	Merge latest DTrace changes from Perforce. Approved by: jb	2008-11-05 19:40:36 +00:00
davidxu	1ebf3ee9a3	Revert rev 184216 and 184199, due to the way the thread_lock works, it may cause a lockup. Noticed by: peter, jhb	2008-11-05 03:01:23 +00:00
jhb	fb34cedca4	Use shared vnode locks for auditing vnode arguments as auditing only does a VOP_GETATTR() which does not require an exclusive lock. Reviewed by: csjp, rwatson	2008-11-04 22:31:04 +00:00

1 2 3 4 5 ...

10814 Commits