freebsd-nq

Author	SHA1	Message	Date
Attilio Rao	80002a63db	Add intermediate states for attaching and detaching that will be reused by the enhached newbus locking once it is checked in. This change can be easilly MFCed to STABLE_8 at the appropriate moment. Reviewed by: jhb, scottl Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>	2009-09-03 13:40:41 +00:00
Attilio Rao	8d3635c4db	Fix some bugs related to adaptive spinning: In the lockmgr support: - GIANT_RESTORE() is just called when the sleep finishes, so the current code can ends up into a giant unlock problem. Fix it by appropriately call GIANT_RESTORE() when needed. Note that this is not exactly ideal because for any interation of the adaptive spinning we drop and restore Giant, but the overhead should be not a factor. - In the lock held in exclusive mode case, after the adaptive spinning is brought to completition, we should just retry to acquire the lock instead to fallthrough. Fix that. - Fix a style nit In the sx support: - Call GIANT_SAVE() before than looping. This saves some overhead because in the current code GIANT_SAVE() is called several times. Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>	2009-09-02 17:33:51 +00:00
Konstantin Belousov	579b976090	Fix mount reference leak when V_XSLEEP is specified to vn_start_write(). Submitted by: tegge	2009-09-01 12:05:39 +00:00
Konstantin Belousov	8a945d109c	Reintroduce the r196640, after fixing the problem with my testing. Remove the altkstacks, instead instantiate threads with kernel stack allocated with the right size from the start. For the thread that has kernel stack cached, verify that requested stack size is equial to the actual, and reallocate the stack if sizes differ [1]. This fixes the bug introduced by r173361 that was committed several days after r173004 and consisted of kthread_add(9) ignoring the non-default kernel stack size. Also, r173361 removed the caching of the kernel stacks for a non-first thread in the process. Introduce separate kernel stack cache that keeps some limited amount of preallocated kernel stacks to lower the latency of thread allocation. Add vm_lowmem handler to prune the cache on low memory condition. This way, system with reasonable amount of the threads get lower latency of thread creation, while still not exhausting significant portion of KVA for unused kstacks. Submitted by: peter [1] Discussed with: jhb, julian, peter Reviewed by: jhb Tested by: pho (and retested according to new test scenarious) MFC after: 1 week	2009-09-01 11:41:51 +00:00
Konstantin Belousov	a505c2c72c	Make the mnt_writeopcount and mnt_secondary_writes counters, used by the suspension code, not greater then mnt_ref reference counter value. Increment mnt_ref together with write counter in vn_start_write()/ vn_start_secondary_write(), releasing in vn_finished_write/vn_finished_secondary_write(). Since r186197, unmount code requires that no writers occured after all references are expired. We still could get write counter incremented for freed or reused struct mount, but it seems to be innocent, since corresponding vnode should be referenced and reclaimed then. Reported by: pho (last half a year), erwin Reviewed by: attilio Tested by: pho, erwin MFC after: 1 week	2009-08-31 10:20:52 +00:00
Bjoern A. Zeeb	ecc2fda872	Make sure FreeBSD binaries without .note.ABI-tag section work correctly and do not match a colliding Debian GNU/kFreeBSD brandinfo statements. For this mark the Debian GNU/kFreeBSD brandinfo that it must have an .note.ABI-tag section and ignore the old EI_OSABI brandinfo when comparing a possibly colliding set of options. Due to SYSINIT we add the brandinfo in a non-deterministic order, so native FreeBSD is not always first. We may want to consider to force native FreeBSD to come first as well. The only way a problem could currently be noticed is when running an i386 binary without the .note.ABI-tag on amd64 and the Debian GNU/kFreeBSD brandinfo was matched first, as the fallback to ld-elf32.so.1 does not exist in that case. Reported and tested by: ticso In collaboration with: kib MFC after: 3 days	2009-08-30 14:38:17 +00:00
Konstantin Belousov	f25fa6abb2	Reverse r196640 and r196644 for now.	2009-08-29 21:53:08 +00:00
Konstantin Belousov	b6b2d1bf88	Dispose the kernel stack of the proper thread. Submitted by: alc MFC after: 1 week	2009-08-29 18:01:02 +00:00
Konstantin Belousov	c3cf0b476f	Remove the altkstacks, instead instantiate threads with kernel stack allocated with the right size from the start. For the thread that has kernel stack cached, verify that requested stack size is equial to the actual, and reallocate the stack if sizes differ [1]. This fixes the bug introduced by r173361 that was committed several days after r173004 and consisted of kthread_add(9) ignoring the non-default kernel stack size. Also, r173361 removed the caching of the kernel stacks for a non-first thread in the process. Introduce separate kernel stack cache that keeps some limited amount of preallocated kernel stacks to lower the latency of thread allocation. Add vm_lowmem handler to prune the cache on low memory condition. This way, system with reasonable amount of the threads get lower latency of thread creation, while still not exhausting significant portion of KVA for unused kstacks. Submitted by: peter [1] Discussed with: jhb, julian, peter Reviewed by: jhb Tested by: pho MFC after: 1 week	2009-08-29 13:28:02 +00:00
John Baldwin	2fa8c8d21e	Extend the device pager to support different memory attributes on different pages in an object. - Add a new variant of d_mmap() currently called d_mmap2() which accepts an additional in/out parameter that is the memory attribute to use for the requested page. - A driver either uses d_mmap() or d_mmap2() for all requests but not both. The current implementation uses a flag in the cdevsw (D_MMAP2) to indicate that the driver provides a d_mmap2() handler instead of d_mmap(). This is done to make the change ABI compatible with existing drivers and MFC'able to 7 and 8. Submitted by: alc MFC after: 1 month	2009-08-28 14:06:55 +00:00
Jamie Gritton	c4884ffa6f	Fix a LOR between allprison_lock and vnode locks by releasing allprison_lock before releasing a prison's root vnode. PR: kern/138004 Reviewed by: kib Approved by: bz (mentor) MFC after: 3 days	2009-08-27 16:15:51 +00:00
Marius Strobl	5486ffc898	Add a temporary workaround which just lets init die instead of causing a panic if it is killed due to a unsolved stack overflow seen very late during shutdown on sparc64 when the gmirror worker process exists, which is a regression introduced in 8.0. Reviewed by: kib MFC after: 3 days	2009-08-26 21:10:47 +00:00
Konstantin Belousov	4f4946d337	Honor the vfs.timestamp_precision sysctl settings for utimes(path, NULL) and similar calls. Obtained from: Petr Salinger, Debian GNU/kFreeBSD, Debian bug #489894 MFC after: 3 days	2009-08-26 14:32:37 +00:00
Jilles Tjoelker	74d1c4927a	Fix poll() on half-closed sockets, while retaining POLLHUP for fifos. This reverts part of r196460, so that sockets only return POLLHUP if both directions are closed/error. Fifos get POLLHUP by closing the unused direction immediately after creating the sockets. The tools/regression/poll/*poll.c tests now pass except for two other things: - if POLLHUP is returned, POLLIN is always returned as well instead of only when there is data left in the buffer to be read - fifo old/new reader distinction does not work the way POSIX specs it Reviewed by: kib, bde	2009-08-25 21:44:14 +00:00
Warner Losh	58a745889f	Rather than havnig enabled/disabled, implement a max queue depth. While usually not an issue, this firewalls bugs in the code that may run us out of memory. Fix a memory exhaustion in the case where devctl was disabled, but the link was bouncing. The check to queue was in the wrong place. Implement a new sysctl hw.bus.devctl_queue to control the depth. Make compatibility hacks for hw.bus.devctl_disable to ease transition. Reviewed by: emaste@ Approved by: re@ (kib) MFC after: asap	2009-08-25 06:25:59 +00:00
Bjoern A. Zeeb	89ffc202d6	Fix handling of .note.ABI-tag section for GNU systems [1]. Handle GNU/Linux according to LSB Core Specification 4.0, Chapter 11. Object Format, 11.8. ABI note tag. Also check the first word of desc, not only name, according to glibc abi-tags specification to distinguish between Linux and kFreeBSD. Add explicit handling for Debian GNU/kFreeBSD, which runs on our kernels as well [2]. In {amd64,i386}/trap.c, when checking osrel of the current process, also check the ABI to not change the signal behaviour for Linux binary processes, now that we save an osrel version for all three from the lists above in struct proc [2]. These changes make it possible to run FreeBSD, Debian GNU/kFreeBSD and Linux binaries on the same machine again for at least i386 and amd64, and no longer break kFreeBSD which was detected as GNU(/Linux). PR: kern/135468 Submitted by: dchagin [1] (initial patch) Suggested by: kib [2] Tested by: Petr Salinger (Petr.Salinger seznam.cz) for kFreeBSD Reviewed by: kib MFC after: 3 days	2009-08-24 16:19:47 +00:00
Ed Schouten	2992abe047	Allow multiple console devices per driver without insane code duplication. Say, a driver wants to have multiple console devices to pick from, you would normally write down something like this: CONSOLE_DRIVER(dev1); CONSOLE_DRIVER(dev2); Unfortunately, this means that you have to declare 10 cn routines, instead of 5. It also isn't possible to initialize cn_arg on beforehand. I noticed this restriction when I was implementing some of the console bits for my vt(4) driver in my newcons branch. I have a single set of cn routines (termcn_*) which are shared by all vt(4) console devices. In order to solve this, I'm adding a separate consdev_ops structure, which contains all the function pointers. This structure is referenced through consdev's cn_ops field. While there, I'm removing CONS_DRIVER() and cn_checkc, which have been deprecated for years. They weren't used throughout the source, until the Xen console driver showed up. CONSOLE_DRIVER() has been changed to do the right thing. It now declares both the consdev and consdev_ops structure and ties them together. In other words: this change doesn't change the KPI for drivers that used the regular way of declaring console devices. If drivers want to use multiple console devices, they can do this as follows: static const struct consdev_ops mydriver_cnops = { .cn_probe = mydriver_cnprobe, ... }; static struct mydriver_softc cons0_softc = { ... }; CONSOLE_DEVICE(cons0, mydriver_cnops, &cons0_softc); static struct mydriver_softc cons1_softc = { ... }; CONSOLE_DEVICE(cons1, mydriver_cnops, &cons1_softc); Obtained from: //depot/user/ed/newcons/...	2009-08-24 10:53:30 +00:00
Marko Zec	0cb8b6a9b7	When "jail -c vnet" request fails, the current code actually creates and leaves behind an orphaned vnet. This change ensures that such vnets get released. This change affects only options VIMAGE builds. Submitted by: jamie Discussed with: bz Approved by: re (rwatson), julian (mentor) MFC after: 3 days	2009-08-24 10:16:19 +00:00
Marko Zec	d57425ab65	When registering a protocol to an existing protocol domain via pf_proto_register(), iterate over all existing vnets to call protosw_init() and thus the appropriate .pr_init() handler in the context of each vnet. NB in the future we probably want to separate pr_init() handlers into two, i.e. per-vnet and global, functions. This change has no impact on nooptions VIMAGE builds. Approved by: re (rwatson), julian (mentor) MFC after: 3 days	2009-08-24 10:03:41 +00:00
Robert Watson	77dfcdc445	Rework global locks for interface list and index management, correcting several critical bugs, including race conditions and lock order issues: Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an sxlock. Either can be held to stablize the lists and indexes, but both are required to write. This allows the list to be held stable in both network interrupt contexts and sleepable user threads across sleeping memory allocations or device driver interactions. As before, writes to the interface list must occur from sleepable contexts. Reviewed by: bz, julian MFC after: 3 days	2009-08-23 20:40:19 +00:00
Ed Schouten	bfdaa52382	Allow pty(4) to be loaded as a kld. Unfortunately, the wrappers that are present in pts(4) don't have the mechanics to allow pty(4) to be unloaded safely, so I'm forcing this kld to return EBUSY. This also means we have to enable some extra code in pts(4) unconditionally. Proposed by: rwatson	2009-08-23 20:26:09 +00:00
Konstantin Belousov	f2159cc790	Fix the conformance of poll(2) for sockets after r195423 by returning POLLHUP instead of POLLIN for several cases. Now, the tools/regression/poll results for FreeBSD are closer to that of the Solaris and Linux. Also, improve the POSIX conformance by explicitely clearing POLLOUT when POLLHUP is reported in pollscan(), making the fix global. Submitted by: bde Reviewed by: rwatson MFC after: 1 week	2009-08-23 12:44:15 +00:00
Rui Paulo	88579aa572	Constify prime numbers.	2009-08-23 09:55:06 +00:00
Ed Schouten	5c67885a26	Add ttydisc_rint_simple(). I noticed several drivers in our tree don't actually care about parity and framing, such as pts(4), snp(4) (and my partially finished console driver). Instead of duplicating a lot of code, I think we'd better add a utility function for those drivers to quickly process a buffer of input. Also change pts(4) and snp(4) to use this function.	2009-08-23 08:04:40 +00:00
John Baldwin	eb5a1e8f38	This patch fixes two bugs in sglist(9) and improves robustness of the API via better semantics if a request to append an address range to an existing list fails. - When cloning an sglist, properly set the length in the new sglist instead of leaving the new list empty. - Properly compute the amount of data added to an sglist via _sglist_append_buf(). This allows sglist_consume_uio() to properly update uio_resid. - When a request to append an address range to a scatter/gather list fails, restore the sglist to the state it had at the start of the function call instead of resetting it to an empty list. Requested by: np (3) Approved by: re (kib)	2009-08-21 02:59:07 +00:00
John Baldwin	0cef25aeb2	Change the 'resid' parameter to sglist_consume_uio() from an int to a size_t to match the recent type change of the uio_resid member of struct uio. Approved by: re (kib)	2009-08-20 19:23:58 +00:00
John Baldwin	a56fe095f0	Temporarily revert the new-bus locking for 8.0 release. It will be reintroduced after HEAD is reopened for commits by re@. Approved by: re (kib), attilio	2009-08-20 19:17:53 +00:00
Ed Schouten	6c813c1ccc	Small changes to the warning message generated by pty(4): - Only print the warning once, instead of filling up the screen. - Use the word "legacy" for the pty_warningcnt description, to prevent confusion. - Use log() instead of printf(). Discussed with: rwatson, jhb Approved by: re (kib)	2009-08-19 14:30:46 +00:00
Pawel Jakub Dawidek	e477e4fe8e	Remove unused taskqueue_find() function. Reviewed by: dfr Approved by: re (kib)	2009-08-18 13:55:48 +00:00
Attilio Rao	353998acc3	* Change the scope of the ASSERT_ATOMIC_LOAD() from a generic check to a pointer-fetching specific operation check. Consequently, rename the operation ASSERT_ATOMIC_LOAD_PTR(). * Fix the implementation of ASSERT_ATOMIC_LOAD_PTR() by checking directly alignment on the word boundry, for all the given specific architectures. That's a bit too strict for some common case, but it assures safety. * Add a comment explaining the scope of the macro * Add a new stub in the lockmgr specific implementation Tested by: marcel (initial version), marius Reviewed by: rwatson, jhb (comment specific review) Approved by: re (kib)	2009-08-17 16:17:21 +00:00
Pawel Jakub Dawidek	159ef108e1	Remove OpenSolaris taskq port (it performs very poorly in our kernel) and replace it with wrappers around our taskqueue(9). To make it possible implement taskqueue_member() function which returns 1 if the given thread was created by the given taskqueue. Approved by: re (kib)	2009-08-17 09:01:20 +00:00
Pawel Jakub Dawidek	6a3b289388	Because taskqueue_run() can drop tq_mutex, we need to check if the TQ_FLAGS_ACTIVE flag wasn't removed in the meantime, which means we missed a wakeup. Approved by: re (kib)	2009-08-17 08:42:34 +00:00
Ed Schouten	f6b3749db3	Fix small style regression introduced by the MPSAFE newbus code. Approved by: re (rwatson)	2009-08-16 19:55:53 +00:00
Robert Watson	a38fdf2946	Rather than fix questionable ifnet list locking in the implementation of the kern.polling.enable sysctl, remove the sysctl. It has been deprecated since FreeBSD 6 in favour of per-ifnet polling flags. Reviewed by: luigi Approved by: re (kib)	2009-08-15 23:07:43 +00:00
Bjoern A. Zeeb	8d518523cc	Add a new macro to test that a variable could be loaded atomically. Check that the given variable is at most uintptr_t in size and that it is aligned. Note: ASSERT_ATOMIC_LOAD() uses ALIGN() to check for adequate alignment -- however, the function of ALIGN() is to guarantee alignment, and therefore may lead to stronger alignment enforcement than necessary for types that are smaller than sizeof(uintptr_t). Add checks to mtx, rw and sx locks init functions to detect possible breakage. This was used during debugging of the problem fixed with r196118 where a pointer was on an un-aligned address in the dpcpu area. In collaboration with: rwatson Reviewed by: rwatson Approved by: re (kib)	2009-08-14 21:46:54 +00:00
Konstantin Belousov	8f40845151	Correctly handle unlock for !MAKEENTRY case, after successfull attempt of lock upgrade cache shall be unlocked from write. Reported by: Lucius Windschuh <lwindschuh googlemail com> Reviewed by: kan Approved by: re (rwatson)	2009-08-14 10:57:28 +00:00
Attilio Rao	dc6fbf6545	* Completely Remove the option STOP_NMI from the kernel. This option has proven to have a good effect when entering KDB by using a NMI, but it completely violates all the good rules about interrupts disabled while holding a spinlock in other occasions. This can be the cause of deadlocks on events where a normal IPI_STOP is expected. * Adds an new IPI called IPI_STOP_HARD on all the supported architectures. This IPI is responsible for sending a stop message among CPUs using a privileged channel when disponible. In other cases it just does match a normal IPI_STOP. Right now the IPI_STOP_HARD functionality uses a NMI on ia32 and amd64 architectures, while on the other has a normal IPI_STOP effect. It is responsibility of maintainers to eventually implement an hard stop when necessary and possible. * Use the new IPI facility in order to implement a new userend SMP kernel function called stop_cpus_hard(). That is specular to stop_cpu() but it does use the privileged channel for the stopping facility. * Let KDB use the newly introduced function stop_cpus_hard() and leave stop_cpus() for all the other cases * Disable interrupts on CPU0 when starting the process of APs suspension. * Style cleanup and comments adding This patch should fix the reboot/shutdown deadlocks many users are constantly reporting on mailing lists. Please don't forget to update your config file with the STOP_NMI option removal Reviewed by: jhb Tested by: pho, bz, rink Approved by: re (kib)	2009-08-13 17:09:45 +00:00
Bjoern A. Zeeb	eb79e1c76e	Make it possible to change the vnet sysctl variables on jails with their own virtual network stack. Jails only inheriting a network stack cannot change anything that cannot be changed from within a prison. Reviewed by: rwatson, zec Approved by: re (kib)	2009-08-13 10:26:34 +00:00
Bjoern A. Zeeb	57aea6dfb8	Make the kernel compile without IP networking by moving a variable under a proper #ifdef. Approved by: re (rwatson)	2009-08-12 12:12:23 +00:00
Bjoern A. Zeeb	f2140faed1	Add ddb show dpcpu_off command to ease dpcpu memory debugging. While show pcpu prints pc_dynamic this also prints the original memory address as well as the maths. Once dpcpu goes NUMA this is considered to help debugging as well. Reviewed by: rwatson Approved by: re	2009-08-12 12:06:16 +00:00
Julian Elischer	9734411552	Stop uuidgen(2) from crashing in vimage kerenels. make curvnet valid when needed. Reviewed by: bz@ Approved by: re (kib)	2009-08-02 16:59:02 +00:00
Attilio Rao	444b91868b	Make the newbus subsystem Giant free by adding the new newbus sxlock. The newbus lock is responsible for protecting newbus internIal structures, device states and devclass flags. It is necessary to hold it when all such datas are accessed. For the other operations, softc locking should ensure enough protection to avoid races. Newbus lock is automatically held when virtual operations on the device and bus are invoked when loading the driver or when the suspend/resume take place. For other 'spourious' operations trying to access/modify the newbus topology, newbus lock needs to be automatically acquired and dropped. For the moment Giant is also acquired in some key point (modules subsystem) in order to avoid problems before the 8.0 release as module handlers could make assumptions about it. This Giant locking should go just after the release happens. Please keep in mind that the public interface can be expanded in order to provide more support, if there are really necessities at some point and also some bugs could arise as long as the patch needs a bit of further testing. Bump __FreeBSD_version in order to reflect the newbus lock introduction. Reviewed by: ed, hps, jhb, imp, mav, scottl No answer by: ariff, thompsa, yongari Tested by: pho, G. Trematerra <giovanni dot trematerra at gmail dot com>, Brandon Gooch <jamesbrandongooch at gmail dot com> Sponsored by: Yahoo! Incorporated Approved by: re (ksmith)	2009-08-02 14:28:40 +00:00
Ed Schouten	d40b91cb13	Fix two bugs related to TTY input: - fix write() on pseudo-terminal masters to return the amount of bytes passed to the TTY, not the amount of bytes read from user. - fix ttydisc_rint_bypass() to set the high watermark when it cannot write all input, just like ttydisc_rint() itself. Approved by: re (kib)	2009-08-02 14:25:26 +00:00
Robert Watson	530c006014	Merge the remainder of kern_vimage.c and vimage.h into vnet.c and vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes. Reviewed by: bz Approved by: re (vimage blanket)	2009-08-01 19:26:27 +00:00
Jamie Gritton	42bebd8205	Make the "enforce_statfs" default 2 (most restrictive) in jail_set(2), instead of whatever the parent/system has (which is generally 0). This mirrors the old-style default used for jail(2) in conjunction with the security.jail.enforce_statfs sysctl. Approved by: re (kib), bz (mentor)	2009-07-31 16:00:41 +00:00
John Baldwin	87eca70e0c	Fix some LORs between vnode locks and filedescriptor table locks. - Don't grab the filedesc lock just to read fd_cmask. - Drop vnode locks earlier when mounting the root filesystem and before sanitizing stdin/out/err file descriptors during execve(). Submitted by: kib Approved by: re (rwatson) MFC after: 1 week	2009-07-31 13:40:06 +00:00
Jamie Gritton	2b0d6f815e	Remove a LOR, where the the sleepable allprison_lock was being obtained in prison_equal_ip4/6 while an inp mutex was held. Locking allprison_lock can be avoided by making a restriction on the IP addresses associated with jails: Don't allow the "ip4" and "ip6" parameters to be changed after a jail is created. Setting the "ip4.addr" and "ip6.addr" parameters is allowed, but only if the jail was already created with either ip4/6=new or ip4/6=disable. With this restriction, the prison flags in question (PR_IP4_USER and PR_IP6_USER) become read-only and can be checked without locking. This also allows the simplification of a messy code path that was needed to handle an existing prison gaining an IP address list. PR: kern/136899 Reported by: Dirk Meyer Approved by: re (kib), bz (mentor)	2009-07-30 14:28:56 +00:00
Jamie Gritton	bdfc8cc4cd	Don't allow mixing the "vnet" and "ip4/6" jail parameters, since vnet jails have their own IP stack and don't have access to the parent IP addresses anyway. Note that a virtual network stack forms a break between prisons with regard to the list of allowed IP addresses. Approved by: re (kib), bz (mentor)	2009-07-29 16:46:59 +00:00
Jamie Gritton	8986e3a023	Change the default value of the "ip4" and "ip6" jail parameters to "disable", which only allows access to the parent/physical system's IP addresses when specifically directed. Change the default value of "host" to "new", and don't copy the parent host values, to insulate jails from the parent hostname et al. Approved by: re (kib), bz (mentor)	2009-07-29 16:41:02 +00:00
Robert Watson	791b0ad2bf	Eliminate ARG_UPATH[12] arguments to AUDIT_ARG_UPATH() and instead provide specific macros, AUDIT_ARG_UPATH1() and AUDIT_ARG_UPATH2() to capture path information for audit records. This allows us to move the definitions of ARG_* out of the public audit header file, as they are an implementation detail of our current kernel-internal audit record, which may change. Approved by: re (kensmith) Obtained from: TrustedBSD Project MFC after: 1 month	2009-07-29 07:44:43 +00:00

1 2 3 4 5 ...

11347 Commits