freebsd-skq

Author	SHA1	Message	Date
Warner Losh	58a745889f	Rather than havnig enabled/disabled, implement a max queue depth. While usually not an issue, this firewalls bugs in the code that may run us out of memory. Fix a memory exhaustion in the case where devctl was disabled, but the link was bouncing. The check to queue was in the wrong place. Implement a new sysctl hw.bus.devctl_queue to control the depth. Make compatibility hacks for hw.bus.devctl_disable to ease transition. Reviewed by: emaste@ Approved by: re@ (kib) MFC after: asap	2009-08-25 06:25:59 +00:00
Bjoern A. Zeeb	89ffc202d6	Fix handling of .note.ABI-tag section for GNU systems [1]. Handle GNU/Linux according to LSB Core Specification 4.0, Chapter 11. Object Format, 11.8. ABI note tag. Also check the first word of desc, not only name, according to glibc abi-tags specification to distinguish between Linux and kFreeBSD. Add explicit handling for Debian GNU/kFreeBSD, which runs on our kernels as well [2]. In {amd64,i386}/trap.c, when checking osrel of the current process, also check the ABI to not change the signal behaviour for Linux binary processes, now that we save an osrel version for all three from the lists above in struct proc [2]. These changes make it possible to run FreeBSD, Debian GNU/kFreeBSD and Linux binaries on the same machine again for at least i386 and amd64, and no longer break kFreeBSD which was detected as GNU(/Linux). PR: kern/135468 Submitted by: dchagin [1] (initial patch) Suggested by: kib [2] Tested by: Petr Salinger (Petr.Salinger seznam.cz) for kFreeBSD Reviewed by: kib MFC after: 3 days	2009-08-24 16:19:47 +00:00
Ed Schouten	2992abe047	Allow multiple console devices per driver without insane code duplication. Say, a driver wants to have multiple console devices to pick from, you would normally write down something like this: CONSOLE_DRIVER(dev1); CONSOLE_DRIVER(dev2); Unfortunately, this means that you have to declare 10 cn routines, instead of 5. It also isn't possible to initialize cn_arg on beforehand. I noticed this restriction when I was implementing some of the console bits for my vt(4) driver in my newcons branch. I have a single set of cn routines (termcn_*) which are shared by all vt(4) console devices. In order to solve this, I'm adding a separate consdev_ops structure, which contains all the function pointers. This structure is referenced through consdev's cn_ops field. While there, I'm removing CONS_DRIVER() and cn_checkc, which have been deprecated for years. They weren't used throughout the source, until the Xen console driver showed up. CONSOLE_DRIVER() has been changed to do the right thing. It now declares both the consdev and consdev_ops structure and ties them together. In other words: this change doesn't change the KPI for drivers that used the regular way of declaring console devices. If drivers want to use multiple console devices, they can do this as follows: static const struct consdev_ops mydriver_cnops = { .cn_probe = mydriver_cnprobe, ... }; static struct mydriver_softc cons0_softc = { ... }; CONSOLE_DEVICE(cons0, mydriver_cnops, &cons0_softc); static struct mydriver_softc cons1_softc = { ... }; CONSOLE_DEVICE(cons1, mydriver_cnops, &cons1_softc); Obtained from: //depot/user/ed/newcons/...	2009-08-24 10:53:30 +00:00
Marko Zec	0cb8b6a9b7	When "jail -c vnet" request fails, the current code actually creates and leaves behind an orphaned vnet. This change ensures that such vnets get released. This change affects only options VIMAGE builds. Submitted by: jamie Discussed with: bz Approved by: re (rwatson), julian (mentor) MFC after: 3 days	2009-08-24 10:16:19 +00:00
Marko Zec	d57425ab65	When registering a protocol to an existing protocol domain via pf_proto_register(), iterate over all existing vnets to call protosw_init() and thus the appropriate .pr_init() handler in the context of each vnet. NB in the future we probably want to separate pr_init() handlers into two, i.e. per-vnet and global, functions. This change has no impact on nooptions VIMAGE builds. Approved by: re (rwatson), julian (mentor) MFC after: 3 days	2009-08-24 10:03:41 +00:00
Robert Watson	77dfcdc445	Rework global locks for interface list and index management, correcting several critical bugs, including race conditions and lock order issues: Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an sxlock. Either can be held to stablize the lists and indexes, but both are required to write. This allows the list to be held stable in both network interrupt contexts and sleepable user threads across sleeping memory allocations or device driver interactions. As before, writes to the interface list must occur from sleepable contexts. Reviewed by: bz, julian MFC after: 3 days	2009-08-23 20:40:19 +00:00
Ed Schouten	bfdaa52382	Allow pty(4) to be loaded as a kld. Unfortunately, the wrappers that are present in pts(4) don't have the mechanics to allow pty(4) to be unloaded safely, so I'm forcing this kld to return EBUSY. This also means we have to enable some extra code in pts(4) unconditionally. Proposed by: rwatson	2009-08-23 20:26:09 +00:00
Konstantin Belousov	f2159cc790	Fix the conformance of poll(2) for sockets after r195423 by returning POLLHUP instead of POLLIN for several cases. Now, the tools/regression/poll results for FreeBSD are closer to that of the Solaris and Linux. Also, improve the POSIX conformance by explicitely clearing POLLOUT when POLLHUP is reported in pollscan(), making the fix global. Submitted by: bde Reviewed by: rwatson MFC after: 1 week	2009-08-23 12:44:15 +00:00
Rui Paulo	88579aa572	Constify prime numbers.	2009-08-23 09:55:06 +00:00
Ed Schouten	5c67885a26	Add ttydisc_rint_simple(). I noticed several drivers in our tree don't actually care about parity and framing, such as pts(4), snp(4) (and my partially finished console driver). Instead of duplicating a lot of code, I think we'd better add a utility function for those drivers to quickly process a buffer of input. Also change pts(4) and snp(4) to use this function.	2009-08-23 08:04:40 +00:00
John Baldwin	eb5a1e8f38	This patch fixes two bugs in sglist(9) and improves robustness of the API via better semantics if a request to append an address range to an existing list fails. - When cloning an sglist, properly set the length in the new sglist instead of leaving the new list empty. - Properly compute the amount of data added to an sglist via _sglist_append_buf(). This allows sglist_consume_uio() to properly update uio_resid. - When a request to append an address range to a scatter/gather list fails, restore the sglist to the state it had at the start of the function call instead of resetting it to an empty list. Requested by: np (3) Approved by: re (kib)	2009-08-21 02:59:07 +00:00
John Baldwin	0cef25aeb2	Change the 'resid' parameter to sglist_consume_uio() from an int to a size_t to match the recent type change of the uio_resid member of struct uio. Approved by: re (kib)	2009-08-20 19:23:58 +00:00
John Baldwin	a56fe095f0	Temporarily revert the new-bus locking for 8.0 release. It will be reintroduced after HEAD is reopened for commits by re@. Approved by: re (kib), attilio	2009-08-20 19:17:53 +00:00
Ed Schouten	6c813c1ccc	Small changes to the warning message generated by pty(4): - Only print the warning once, instead of filling up the screen. - Use the word "legacy" for the pty_warningcnt description, to prevent confusion. - Use log() instead of printf(). Discussed with: rwatson, jhb Approved by: re (kib)	2009-08-19 14:30:46 +00:00
Pawel Jakub Dawidek	e477e4fe8e	Remove unused taskqueue_find() function. Reviewed by: dfr Approved by: re (kib)	2009-08-18 13:55:48 +00:00
Attilio Rao	353998acc3	* Change the scope of the ASSERT_ATOMIC_LOAD() from a generic check to a pointer-fetching specific operation check. Consequently, rename the operation ASSERT_ATOMIC_LOAD_PTR(). * Fix the implementation of ASSERT_ATOMIC_LOAD_PTR() by checking directly alignment on the word boundry, for all the given specific architectures. That's a bit too strict for some common case, but it assures safety. * Add a comment explaining the scope of the macro * Add a new stub in the lockmgr specific implementation Tested by: marcel (initial version), marius Reviewed by: rwatson, jhb (comment specific review) Approved by: re (kib)	2009-08-17 16:17:21 +00:00
Pawel Jakub Dawidek	159ef108e1	Remove OpenSolaris taskq port (it performs very poorly in our kernel) and replace it with wrappers around our taskqueue(9). To make it possible implement taskqueue_member() function which returns 1 if the given thread was created by the given taskqueue. Approved by: re (kib)	2009-08-17 09:01:20 +00:00
Pawel Jakub Dawidek	6a3b289388	Because taskqueue_run() can drop tq_mutex, we need to check if the TQ_FLAGS_ACTIVE flag wasn't removed in the meantime, which means we missed a wakeup. Approved by: re (kib)	2009-08-17 08:42:34 +00:00
Ed Schouten	f6b3749db3	Fix small style regression introduced by the MPSAFE newbus code. Approved by: re (rwatson)	2009-08-16 19:55:53 +00:00
Robert Watson	a38fdf2946	Rather than fix questionable ifnet list locking in the implementation of the kern.polling.enable sysctl, remove the sysctl. It has been deprecated since FreeBSD 6 in favour of per-ifnet polling flags. Reviewed by: luigi Approved by: re (kib)	2009-08-15 23:07:43 +00:00
Bjoern A. Zeeb	8d518523cc	Add a new macro to test that a variable could be loaded atomically. Check that the given variable is at most uintptr_t in size and that it is aligned. Note: ASSERT_ATOMIC_LOAD() uses ALIGN() to check for adequate alignment -- however, the function of ALIGN() is to guarantee alignment, and therefore may lead to stronger alignment enforcement than necessary for types that are smaller than sizeof(uintptr_t). Add checks to mtx, rw and sx locks init functions to detect possible breakage. This was used during debugging of the problem fixed with r196118 where a pointer was on an un-aligned address in the dpcpu area. In collaboration with: rwatson Reviewed by: rwatson Approved by: re (kib)	2009-08-14 21:46:54 +00:00
Konstantin Belousov	8f40845151	Correctly handle unlock for !MAKEENTRY case, after successfull attempt of lock upgrade cache shall be unlocked from write. Reported by: Lucius Windschuh <lwindschuh googlemail com> Reviewed by: kan Approved by: re (rwatson)	2009-08-14 10:57:28 +00:00
Attilio Rao	dc6fbf6545	* Completely Remove the option STOP_NMI from the kernel. This option has proven to have a good effect when entering KDB by using a NMI, but it completely violates all the good rules about interrupts disabled while holding a spinlock in other occasions. This can be the cause of deadlocks on events where a normal IPI_STOP is expected. * Adds an new IPI called IPI_STOP_HARD on all the supported architectures. This IPI is responsible for sending a stop message among CPUs using a privileged channel when disponible. In other cases it just does match a normal IPI_STOP. Right now the IPI_STOP_HARD functionality uses a NMI on ia32 and amd64 architectures, while on the other has a normal IPI_STOP effect. It is responsibility of maintainers to eventually implement an hard stop when necessary and possible. * Use the new IPI facility in order to implement a new userend SMP kernel function called stop_cpus_hard(). That is specular to stop_cpu() but it does use the privileged channel for the stopping facility. * Let KDB use the newly introduced function stop_cpus_hard() and leave stop_cpus() for all the other cases * Disable interrupts on CPU0 when starting the process of APs suspension. * Style cleanup and comments adding This patch should fix the reboot/shutdown deadlocks many users are constantly reporting on mailing lists. Please don't forget to update your config file with the STOP_NMI option removal Reviewed by: jhb Tested by: pho, bz, rink Approved by: re (kib)	2009-08-13 17:09:45 +00:00
Bjoern A. Zeeb	eb79e1c76e	Make it possible to change the vnet sysctl variables on jails with their own virtual network stack. Jails only inheriting a network stack cannot change anything that cannot be changed from within a prison. Reviewed by: rwatson, zec Approved by: re (kib)	2009-08-13 10:26:34 +00:00
Bjoern A. Zeeb	57aea6dfb8	Make the kernel compile without IP networking by moving a variable under a proper #ifdef. Approved by: re (rwatson)	2009-08-12 12:12:23 +00:00
Bjoern A. Zeeb	f2140faed1	Add ddb show dpcpu_off command to ease dpcpu memory debugging. While show pcpu prints pc_dynamic this also prints the original memory address as well as the maths. Once dpcpu goes NUMA this is considered to help debugging as well. Reviewed by: rwatson Approved by: re	2009-08-12 12:06:16 +00:00
Julian Elischer	9734411552	Stop uuidgen(2) from crashing in vimage kerenels. make curvnet valid when needed. Reviewed by: bz@ Approved by: re (kib)	2009-08-02 16:59:02 +00:00
Attilio Rao	444b91868b	Make the newbus subsystem Giant free by adding the new newbus sxlock. The newbus lock is responsible for protecting newbus internIal structures, device states and devclass flags. It is necessary to hold it when all such datas are accessed. For the other operations, softc locking should ensure enough protection to avoid races. Newbus lock is automatically held when virtual operations on the device and bus are invoked when loading the driver or when the suspend/resume take place. For other 'spourious' operations trying to access/modify the newbus topology, newbus lock needs to be automatically acquired and dropped. For the moment Giant is also acquired in some key point (modules subsystem) in order to avoid problems before the 8.0 release as module handlers could make assumptions about it. This Giant locking should go just after the release happens. Please keep in mind that the public interface can be expanded in order to provide more support, if there are really necessities at some point and also some bugs could arise as long as the patch needs a bit of further testing. Bump __FreeBSD_version in order to reflect the newbus lock introduction. Reviewed by: ed, hps, jhb, imp, mav, scottl No answer by: ariff, thompsa, yongari Tested by: pho, G. Trematerra <giovanni dot trematerra at gmail dot com>, Brandon Gooch <jamesbrandongooch at gmail dot com> Sponsored by: Yahoo! Incorporated Approved by: re (ksmith)	2009-08-02 14:28:40 +00:00
Ed Schouten	d40b91cb13	Fix two bugs related to TTY input: - fix write() on pseudo-terminal masters to return the amount of bytes passed to the TTY, not the amount of bytes read from user. - fix ttydisc_rint_bypass() to set the high watermark when it cannot write all input, just like ttydisc_rint() itself. Approved by: re (kib)	2009-08-02 14:25:26 +00:00
Robert Watson	530c006014	Merge the remainder of kern_vimage.c and vimage.h into vnet.c and vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes. Reviewed by: bz Approved by: re (vimage blanket)	2009-08-01 19:26:27 +00:00
Jamie Gritton	42bebd8205	Make the "enforce_statfs" default 2 (most restrictive) in jail_set(2), instead of whatever the parent/system has (which is generally 0). This mirrors the old-style default used for jail(2) in conjunction with the security.jail.enforce_statfs sysctl. Approved by: re (kib), bz (mentor)	2009-07-31 16:00:41 +00:00
John Baldwin	87eca70e0c	Fix some LORs between vnode locks and filedescriptor table locks. - Don't grab the filedesc lock just to read fd_cmask. - Drop vnode locks earlier when mounting the root filesystem and before sanitizing stdin/out/err file descriptors during execve(). Submitted by: kib Approved by: re (rwatson) MFC after: 1 week	2009-07-31 13:40:06 +00:00
Jamie Gritton	2b0d6f815e	Remove a LOR, where the the sleepable allprison_lock was being obtained in prison_equal_ip4/6 while an inp mutex was held. Locking allprison_lock can be avoided by making a restriction on the IP addresses associated with jails: Don't allow the "ip4" and "ip6" parameters to be changed after a jail is created. Setting the "ip4.addr" and "ip6.addr" parameters is allowed, but only if the jail was already created with either ip4/6=new or ip4/6=disable. With this restriction, the prison flags in question (PR_IP4_USER and PR_IP6_USER) become read-only and can be checked without locking. This also allows the simplification of a messy code path that was needed to handle an existing prison gaining an IP address list. PR: kern/136899 Reported by: Dirk Meyer Approved by: re (kib), bz (mentor)	2009-07-30 14:28:56 +00:00
Jamie Gritton	bdfc8cc4cd	Don't allow mixing the "vnet" and "ip4/6" jail parameters, since vnet jails have their own IP stack and don't have access to the parent IP addresses anyway. Note that a virtual network stack forms a break between prisons with regard to the list of allowed IP addresses. Approved by: re (kib), bz (mentor)	2009-07-29 16:46:59 +00:00
Jamie Gritton	8986e3a023	Change the default value of the "ip4" and "ip6" jail parameters to "disable", which only allows access to the parent/physical system's IP addresses when specifically directed. Change the default value of "host" to "new", and don't copy the parent host values, to insulate jails from the parent hostname et al. Approved by: re (kib), bz (mentor)	2009-07-29 16:41:02 +00:00
Robert Watson	791b0ad2bf	Eliminate ARG_UPATH[12] arguments to AUDIT_ARG_UPATH() and instead provide specific macros, AUDIT_ARG_UPATH1() and AUDIT_ARG_UPATH2() to capture path information for audit records. This allows us to move the definitions of ARG_* out of the public audit header file, as they are an implementation detail of our current kernel-internal audit record, which may change. Approved by: re (kensmith) Obtained from: TrustedBSD Project MFC after: 1 month	2009-07-29 07:44:43 +00:00
Robert Watson	b146fc1bf0	Rework vnode argument auditing to follow the same structure, in order to avoid exposing ARG_ macros/flag values outside of the audit code in order to name which one of two possible vnodes will be audited for a system call. Approved by: re (kib) Obtained from: TrustedBSD Project MFC after: 1 month	2009-07-28 21:52:24 +00:00
Robert Watson	e4b4bbb665	Audit file descriptors passed to fooat(2) system calls, which are used instead of the root/current working directory as the starting point for lookups. Up to two such descriptors can be audited. Add audit record BSM encoding for fooat(2). Note: due to an error in the OpenBSM 1.1p1 configuration file, a further change is required to that file in order to fix openat(2) auditing. Approved by: re (kib) Reviewed by: rdivacky (fooat(2) portions) Obtained from: TrustedBSD Project MFC after: 1 month	2009-07-28 21:39:58 +00:00
Julian Elischer	7973fba3a4	Somewhere along the line accept sockets stopped honoring the FIB selected for them. Fix this. Reviewed by: ambrisko Approved by: re (kib) MFC after: 3 days	2009-07-28 19:43:27 +00:00
Bjoern A. Zeeb	be31e5e7b5	Make the in-kernel logic for the SIOCSIFVNET, SIOCSIFRVNET ioctls (ifconfig ifN (-)vnet <jname\|jid>) work correctly. Move vi_if_move to if.c and split it up into two functions(), one for each ioctl. In the reclaim case, correctly set the vnet before calling if_vmove. Instead of silently allowing a move of an interface from the current vnet to the current vnet, return an error. () There is some duplicate interface name checking before actually moving the interface between network stacks without locking and thus race prone. Ideally if_vmove will correctly and automagically handle these in the future. Suggested by: rwatson (*) Approved by: re (kib)	2009-07-26 11:29:26 +00:00
Jamie Gritton	7cbf72137f	Some jail parameters (in particular, "ip4" and "ip6" for IP address restrictions) were found to be inadequately described by a boolean. Define a new parameter type with three values (disable, new, inherit) to handle these and future cases. Approved by: re (kib), bz (mentor) Discussed with: rwatson	2009-07-25 14:48:57 +00:00
Brooks Davis	254d03c508	Introduce a new sysctl process mib, kern.proc.groups which adds the ability to retrieve the group list of each process. Modify procstat's -s option to query this mib when the kinfo_proc reports that the field has been truncated. If the mib does not exist, fall back to the truncated list. Reviewed by: rwatson Approved by: re (kib) MFC after: 2 weeks	2009-07-24 19:12:19 +00:00
Brooks Davis	1b5768be71	Revert the changes to struct kinfo_proc in r194498. Instead, fill in up to 16 (KI_NGROUPS) values and steal a bit from ki_cr_flags (all bits currently unused) to indicate overflow with the new flag KI_CRF_GRP_OVERFLOW. This fixes procstat -s. Approved by: re (kib)	2009-07-24 15:03:10 +00:00
John Baldwin	013818111a	Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar to a device pager (OBJT_DEVICE) object in that it uses fictitious pages to provide aliases to other memory addresses. The primary difference is that it uses an sglist(9) to determine the physical addresses for a given offset into the object instead of invoking the d_mmap() method in a device driver. Reviewed by: alc Approved by: re (kensmith) MFC after: 2 weeks	2009-07-24 13:50:29 +00:00
Robert Watson	d0728d7174	Introduce and use a sysinit-based initialization scheme for virtual network stacks, VNET_SYSINIT: - Add VNET_SYSINIT and VNET_SYSUNINIT macros to declare events that will occur each time a network stack is instantiated and destroyed. In the !VIMAGE case, these are simply mapped into regular SYSINIT/SYSUNINIT. For the VIMAGE case, we instead use SYSINIT's to track their order and properties on registration, using them for each vnet when created/ destroyed, or immediately on module load for already-started vnets. - Remove vnet_modinfo mechanism that existed to serve this purpose previously, as well as its dependency scheme: we now just use the SYSINIT ordering scheme. - Implement VNET_DOMAIN_SET() to allow protocol domains to declare that they want init functions to be called for each virtual network stack rather than just once at boot, compiling down to DOMAIN_SET() in the non-VIMAGE case. - Walk all virtualized kernel subsystems and make use of these instead of modinfo or DOMAIN_SET() for init/uninit events. In some cases, convert modular components from using modevent to using sysinit (where appropriate). In some cases, do minor rejuggling of SYSINIT ordering to make room for or better manage events. Portions submitted by: jhb (VNET_SYSINIT), bz (cleanup) Discussed with: jhb, bz, julian, zec Reviewed by: bz Approved by: re (VIMAGE blanket)	2009-07-23 20:46:49 +00:00
Bjoern A. Zeeb	a08362ce46	sysctl_msec_to_ticks is used with both virtualized and non-vrtiualized sysctls so we cannot used one common function. Add a macro to convert the arg1 in the virtualized case to vnet.h to not expose the maths to all over the code. Add a wrapper for the single virtualized call, properly handling arg1 and call the default implementation from there. Convert the two over places to use the new macro. Reviewed by: rwatson Approved by: re (kib)	2009-07-21 21:58:55 +00:00
Rui Paulo	5b4d5f9ee2	Improve the printf message when a module failed to load. This gives the user some clue about the possibility of a __FreeBSD_version mismatch. Discussed with: rwatson, jhb Approved by: re (kib)	2009-07-21 14:18:25 +00:00
Robert Watson	17ef1feb8a	Add macros VNET_SETNAME and VNET_SYMPREFIX, and expose to userspace if _WANT_VNET is defined. This way we don't need separate definitions in libkvm. Reviewed by: bz Approved by: re (vimage blanket)	2009-07-20 07:50:50 +00:00
Konstantin Belousov	7ac5806bfb	When buffer write is failed, it is wrong for brelse() to invalidate portion of the page that was written. Among other problems, this page might be picked up by pagedaemon, with failed assertion in vm_pageout_flush() about validity of the page. Reported and tested by: pho Approved by: re (kensmith) MFC after: 3 weeks	2009-07-19 20:25:59 +00:00
Robert Watson	006e9db452	Normalize field naming for struct vnet, fix two debugging printfs that print them. Reviewed by: bz Approved by: re (kensmith, kib)	2009-07-19 17:40:45 +00:00

1 2 3 4 5 ...

11333 Commits