freebsd-skq

Author	SHA1	Message	Date
imp	6441be832c	Move pccard_safe_quote() up to subr_bus.c and rename to devctl_safe_quote() so it can be used more generally.	2016-03-28 20:16:29 +00:00
np	0b3b29f07b	Plug leak in m_unshare. m_unshare passes on the source mbuf's flags as-is to m_getcl and this results in a leak if the flags include M_NOFREE. The fix is to clear the bits not listed in M_COPYALL before calling m_getcl. M_RDONLY should probably be filtered out too but that's outside the scope of this fix. Add assertions in the zone_mbuf and zone_pack ctors to catch similar bugs. Update netmap_get_mbuf to not pass M_NOFREE to m_getcl. It's not clear what the original code was trying to do but it's likely incorrect. Updated code is no different functionally but it avoids the newly added assertions. Reviewed by: gnn@ Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D5698	2016-03-26 23:39:53 +00:00
cem	96eb07ae2b	Add td_swinvoltick to track last involuntary context switch Expose in DDB via "show thread." Reviewed by: markj Sponsored by: EMC / Isilon Storage Division	2016-03-25 19:35:29 +00:00
glebius	fa5cb70a2d	Space and style(9) corrections for recent mbuf changes.	2016-03-24 20:06:52 +00:00
skra	2683d49bfb	Generalize IPI support for ARM intrng and use it for interrupt controller IPI provider. New struct intr_ipi is defined which keeps all info about an IPI: its name, counter, send and dispatch methods. Generic intr_ipi_setup(), intr_ipi_send() and intr_ipi_dispatch() functions are implemented. An IPI provider must implement two functions: (1) an intr_ipi_send_t function which is able to send an IPI, (2) a setup function which initializes itself for an IPI and calls intr_ipi_setup() with appropriate arguments. Differential Revision: https://reviews.freebsd.org/D5700	2016-03-24 09:55:11 +00:00
gnn	77a2ccb155	Move mbuf provider under SDT to indicate that it is FreeBSD specific and not a stable interface. Reviewed by: markj MFC after: 2 weeks Sponsored by: Rubicon Communications (Netgate) Differential Revision: https://reviews.freebsd.org/D5716	2016-03-24 08:26:06 +00:00
bdrewery	a9f48f4d56	Pass the expected struct radix_node_head * to vfs_free_netcred. No functional change. struct radix_node_head's first element is rh so this was already referring to the same address. It was likely an unintended s/rnh/&rnh->rh/ change from r294706 as all other rnh_walktree() callers pass the expected struct radix_node_head * rather than obscurely passing the address of their first element. Sponsored by: EMC / Isilon Storage Division	2016-03-24 04:40:07 +00:00
bdrewery	91bc17507f	Fix M_RTABLE memory leak from r274118 (11/2014). Replace free(M_RTABLE) with rn_detachhead() to match rn_inithead(). This would trigger when reloading NFS exports and was similar to problems with pf reload [1]. PR: 194078 [1] Sponsored by: EMC / Isilon Storage Division	2016-03-24 03:08:39 +00:00
trasz	1c2e36026b	Wait for root mount tokens before showing the root mount prompt. This restores the pre-r290196 behaviour, eliminating the need to manually press '.' a couple of times to get USB to finish probing. Note that there's still something wrong with the console (character echoing doesn't quite work), and there's also a reported problem with BHyVe, but those two don't seem related to the problem above. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2016-03-22 13:46:01 +00:00
gnn	e4786b4992	Add an mbuf provider to DTrace. The mbuf provider is made up of a set of Statically Defined Tracepoints which help us look into mbufs as they are allocated and freed. This can be used to inspect the buffers or for a simplified mbuf leak detector. New tracepoints are: mbuf:::m-init mbuf:::m-gethdr mbuf:::m-get mbuf:::m-getcl mbuf:::m-clget mbuf:::m-cljget mbuf:::m-cljset mbuf:::m-free mbuf:::m-freem There is also a translator for mbufs which gives some visibility into the structure, see mbuf.d for more details. Reviewed by: bz, markj MFC after: 2 weeks Sponsored by: Rubicon Communications (Netgate) Differential Revision: https://reviews.freebsd.org/D5682	2016-03-22 13:16:52 +00:00
jhb	ccef35d926	Regen.	2016-03-21 21:38:35 +00:00
jhb	6f8f2fe586	Fully handle size_t lengths in AIO requests. First, update the return types of aio_return() and aio_waitcomplete() to ssize_t. POSIX requires aio_return() to return a ssize_t so that it can represent all return values from read() and write(). aio_waitcomplete() should use ssize_t for the same reason. aio_return() has used ssize_t in <aio.h> since r31620 but the manpage and system call entry were not updated. aio_waitcomplete() has always returned int. Note that this does not require new system call stubs as this is effectively only an API change in how the compiler interprets the return value. Second, allow aio_nbytes values up to IOSIZE_MAX instead of just INT_MAX. aio_read/write should now honor the same length limits as normal read/write. Third, use longs instead of ints in the aio_return() and aio_waitcomplete() system call functions so that the 64-bit size_t in the in-kernel aiocb isn't truncated to 32-bits before being copied out to userland or being returned. Finally, a simple test has been added to verify the bounds checking on the maximum read size from a file.	2016-03-21 21:37:33 +00:00
maxim	86d336784d	o "avaliable" -> "available". PR: 208141 Submitted by: Tyler Littlefield	2016-03-21 08:03:50 +00:00
pfg	eea894a1e2	aio_qphysio(): Avoid uninitialized pointer read on error. For the !unmap case it may happen that pbuf gets called unreferenced when vm_fault_quick_hold_pages() fails. Initialize it so it doesn't cause trouble. CID: 1352776 Reviewed by: jhb MFC after: 1 week	2016-03-18 19:04:01 +00:00
jhibbits	720f47c9ed	Use uintmax_t (typedef'd to rman_res_t type) for rman ranges. On some architectures, u_long isn't large enough for resource definitions. Particularly, powerpc and arm allow 36-bit (or larger) physical addresses, but type `long' is only 32-bit. This extends rman's resources to uintmax_t. With this change, any resource can feasibly be placed anywhere in physical memory (within the constraints of the driver). Why uintmax_t and not something machine dependent, or uint64_t? Though it's possible for uintmax_t to grow, it's highly unlikely it will become 128-bit on 32-bit architectures. 64-bit architectures should have plenty of RAM to absorb the increase on resource sizes if and when this occurs, and the number of resources on memory-constrained systems should be sufficiently small as to not pose a drastic overhead. That being said, uintmax_t was chosen for source clarity. If it's specified as uint64_t, all printf()-like calls would either need casts to uintmax_t, or be littered with PRI64 macros. Casts to uintmax_t aren't horrible, but it would also bake into the API for resource_list_print_type() either a hidden assumption that entries get cast to uintmax_t for printing, or these calls would need the PRI64 macros. Since source code is meant to be read more often than written, I chose the clearest path of simply using uintmax_t. Tested on a PowerPC p5020-based board, which places all device resources in 0xfxxxxxxxx, and has 8GB RAM. Regression tested on qemu-system-i386 Regression tested on qemu-system-mips (malta profile) Tested PAE and devinfo on virtualbox (live CD) Special thanks to bz for his testing on ARM. Reviewed By: bz, jhb (previous) Relnotes: Yes Sponsored by: Alex Perez/Inertial Computing Differential Revision: https://reviews.freebsd.org/D4544	2016-03-18 01:28:41 +00:00
cem	6b40e40026	fail(9): Only gather/print stacks if STACK is enabled This is a follow-up fix to the earlier r296927. Reported by: bz Sponsored by: EMC / Isilon Storage Division	2016-03-17 01:05:53 +00:00
cem	1cab282ecb	fail(9): Upstreaming some fail point enhancements This is several years' worth of fail point upgrades done at EMC Isilon. They are interdependent enough that it makes sense to put a single diff up for them. Primarily, we added: - Changing all mainline execution paths to be lockless, which lets us use fail points in more sleep-sensitive areas, and allows more parallel execution - A number of additional commands, including 'pause' that lets us do some interesting deterministic repros of race conditions - The ability to dump the stacks of all threads sleeping on a fail point - A number of other API changes to allow marking up the fail point's context in the code, and firing callbacks before and after execution - A man page update Submitted by: Matthew Bryan <matthew.bryan@isilon.com> Reviewed by: cem (earlier version), jhb, kib, pho With feedback from: bdrewery Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D5427	2016-03-16 04:22:32 +00:00
glebius	743cf42b4f	Free the temporary buffer in sysctl_handle_counter_u64_array(). Submitted by: mjg	2016-03-15 00:21:32 +00:00
glebius	0cafb74055	Provide sysctl(9) macro to deal with array of counter(9).	2016-03-15 00:05:00 +00:00
gibbs	453fdfd69a	Provide high precision conversion from ns,us,ms -> sbintime in kevent In timer2sbintime(), calculate the second and fractional second portions of the sbintime separately. When calculating the fractional second portion, use a 64bit multiply to prevent excess truncation. This avoids the ~7% error in the original conversion for ns, and smaller errors of the same type for us and ms. PR: 198139 Reviewed by: jhb MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D5397	2016-03-12 23:02:53 +00:00
jhb	419b17235a	Do not include system call wrappers in libc for old FreeBSD system calls. The base system libc is only used to run binaries built on FreeBSD 7.0 and later. It does not need to include system call wrappers for system calls only used by FreeBSD binaries built on versions older than 7.0. This was already true for "COMPAT" system calls, but now wrappers for system calls used on FreeBSD 4 and 6 are excluded as well. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D5597	2016-03-12 22:53:46 +00:00
trasz	8804916675	Refactor the way we restore cn_lkflags; no functional changes. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2016-03-12 09:05:43 +00:00
trasz	beb648d9cc	Remove cn_consume from 'struct componentname'. It was never set to anything other than 0. Reviewed by: kib@ MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D5611	2016-03-12 08:50:38 +00:00
trasz	faec271eeb	Fix autofs triggering problem. Assume you have an NFS server, 192.168.1.1, with share "share". This commit fixes a problem where "mkdir /net/192.168.1.1/share/meh" would return spurious error instead of creating the directory if the target filesystem wasn't mounted yet; subsequent attempts would work correctly. The failure scenario is kind of complicated to explain, but it all boils down to calling VOP_MKDIR() for the target filesystem (NFS) with wrong dvp - the autofs vnode instead of the filesystem root mounted over it. Reviewed by: kib@ MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D5442	2016-03-12 07:54:42 +00:00
jhb	374ffbde40	Use SI_SUB_LAST instead of SI_SUB_SMP as the "catch-all" subsystem. Reviewed by: kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D5515	2016-03-11 23:18:06 +00:00
jhb	96e88fd872	Regen.	2016-03-09 19:06:46 +00:00
jhb	1b87e4306e	Simplify AIO initialization now that it is standard. - Mark AIO system calls as STD and remove the helpers to dynamically register them. - Use COMPAT6 for the old system calls with the older sigevent instead of an 'o' prefix. - Simplify the POSIX configuration to note that AIO is always available. - Handle AIO in the default VOP_PATHCONF instead of special casing it in the pathconf() system call. fpathconf() is still hackish. - Remove freebsd32_aio_cancel() as it just called the native one directly. Reviewed by: kib Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D5589	2016-03-09 19:05:11 +00:00
kib	1e048a7127	Convert all panics from the link_elf_obj kernel linker for object files format into printfs and errors to caller. Some leaks of resources are there, but the same leaks are present in other error paths. With the change, the kernel at least boots even when a module with unexpected or corrupted ELF structure is preloaded. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-03-07 18:44:06 +00:00
kib	ffbdd975f0	In the link_elf_obj.c, handle sections of type SHT_AMD64_UNWIND same as SHT_PROGBITS. This is needed after the clang 3.8 import, which generates that type for .eh_frame section, which had SHT_PROGBITS type before. Reported by: Nikolai Lifanov <lifanov@mail.lifanov.com> PR: 207729 Tested by: dim (previous version) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-03-06 00:31:11 +00:00
jhibbits	70aaabfeac	Replace all resource occurrences of '0UL/~0UL' with '0/~0'. Summary: The idea behind this is '~0ul' is well-defined, and casting to uintmax_t, on a 32-bit platform, will leave the upper 32 bits as 0. The maximum range of a resource is 0xFFF.... (all bits of the full type set). By dropping the 'ul' suffix, C type promotion rules apply, and the sign extension of ~0 on 32 bit platforms gets it to a type-independent 'unsigned max'. Reviewed By: cem Sponsored by: Alex Perez/Inertial Computing Differential Revision: https://reviews.freebsd.org/D5255	2016-03-03 05:07:35 +00:00
kib	486320aac4	If callout_stop_safe() noted that the callout is currently executing, but next invocation is cancelled while migrating, sleepq_check_timeout() needs to be informed that the callout is stopped. Otherwise the thread switches off CPU and never becomes runnable, since running callout could have already raced with us, while the migrating and cancelled callout could be one which is expected to set TDP_TIMOFAIL flag for us. This contradicts the expected behaviour of callout_stop() for other callers, which e.g. decrement references from the callout callbacks. Add a new flag CS_MIGRBLOCK requesting report of the situation as 'successfully stopped'. Reviewed by: jhb (previous version) Tested by: cognet, pho PR: 200992 Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D5221	2016-03-02 18:46:17 +00:00
glebius	a6bfbefef5	Fix regression in r296242 affecting several drivers. For EXT_NET_DRV, EXT_MOD_TYPE, EXT_DISPOSABLE types we should first execute the free callback, then free the mbuf, otherwise we will dereference memory that was just freed. Reported and tested: jhibbits	2016-03-02 02:12:01 +00:00
bdrewery	199dda9a01	Correct a comment.	2016-03-01 23:58:53 +00:00
jhb	58823d0b49	Use SCHEDULER_STOPPED() in cv_wait() instead of checking panicstr. Reviewed by: kib MFC after: 1 month Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D5516	2016-03-01 22:51:44 +00:00
jhb	be47bc68fb	Refactor the AIO subsystem to permit file-type-specific handling and improve cancellation robustness. Introduce a new file operation, fo_aio_queue, which is responsible for queueing and completing an asynchronous I/O request for a given file. The AIO subsystem now exports library of routines to manipulate AIO requests as well as the ability to run a handler function in the "default" pool of AIO daemons to service a request. A default implementation for file types which do not include an fo_aio_queue method queues requests to the "default" pool invoking the fo_read or fo_write methods as before. The AIO subsystem permits file types to install a private "cancel" routine when a request is queued to permit safe dequeueing and cleanup of cancelled requests. Sockets now use their own pool of AIO daemons and service per-socket requests in FIFO order. Socket requests will not block indefinitely permitting timely cancellation of all requests. Due to the now-tight coupling of the AIO subsystem with file types, the AIO subsystem is now a standard part of all kernels. The VFS_AIO kernel option and aio.ko module are gone. Many file types may block indefinitely in their fo_read or fo_write callbacks resulting in a hung AIO daemon. This can result in hung user processes (when processes attempt to cancel all outstanding requests during exit) or a hung system. To protect against this, AIO requests are only permitted for known "safe" files by default. AIO requests for all file types can be enabled by setting the new vfs.aio.enable_unsafe sysctl to a non-zero value. The AIO tests have been updated to skip operations on unsafe file types if the sysctl is zero. Currently, AIO requests on sockets and raw disks are considered safe and are enabled by default. aio_mlock() is also enabled by default. Reviewed by: cem, jilles Discussed with: kib (earlier version) Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D5289	2016-03-01 18:12:14 +00:00
jhb	15b2caff0f	Remove taskqueue_enqueue_fast(). taskqueue_enqueue() was changed to support both fast and non-fast taskqueues 10 years ago in r154167. It has been a compat shim ever since. It's time for the compat shim to go. Submitted by: Howard Su <howard0su@gmail.com> Reviewed by: sephe Differential Revision: https://reviews.freebsd.org/D5131	2016-03-01 17:47:32 +00:00
skra	1cb2a75f09	Remove an alternative way for dealing with root interrupt controller which is not complete. Likely, it was forgotten and not removed before committing.	2016-03-01 11:27:58 +00:00
skra	caf6d91084	Mark other parts of interrupt framework as INTR_SOLO option specific. Note that isrc_arg member of struct intr_irqsrc is used only for INTR_SOLO and IPI filter. This should be remembered if IPI filters and their arguments will be stored on another place. This option could be unusable very soon, if interrupt controllers implementations will not be implemented considering it.	2016-03-01 10:57:29 +00:00
glebius	163857deb4	New way to manage reference counting of mbuf external storage. The m_ext.ext_cnt pointer becomes a union. It can now hold the refcount value itself. To tell that m_ext.ext_flags flag EXT_FLAG_EMBREF is used. The first mbuf to attach a cluster stores the refcount. The further mbufs to reference the cluster point at refcount in the first mbuf. The first mbuf is freed only when the last reference is freed. The benefit over refcounts stored in separate slabs is that now refcounts of different, unrelated mbufs do not share a cache line. For EXT_EXTREF mbufs the zone_ext_refcnt is no longer needed, and m_extadd() becomes void, making widely used M_EXTADD macro safe. For EXT_SFBUF mbufs the sf_ext_ref() is removed, which was an optimization exactly against the cache aliasing problem with regular refcounting. Discussed with: rrs, rwatson, gnn, hiren, sbruno, np Reviewed by: rrs Differential Revision: https://reviews.freebsd.org/D5396 Sponsored by: Netflix	2016-03-01 00:17:14 +00:00
kib	e76eb4255b	Implement process-shared locks support for libthr.so.3, without breaking the ABI. Special value is stored in the lock pointer to indicate shared lock, and offline page in the shared memory is allocated to store the actual lock. Reviewed by: vangyzen (previous version) Discussed with: deischen, emaste, jhb, rwatson, Martin Simmons <martin@lispworks.com> Tested by: pho Sponsored by: The FreeBSD Foundation	2016-02-28 17:52:33 +00:00
skra	27bb203f7c	Move IPI related parts back to (ARM) machine specific file now, when the interrupt framework is also going to be used by another (MIPS) architecture. IPI implementations may vary much across different architectures. An IPI implementation should still define INTR_IPI_COUNT and use intr_ipi_setup_counters() to setup IPI counters which are inside of intrcnt[] and intrnames[] arrays. Those are used for sysctl and ddb. Then, intr_ipi_increment_count() should be used to increment obtained counter. Reviewed by: imp Differential Revision: https://reviews.freebsd.org/D5459	2016-02-27 12:03:07 +00:00
ed	4a923c8cd0	Remove the errno argument from unp_drop(). While there, add a comment to clarify that ECONNRESET should always be returned for POSIX conformance. Suggested by: Steven Hartland	2016-02-26 12:46:34 +00:00
markj	9abb1836d9	Improve error handling for posix_fallocate(2) and posix_fadvise(2). - Set td_errno so that ktrace and dtrace can obtain the syscall error number in the usual way. - Pass negative error numbers directly to the syscall layer, as they're not intended to be returned to userland. Reviewed by: kib Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D5425	2016-02-25 19:58:23 +00:00
ed	0151167359	Make asynchronous connection failures on UNIX sockets fail with ECONNRESET. While making CloudABI work well on Linux, I discovered that I had a FreeBSD-ism in one of my unit tests. The test did the following: - Create UNIX socket 1, bind it, make it listen. - Create UNIX socket 2, connect it to UNIX socket 1. - Close UNIX socket 1. - Obtain SO_ERROR from socket 2. On FreeBSD this returns ECONNABORTED, while on Linux it returns ECONNRESET. I dug through some of the relevant specifications[1] and it looks like Linux is all right here. ECONNABORTED should only be returned when the local connection (socket 2) is aborted; not the peer (socket 1). It is of course slightly misleading: the function in which we set this error is called uipc_abort(), but keep in mind that we're aborting the peer, thus resetting the local socket. [1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/connect.html Reviewed by: cem Sponsored by: Nuxi, the Netherlands Differential Revision: https://reviews.freebsd.org/D5419	2016-02-24 17:10:32 +00:00
kib	392fea70cc	Provide more correct sizing of the KVA consumed by a vnode, used by the virtvnodes calculation. Include the size of fs-specific v_data as the nfs nclnode inline, the NFS nclnode is bigger than either ZFS znode or UFS inode. Include the size of namecache_ts and short cache path element, multiplied by the name cache population factor, again inline. Inline defines are used to avoid pollution of the vnode.h with the subsystem-private objects. Non-significant unsynchronized changes of the definitions are fine, we do not care about that precision, and e.g. ZFS consumes much malloced memory per vnode for reasons unaccounted in the formula. Lower the partition of kmem dedicated to vnodes, from 1/7 to 1/10. The measures reduce vnode cache pressure on kmem and bring the vnode cache memory use below some apparent thresholds that were exceeded by r291244 due to more robust vnode reuse. Reported and tested by: marius (i386, previous version) Reviewed by: bde Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-02-24 15:15:46 +00:00
bdrewery	8a44a735c3	Fix build after r295934.	2016-02-23 23:37:10 +00:00
oshogbo	5256441af8	According to the sys/kern/capabilities.conf, gethostid(3) should be allowed. Pointed out by: Milosz Kaniewski <m.kaniewski@wheelsystems.com> Approved by: pjd (mentor) MFC after: 3 days Sponsored by: Wheel Systems, http://wheelsystems.com	2016-02-23 22:02:25 +00:00
ian	c764d7edd2	Allow a dynamic env to override a compiled-in static env by passing in the override indication in the env data. Submitted by: bde	2016-02-21 18:35:01 +00:00
jhibbits	f8385663ee	Introduce a RMAN_IS_DEFAULT_RANGE() macro, and use it. This simplifies checking for default resource range for bus_alloc_resource(), and improves readability. This is part of, and related to, the migration of rman_res_t from u_long to uintmax_t. Discussed with: jhb Suggested by: marcel	2016-02-20 01:32:58 +00:00
markj	32d1c3375a	Ensure that we test the event condition when a disabled kevent is enabled. r274560 modified kqueue_register() to only test the event condition if the corresponding knote is not disabled. However, this check takes place before the EV_ENABLE flag is used to clear the KN_DISABLED flag on the knote, so enabling a previously-disabled kevent would not result in a notification for a triggered event. This change fixes the problem by testing for EV_ENABLED before possibly checking the event condition. This change also updates a kqueue regression test to exercise this case. PR: 206368 Reviewed by: kib Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D5307	2016-02-19 01:49:33 +00:00
markj	1945f66461	Return an error if both EV_ENABLE and EV_DISABLE are specified for a kevent. Currently, this combination results in EV_DISABLE being ignored. Reviewed by: kib Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D5307	2016-02-19 01:35:01 +00:00
zbb	9fb865d099	Fix build for i386 and arm64 after r295755 - Take bus_space_tag_t type into consideration when returning default, zero value. - Include missing rman.h required by ofw_pci.h	2016-02-18 15:44:45 +00:00
zbb	d2a1177be6	Introduce bus_get_bus_tag() method Provide bus_get_bus_tag() for sparc64, powerpc, arm, arm64 and mips nexus and its children in order to return a platform specific default tag. This is required to ensure generic correctness of the bus_space tag. It is especially needed for arches where child bus tag does not match the parent bus tag. This solves the problem with ppc architecture where the PCI bus tag differs from parent bus tag which is big-endian. This commit is a part of the following patch: https://reviews.freebsd.org/D4879 Submitted by: Marcin Mazurek <mma@semihalf.com> Obtained from: Semihalf Sponsored by: Annapurna Labs Reviewed by: jhibbits, mmel Differential Revision: https://reviews.freebsd.org/D4879	2016-02-18 13:00:04 +00:00
kib	a5f88cf93f	In bnoreuselist(), check both ends of the specified logical block numbers range. This effectively skips indirect and extdata blocks on the buffer queue. Since their logical block numbers are negative, bnoreuselist() could loop infinitely. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-02-17 19:39:57 +00:00
imp	0bfb5dbc86	Create an API to reset a struct bio (g_reset_bio). This is mandatory for all struct bio you get back from g_{new,alloc}_bio. Temporary bios that you create on the stack or elsewhere should use this before first use of the bio, and between uses of the bio. At the moment, it is nothing more than a wrapper around bzero, but that may change in the future. The wrapper also removes one place where we encode the size of struct bio in the KBI.	2016-02-17 17:16:02 +00:00
andrew	61f10a2936	Remove an unused FDT header, fdt_common.h should only be needed in a few places, mostly in sys/dev/fdt and legacy code. Sponsored by: ABT Systems Ltd	2016-02-15 17:05:03 +00:00
adrian	355920c5b7	Allow MIPS INTRNG code to be built without FDT support. This patch allows the newly imported INTRNG code to be built without necessarily having FDT support in the kernel. This may be useful for some MIPS platforms that wish to move to INTRNG, but not to FDT at the same time. Basically all the code is already within ifdef's where FDT is concerned, it's just the headers that aren't. Submitted by: Stanislav Galabov <sgalabov@gmail.com> Differential Revision: https://reviews.freebsd.org/D5249	2016-02-15 14:34:35 +00:00
glebius	f0d593add5	o Gather all mbuf(9) allocation functions into kern_mbuf.c, and all mbuf(9) manipulation functions into uipc_mbuf.c. This looks like the initial intent, but had diffused in the last decade. o Gather all declarations in mbuf.h in one place and sort them. o Uninline m_clget() and m_cljget(). There are no functional changes in this patch. The patch comes from a larger version, where all mbuf(9) allocation was uninlined, which allowed to make mbuf(9) UMA zones private to kern_mbuf.c. The performance impact of the total uninlining is still unclear, so we are holding on now with larger version. Together with: melifaro, olivier	2016-02-11 21:32:23 +00:00
kib	5fefd35412	Remove useless checks for NULL before calling free(9), in the kernel elf linkers. Found by: Related PVS-Studio diagnostic Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-02-10 21:35:00 +00:00
kib	fd3e233ddd	Finish r173600. There is no need to test a condition if both cases result in the same value. Found by: PVS-Studio Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-02-10 21:16:37 +00:00
glebius	bb999f3ee9	Garbage collect unused arguments of m_init().	2016-02-10 18:54:18 +00:00
glebius	b3c4f0ddbf	Include sys/_task.h into uma_int.h, so that taskqueue.h isn't a requirement for uma_int.h. Suggested by: jhb	2016-02-09 20:22:35 +00:00
kib	736e078495	Rename P_KTHREAD struct proc p_flag to P_KPROC. I left as is an apparent bug in ntoskrnl_var.h:AT_PASSIVE_LEVEL() definition. Suggested by: jhb Sponsored by: The FreeBSD Foundation	2016-02-09 16:30:16 +00:00
jhb	f6d0271729	Call kthread_exit() rather than kproc_exit() for a premature kthread exit. Kernel threads (and processes) are supposed to call kthread_exit() (or kproc_exit()) to terminate. However, the kernel includes a fallback in fork_exit() to force a kthread exit if a kernel thread's "main" routine returns. This fallback was added back when the kernel only had processes and was not updated to call kthread_exit() instead of kproc_exit() when threads were added to the kernel. This mistake was particularly exciting when the errant thread belonged to proc0. Due to the missing P_KTHREAD flag the fallback did not kick in and instead tried to return to userland via whatever garbage was in the trapframe. With P_KTHREAD set it tried to terminate proc0 resulting in other amusements. PR: 204999 MFC after: 1 week	2016-02-08 23:11:23 +00:00
jhb	9fbc164e2b	Mark proc0 as a kernel process via the P_KTHREAD flag. All other kernel processes have this flag set and all threads in proc0 (including thread0) have the similar TDP_KTHREAD flag set. PR: 204999 Submitted by: Oliver Pinter @ HardenedBSD Reviewed by: kib MFC after: 1 week	2016-02-08 23:06:27 +00:00
kib	2ed1e2991e	Remove the assert which outlived its usefulness, and, by default, disable compilation of the code which made it possible to call stop_all_proc() from usermode at all. Move the comment to the preamble of stop_all_proc() and reword it to give overview of the function intent. proc0 has P_HADTHREADS flag set due to kthread_add(), but no P_KTHREAD, which triggered the assert, which does not serve a purpose now. Reported by: Oliver Pinter Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-02-08 10:54:27 +00:00
jilles	b2791f185c	semget(): Check for [EEXIST] error first. Although POSIX literally permits failing with [EINVAL] if IPC_CREAT and IPC_EXCL were both passed, the semaphore set already exists and has fewer semaphores than nsems, this does not allow an application to retry safely: if the [EINVAL] is actually because of the semmsl limit, an infinite loop would result. PR: 206927	2016-02-07 22:12:39 +00:00
pfg	8d82ef5201	Minor grammar fix in comment.	2016-02-07 16:18:12 +00:00
mckusick	3ddfa78dde	Clarify a comment in kern_openat() about the use of falloc_noinstall(). Suggested by: Steve Jacobson	2016-02-07 01:04:47 +00:00
mjg	8d5e31f3ba	fork: ansify sys_pdfork No functional changes.	2016-02-06 09:01:03 +00:00
jhb	21434c7a70	Rename aiocblist to kaiocb and use consistent variable names. Typically <foo>list is used for a structure that holds a list head in FreeBSD, not for members of a list. As such, rename 'struct aiocblist' to 'struct kaiocb' (the kernel version of 'struct aiocb'). While here, use more consistent variable names for AIO control blocks: - Use 'job' instead of 'aiocbe', 'cb', 'cbe', or 'iocb' for kernel job objects. - Use 'jobn' instead of 'cbn' for use with TAILQ_FOREACH_SAFE(). - Use 'sjob' and 'sjobn' instead of 'scb' and 'scbn' for fsync jobs. - Use 'ujob' instead of 'aiocbp', 'job', 'uaiocb', or 'uuaiocb' to hold a user pointer to a 'struct aiocb'. - Use 'ujobp' instead of 'aiocbp' for a user pointer to a 'struct aiocb *'. Reviewed by: kib Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D5125	2016-02-05 20:38:09 +00:00
kib	977f53633f	When matching brand to the ELF binary by notes, try to find a brand with interpreter name exactly matching one wanted by the binary. If no such brand exists, return first brand which accepted the binary by note. The change fixes a regression after r292749, where e.g. our two ia32 compat brands, ia32_brand_info and ia32_brand_oinfo, only differ by the interpreter path and binary matches to a brand by linkage order. Then old binaries which require /usr/libexec/ld-elf.so.1 but matched against ia32_brand_info with interp_path /libexec/ld-elf.so.1, were considered requiring non-standard interpreter name, and magic to force ld-elf32.so.1 did not happen. Note that it might make sense to apply the same selection of brands for other matching criteria, SCO EI_OSABI and 3.x string. Reported and tested by: dwmalone Sponsored by: The FreeBSD Foundation MFC after: 3 days	2016-02-04 20:55:49 +00:00
kib	55fda50177	Do not copy by field when converting struct oexport_args to struct export_args on mount update, bzero() is consistent with vfs_oexport_conv(). Make the code structure more explicit by using switch. Return EINVAL if export option layout (deduced from size) is unknown. Based on the submission by: bde Sponsored by: The FreeBSD Foundation	2016-02-04 16:32:21 +00:00
kib	ce2fb04a46	Guard against runnable td2 exiting and then being reused for unrelated process when the parent sleeps waiting for the debugger attach on fork. Diagnosed and reviewed by: mjg Sponsored by: The FreeBSD Foundation	2016-02-04 10:49:34 +00:00
mjg	027c9d90e3	fork: plug a use after free of the returned process fork1 required its callers to pass a pointer to struct proc * which would be set to the new process (if any). procdesc and racct manipulation also used said pointer. However, the process could have exited prior to do_fork return and be automatically reaped, thus making this a use-after-free. Fix the problem by letting callers indicate whether they want the pid or the struct proc, return the process in stopped state for the latter case. Reviewed by: kib	2016-02-04 04:25:30 +00:00
mjg	9a7c585ab5	fork: pass arguments to fork1 in a dedicated structure Suggested by: kib	2016-02-04 04:22:18 +00:00
glebius	953ea03018	Redo r292484. Embed task(9) into zone, so that uz_maxaction is called in a context that can sleep, allowing consumers of the KPI to run their drain routines without any extra measures. Discussed with: jtl	2016-02-03 23:30:17 +00:00
glebius	8f977b6c5c	Move uma_dbg_alloc() and uma_dbg_free() into uma_core.c, which allows to make uma_dbg.h not depend on uma_int.h, which allows to uninclude uma_int.h from the mbuf(9) allocator.	2016-02-03 22:02:36 +00:00
alfred	b04e8a547e	Increase max allowed backlog for listen sockets from short to int. PR: 203922 Submitted by: White Knight <white_knight@2ch.net> MFC After: 4 weeks	2016-02-02 05:57:59 +00:00
glebius	306a6faf84	These files were getting sys/malloc.h and vm/uma.h with header pollution via sys/mbuf.h	2016-02-01 17:41:21 +00:00
andrew	aac959f280	Fix the logic in the ddb command 'show ktr /a'. Prior to r118269 it would print until cncheckc returned a non -1, i.e. a character had been entered. After this change it would print only if cncheckc returned a character. As this was before each call to db_mach_vtrace the normal outcome was nothing was printed. With this change 'show ktr /a' will now keep printing until the user stops the command with a key press, or there are no more entries to print.	2016-01-31 17:32:20 +00:00
vangyzen	f8f3d27673	kqueue EVFILT_PROC: avoid collision between NOTE_CHILD and NOTE_EXIT NOTE_CHILD and NOTE_EXIT return something in kevent.data: the parent pid (ppid) for NOTE_CHILD and the exit status for NOTE_EXIT. Do not let the two events be combined, since one would overwrite the other's data. PR: 180385 Submitted by: David A. Bright <david_a_bright@dell.com> Reviewed by: jhb MFC after: 1 month Sponsored by: Dell Inc. Differential Revision: https://reviews.freebsd.org/D4900	2016-01-28 20:24:15 +00:00
mckusick	0b10a802f8	The bread() function was inconsistent about whether it would return a buffer pointer in the event of an error (for some errors it would return a buffer pointer and for other errors it would not return a buffer pointer). The cluster_read() function was similarly inconsistent. Clients of these functions were inconsistent in handling errors. Some would assume that no buffer was returned after an error and would thus lose buffers under certain error conditions. Others would assume that brelse() should always be called after an error and would thus panic the system under certain error conditions. To correct both of these problems with minimal code churn, bread() and cluster_read() now always free the buffer when returning an error thus ensuring that buffers will never be lost. The brelse() routine checks for being passed a NULL buffer pointer and silently returns to avoid panics. Thus both approaches to handling error returns from bread() and cluster_read() will work correctly. Future code should be written assuming that bread() and cluster_read() will never return a buffer with an error, so should not attempt to brelse() the buffer when an error is returned. Reviewed by: kib	2016-01-27 21:23:01 +00:00
mjg	0e34636722	ktrace: tidy up ktrstruct - minor style fixes - avoid doing strlen twice [1] PR: 206648 Submitted by: C Turt <ecturt gmail.com> (original version) [1]	2016-01-27 19:55:02 +00:00
jhibbits	31bb8ee5bd	Convert rman to use rman_res_t instead of u_long Summary: Migrate to using the semi-opaque type rman_res_t to specify rman resources. For now, this is still compatible with u_long. This is step one in migrating rman to use uintmax_t for resources instead of u_long. Going forward, this could feasibly be used to specify architecture-specific definitions of resource ranges, rather than baking a specific integer type into the API. This change has been broken out to facilitate MFC'ing drivers back to 10 without breaking ABI. Reviewed By: jhb Sponsored by: Alex Perez/Inertial Computing Differential Revision: https://reviews.freebsd.org/D5075	2016-01-27 02:23:54 +00:00
jhb	fe64620955	Various style fixes. - Wrap long lines. - Fix indentation. - Remove excessive parens. - Whitespace fixes in struct definitions. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D5025	2016-01-26 21:24:49 +00:00
kib	e18e95605f	Don't clear the software flow control flag before draining for last close or assert the bug that it is clear when leaving. Remove an unrelated rotted comment that was attached to the buggy clearing. Since draining is not done in more cases, flushing is needed in more cases, so start fixing flushing: - do a full flush in ttydisc_close(). State what POSIX requires more clearly. This was missing ttydevsw_pktnotify() calls to tell the devsw layer to flush. Hardware tty drivers don't actually flush since they don't understand this API. - fix 2 missing wakeups in tty_flush(). Most of the wakeups here are unnecessary for last close. But ttydisc_close() did one of the missing ones. This flow control bug ameliorated the design bug of requiring potentially unbounded waits in draining. Software flow control is the easiest way to get an unbounded wait, and a long wait is sometimes actually useful. Users can type the xoff character on the receiver and (if ixon is set on the sender) expect the output to be held until the user is ready for more. Hardware flow control can also give the unbounded wait, and this bug didn't affect hardware flow control. Unbounded waits from hardware flow control take a more unusual configuration. E.g., a terminal program that controls the modem status lines, or unplugging the cable in a configuration where this doesn't break the connection. The design bug is still ameliorated by a newer bug in draining for last close -- the 1 second timeout. E.g., if the user types the xoff character and the sender reaches last close, then output is not resumed and the wait times out after just 1 second. This is broken, but preferable to an unbounded wait. Before this change, the output was resumed immediately and usually completed. Submitted by: bde MFC after: 2 weeks	2016-01-26 14:46:39 +00:00
trasz	8e0224f2c9	Fix the way RCTL handles rules' rrl_exceeded on credentials change. Because of what this variable does, it was probably harmless - but still incorrect. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2016-01-26 11:28:55 +00:00
kib	d4a0747609	Restore flushing of output for revoke(2) again. Document revoke()'s intended behaviour in its man page. Simplify tty_drain() to match. Don't call ttydevsw methods in tty_flush() if the device is gone since we now sometimes call it then. The flushing was supposed to be implemented by passing the FNONBLOCK flag to VOP_CLOSE() for revoke(). The tty driver is one of the few that can block in close and was one of the fewer that knew about this. This almost worked in FreeBSD-1 and similarly in Net/2. These versions only almost worked because there was and is considerable confusion between IO_NDELAY and FNONBLOCK (aka O_NONBLOCK). IO_NDELAY is only valid for VOP_READ() and VOP_WRITE(). For other VOPs it has the same value as O_SHLOCK. But since vfs_subr.c and tty.c consistently used the wrong flag and the O_SHLOCK flag is rarely set, this mostly worked. It also gave the feature that applications could get the non-blocking close by abusing O_SHLOCK. This was first broken then fixed in 1995. I changed only the tty driver to use FNONBLOCK, as a hack to get non-blocking via the normal flag FNONBLOCK for last closes. I didn't know about revoke()'s use of IO_NDELAY or change it to be consistent, so revoke() was broken. Then I changed revoke() to match. This was next broken in 1997 then fixed in 1998. Importing Lite2 made the flags inconsistent again by undoing the fix only in vfs_subr.c. This was next broken in 2008 by replacing everything in tty.c and not checking any flags in last close. Other bugs in draining limited the resulting unbounded waits to drain in some cases. It is now possible to fix this better using the new FREVOKE flag. Just restore flushing for revoke() for now. Don't restore or undo any hacks for ordinary last closes yet. But remove dead code in the 1-second relative timeout (r272789). This did extra work to extend the buggy draining for revoke() for as long as possible. The 1-second timeout made this not very long by usually flushing after 1 second. Submitted by: bde MFC after: 2 weeks	2016-01-26 07:57:44 +00:00
markj	249795444f	Evaluate the sysctl_running fail point before taking the sysctl lock. The fail point handler may sleep, but this is not permitted while holding a rm read lock. MFC after: 1 week Sponsored by: EMC / Isilon Storage Division	2016-01-26 01:15:18 +00:00
marius	e5b176f8bf	- Make the code consistent with itself style-wise and bring it closer to style(9). - Mark unused arguments as such. - Make the ttystates table const.	2016-01-25 22:58:06 +00:00
kib	8d218f7844	Don't allow opening the callout device when the callin device is already open (in disguise as the console device). The only allowed combination was supposed to be the callin device with the console. Fix the assertion in ttydev_close() that was meant to detect this (it only detected all 3 devices being open). Assert this in ttydev_open() too. Submitted by: bde MFC after: 2 weeks	2016-01-25 16:47:20 +00:00
kib	265360f6c0	Fix the %b flags string for ddb. All bits above the 5th (TF_OPENED_CONS) were broken in r188147 by adding TF_OPENED_CONS without updating the string. It was especially confusing to display OPENED_CONS as GONE and BYPASS as ZOMBIE. 2 flags at the end were not updated in r188487. Don't print an extra 0x prefix for %p in a ddb command. In the rest of the kernel there are more than 6000 lines with %p and only about 40 with this bug. Print a non-extra 0x prefix for %b in a ddb command. In the rest of the kernel, there are approx. 180 lines with %b and 2/3 of them have this bug. Submitted by: bde MFC after: 2 weeks	2016-01-25 15:37:01 +00:00
melifaro	23582454c7	MFP r287070,r287073: split radix implementation and route table structure. There are a number of radix consumers in kernel land (pf,ipfw,nfs,route) with different requirements. In fact, the first 3 don't have _any_ requirements and the first 2 do not use radix locking. On the other hand, the routing structure does have these requirements (rnh_gen, multipath, custom to-be-added control plane functions, different locking). Additionally, radix should not know anything about its consumers' internals. So, radix code now uses tiny 'struct radix_head' structure along with internal 'struct radix_mask_head' instead of 'struct radix_node_head'. Existing consumers still use the same 'struct radix_node_head' with slight modifications: they need to pass pointer to (embedded) 'struct radix_head' to all radix callbacks. Routing code now uses new 'struct rib_head' with different locking macro: RADIX_NODE_HEAD prefix was renamed to RIB_ (which stands for routing information base). New net/route_var.h header was added to hold routing subsystem internal data. 'struct rib_head' was placed there. 'struct rtentry' will also be moved there soon.	2016-01-25 06:33:15 +00:00
kib	66dd616ca8	In tty_dealloc(), clear the queues. See the comment for a scenario which explains why ttydev_leave() cleanup might not happen. Submitted by: bde MFC after: 3 weeks	2016-01-22 20:38:46 +00:00
kib	f011d9b5bd	The struct file f_advice member is overlaid with the devfs f_cdevpriv data. If vnode bypass for devfs file failed, vn_read/vn_write are called and might try to dereference f_advice. Limit the accesses to f_advice to VREG vnodes only, which is the type ensured by posix_fadvise(). The f_advice for regular files is protected by mtxpool lock. Recheck that f_advice is not NULL after lock is taken. Reported and tested by: bde Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2016-01-22 20:35:20 +00:00
glebius	0d9f222414	- Separate sendfile(2) implementation from uipc_syscalls.c into separate file. Claim my copyright. - Provide more comments, better function and structure names. - Sort out unneeded includes from resulting two files. No functional changes.	2016-01-22 02:23:18 +00:00
jhb	7a18594de0	AIO daemons have always been kernel processes to facilitate switching to user VM spaces while servicing jobs. Update various comments and data structures that refer to AIO daemons as threads to refer to processes instead. Reviewed by: kib Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D4999	2016-01-21 02:20:38 +00:00
jhb	e4c5597b37	Remove unused variables for socket AIO. In r55943, a per-process queue of pending socket AIO requests (requests waiting for the socket to become ready) was added so that they could be cancelled during process rundown. In r154765, the rundown code was changed to handle jobs in this state (JOBST_JOBQSOCK) directly removing the need for the extra queue. However, the per-process queue head and global lock were never removed. Reviewed by: kib Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D4997	2016-01-21 01:28:31 +00:00
mjg	0b0166a192	cache: minor changes 1. vhold and zap immediately instead of postponing few lines later 2. increment numneg after new entry is added No functional changes. No objections: kib	2016-01-21 01:09:39 +00:00
mjg	fc47375f70	cache: perform . lookup without the namecache lock Reviewed by: kib	2016-01-21 01:07:05 +00:00
mjg	a7359714be	cache: provide a helper for computing the hash Reviewed by: kib	2016-01-21 01:05:41 +00:00
mjg	5ca67cac1e	cache: use counter(9) API to maintain statistics Previously the code would just increment statistics while only holding a shared lock, in effect losing updates. Separate tracking for nchstats is removed as values can be obtained from existing counters. Note that some fields are updated by external consumers and are left unfixed. This should not be a serious issue as this structure looks quite obsolete. No strong objections: kib	2016-01-21 01:04:03 +00:00
mjg	eabf748a18	session: avoid proctree lock on proc exit when possible We can get away with the common case with only proc lock held. Reviewed by: kib	2016-01-20 23:33:58 +00:00
mjg	1a52d1b25e	session: tidy up fixjobc This stops abusing the 'p' pointer for iteration over children processes and gets rid of useless locking around PRS_ZOMBIE check. Suggested by: kib	2016-01-20 23:22:36 +00:00
marius	c9d9d68bae	Fix tty_drain() and, thus, TIOCDRAIN of the current tty(4) incarnation to actually wait until the TX FIFOs of UARTs have been drained before returning. This is done by bringing the equivalent of the TS_BUSY flag found in the previous implementation back in an ABI-preserving way. Reported and tested by: Patrick Powell Most likely, drivers for USB-serial-adapters likewise incorporating TX FIFOs as well as other terminal devices that buffer output in some form should also provide implementations of tsw_busy. MFC after: 3 days	2016-01-19 23:34:27 +00:00
jhb	897a9bc5d3	Various cleanups to the main function for AIO kernel processes: - Pull the vmspace logic out into helper functions and reduce duplication. Operations on the vmspace are all isolated to vm_map.c, but it now exports a new 'vmspace_switch_aio' for use by AIO kernel processes. - When an AIO kernel process wants to exit, break out of the main loop and perform cleanup after the loop end. This reduces a lot of indentation and allows cleanup to more closely mirror setup actions before the loop starts. - Convert a DIAGNOSTIC to KASSERT(). - Replace mycp with more typical 'p'. Reviewed by: kib Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D4990	2016-01-19 21:37:51 +00:00
jhb	ea0e31cb8f	Don't create a dedicated session for each AIO kernel process. This code dates back to the initial AIO support and the commit log does not explain why it is needed. However, I cannot find anything in the AIO code or the various file methods (fo_read/fo_write) that would change behavior due to using a private session instead of proc0's session. Reviewed by: kib Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D4988	2016-01-19 20:46:30 +00:00
markj	09fb369fc5	Add vrefl(), a locked variant of vref(9). This API has no in-tree consumers at the moment but is useful to at least one out-of-tree consumer, and naturally complements existing vnode refcount functions (vholdl(9), vdropl(9)). Obtained from: kib (sys/ portion) Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D4947 Differential Revision: https://reviews.freebsd.org/D4953	2016-01-18 22:21:46 +00:00
kib	7d0828c94e	When cleaning up from failed adv locking and checking for write, do not call VOP_CLOSE() manually. Instead, delegate the close to fo_close() performed as part of the fdrop() on the file failed to open. For this, finish constructing file on error, in particular, set f_vnode and f_ops. Forcibly resetting f_ops to badfileops disabled additional cleanups performed by fo_close() for some file types, in this case it was noted that cdevpriv data was corrupted. Since fo_close() call must be enabled for some file types, it makes more sense to enable it for all files opened through vn_open_cred(). In collaboration with: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-01-17 08:40:51 +00:00
jhb	ea7fa1c904	Remove aiod_timeout. It hasn't been used since the AIO code was made MPSAFE 10 years ago. Reviewed by: kib Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D4946	2016-01-14 21:28:56 +00:00
jhb	61577b76c5	Rename aiod_bio taskqueue to aiod_kick. This taskqueue is not used to handle bio requests. It is only used to run aio_kick_nowait() to spin up new aio daemon processes. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D4904	2016-01-14 20:51:48 +00:00
glebius	796cbcc738	Call crextend() before copying old credentials to the new credentials and replace crcopysafe by crcopy as crcopysafe is not intended to be safe in a threaded environment, it drops PROC_LOCK() in while() that can lead to unexpected results, such as overwrite kernel memory. In my POV crcopysafe() needs special attention. For now I do not see any problems with this function, but who knows. Submitted by: dchagin Found by: trinity Security: SA-16:04.linux	2016-01-14 10:16:25 +00:00
cperciva	9ca3584fdd	Fix a bug introduced in r291716: "The problem with the approach taken both in _bus_dmamap_load_pages and bus_dmamap_load_ma_triv is that they split the request buffer into arbitrary chunks based on page boundaries, creating segments that no longer have a size that's a multiple of the sector size. This breaks drivers like blkfront (and probably other stuff)." [1] This was most easily triggered by running `fsck /` on a system running in Xen (e.g. Amazon EC2) but also showed up via growfs(8) and probably many other userland tools which access the disk directly. Patch by: royger [1] "Thinks this should be fine" by: ken	2016-01-11 20:38:39 +00:00
dchagin	e706df7b9a	Implement vsyscall hack. Prior to 2.13 glibc uses vsyscall instead of vdso. An upcoming linux_base-c6 needs it. Differential Revision: https://reviews.freebsd.org/D1090 Reviewed by: kib, trasz MFC after: 1 week	2016-01-09 20:18:53 +00:00
markj	e38d62e90d	Prevent cv_waiters wraparound. r282971 attempted to fix this problem by decrementing cv_waiters after waking up from sleeping on a condition variable, but this can result in a use-after-free if the CV is freed before all woken threads have had a chance to run. Instead, avoid incrementing cv_waiters past INT_MAX, and have cv_signal() explicitly check for sleeping threads once cv_waiters has reached this bound. Reviewed by: jhb MFC after: 2 weeks Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D4822	2016-01-09 01:56:46 +00:00
glebius	aaa09777e1	New sendfile(2) syscall. A joint effort of NGINX and Netflix from 2013 and up to now. The new sendfile is the code that Netflix uses to send their multiple tens of gigabits of data per second. The new implementation features asynchronous I/O, when I/O operations are launched, but not awaited to be complete. An explanation of why such behavior is beneficial compared to old one is going to be too long for a commit message, so we will skip it here. Additional features of new syscall are extra flags, which provide an application more control over data sent. The SF_NOCACHE flag tells kernel that data shouldn't be cached after it was sent. The SF_READAHEAD() macro allows to specify readahead size in pages. The new syscall is a drop-in replacement. No modifications are required to applications. One can take nginx binary for stable/10 and run it successfully on head. Although SF_NODISKIO lost its original sense, as now sendfile doesn't block, and now means something completely different (tm), using the new sendfile the old way is absolutely safe. Celebrates: Netflix global launch! Sponsored by: Nginx, Inc. Sponsored by: Netflix Relnotes: yes	2016-01-08 20:34:57 +00:00
glebius	e25e77f91d	Make it possible for sbappend() to preserve M_NOTREADY on mbufs, just like sbappendstream() does. Although, M_NOTREADY may appear only on SOCK_STREAM sockets, due to sendfile(2) supporting only the latter, there is a corner case of AF_UNIX/SOCK_STREAM socket, that still uses records for the sake of control data, albeit being stream socket. Provide private version of m_clrprotoflags(), which understands PRUS_NOTREADY, similar to m_demote().	2016-01-08 19:03:20 +00:00
glebius	088235535d	Revert r293405: it breaks socket buffer INVARIANTS when sending control data over local sockets.	2016-01-08 17:27:23 +00:00
glebius	a4cad9f2ef	For SOCK_STREAM socket use sbappendstream() instead of sbappend().	2016-01-08 01:16:03 +00:00
kib	eb437d36bf	Convert tty common code to use make_dev_s(). Tty.c was untypical in that it handled the si_drv1 issue consistently and correctly, by always checking for si_drv1 being non-NULL and sleeping if NULL. The removed code also illustrated unneeded complications in drivers which are eliminated by the use of new KPI. Reviewed by: hps, jhb Discussed with: bde Sponsored by: The FreeBSD Foundation MFC after: 3 weeks Differential revision: https://reviews.freebsd.org/D4746	2016-01-07 20:15:09 +00:00
kib	3277da17a1	Provide yet another KPI for cdev creation, make_dev_s(9). Immediate problem fixed by the new KPI is the long-standing race between device creation and assignments to cdev->si_drv1 and cdev->si_drv2, which allows the window where cdevsw methods might be called with si_drv1,2 fields not yet set. Devices typically checked for NULL and returned spurious errors to usermode, and often left some methods unchecked. The new function interface is designed to be extensible, which should allow to add more features to make_dev_s(9) without inventing yet another name for function to create devices, while maintaining KPI and even KBI backward-compatibility. Reviewed by: hps, jhb Sponsored by: The FreeBSD Foundation MFC after: 3 weeks Differential revision: https://reviews.freebsd.org/D4746	2016-01-07 20:08:02 +00:00
mjg	cbad85009d	cache: ansify functions and fix some style issues No functional changes.	2016-01-07 02:04:17 +00:00
kib	8c46f725d5	Two fixes for excessive iterations after r292326. Advance the logical block number to the lblkno of the found block plus one, instead of incrementing the block number which was used for lookup. This change skips sparsely populated buffer ranges, similar to r292325, instead of doing useless lookups. Do not restart the bnoreuselist() from the start of the range if buffer lock cannot be obtained without sleep. Only retry lookup and lock for the same queue and same logical block number. Reported by: benno Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 days	2016-01-05 14:48:40 +00:00
ian	3d96cedc35	Make the 'env' directive described in config(5) work on all architectures, providing compiled-in static environment data that is used instead of any data passed in from a boot loader. Previously 'env' worked only on i386 and arm xscale systems, because it required the MD startup code to examine the global envmode variable and decide whether to use static_env or an environment obtained from the boot loader, and set the global kern_envp accordingly. Most startup code wasn't doing so. Making things even more complex, some mips startup code uses an alternate scheme that involves calling init_static_kenv() to pass an empty buffer and its size, then uses a series of kern_setenv() calls to populate that buffer. Now all MD startup code calls init_static_kenv(), and that routine provides a single point where envmode is checked and the decision is made whether to use the compiled-in static_kenv or the values provided by the MD code. The routine also continues to serve its original purpose for mips; if a non-zero buffer size is passed the routine installs the empty buffer ready to accept kern_setenv() values. Now if the size is zero, the provided buffer full of existing env data is installed. A NULL pointer can be passed if the boot loader provides no env data; this allows the static env to be installed if envmode is set to do so. Most of the work here is a near-mechanical change to call the init function instead of directly setting kern_envp. A notable exception is in xen/pv.c; that code was originally installing a buffer full of preformatted env data along with its non-zero size (like mips code does), which would have allowed kern_setenv() calls to wipe out the preformatted data. Now it passes a zero for the size so that the buffer of data it installs is treated as non-writeable.	2016-01-02 02:53:48 +00:00
marius	05a298f61f	- (Ab)use udivx for dividing the u_int pc_cpuid when implementing CPU_ISSET(), CPU_SET etc. in sparc64 asm. This approach has the benefit of not clobbering %y, allowing to revert r222827 and partially r222828. - In r222828, CATR() already was changed to use the equivalent of PCPU_GET(cpuid) instead of the MD module ID for KTR_CPU, so belatedly also catch up with the C side of ktr(9). Originally, in r203838 CATR() was moved away from directly reading the module ID or equivalent as that became impractical with other CPU types than USI/II supported. With r222828 in place, per-CPU data generally is set up soon enough, though, that employing PCPU things in ktr(9) also for use during early stages works. - Unfortunately, an exception to the latter is the ktr(9) use in pmap_bootstrap(), which actually is run so early that even checking for bootverbose being set via the loader doesn't work. Consequently, replace the ktr(9) use in pmap_bootstrap() with OF_printf(9) and put it under #ifdef DIAGNOSTIC instead. MFC after: 3 days	2015-12-30 13:49:20 +00:00
jhb	fb5720f7be	Add ptrace(2) reporting for LWP events. Add two new LWPINFO flags: PL_FLAG_BORN and PL_FLAG_EXITED for reporting thread creation and destruction. Newly created threads will stop to report PL_FLAG_BORN before returning to userland and exiting threads will stop to report PL_FLAG_EXITED before exiting completely. Both of these events are only enabled and reported if PT_LWP_EVENTS is enabled on a process.	2015-12-29 23:25:26 +00:00
jhb	79ec12eeb6	Call kern_thr_exit() instead of duplicating it. This code is missing the racct_sub() call from kern_thr_exit() and would require further code duplication in future changes. Reviewed by: kib MFC after: 1 week	2015-12-29 23:16:20 +00:00
dchagin	dad1819732	Verify that tv_sec value specified in settimeofday() and clock_settime() (CLOCK_REALTIME case) system calls is non negative. This commit hides a kernel panic in atrtc_settime() as the clock_ts_to_ct() does not properly convert negative tv_sec. ps. in my opinion clock_ts_to_ct() should be rewritten to properly handle negative tv_sec values. Differential Revision: https://reviews.freebsd.org/D4714 Reviewed by: kib MFC after: 1 week	2015-12-27 15:37:07 +00:00
kib	cc13042464	Do not substitute interpreter if the brand interpreter path is different from the interpreter path requested by the binary. Before this change, it is impossible to activate non-default interpreter for 32bit image on amd64, when /libexec/ld-elf32.so.1 file exists. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-12-26 15:40:12 +00:00
jtl	f41bf39357	Only allow one PT_INTERP ELF program header. This also fixes a potential memory leak for interp_buf. Differential Revision: https://reviews.freebsd.org/D4692 Reviewed by: kib MFC after: 2 weeks Sponsored by: Juniper Networks	2015-12-24 00:58:11 +00:00
ngie	b78f13918e	Fix r292640 vim overzealously removed some trailing `+' and I didn't check the diff MFC after: 1 week X-MFC with: r292640 Pointyhat to: ngie Sponsored by: EMC / Isilon Storage Division	2015-12-23 03:34:43 +00:00
ngie	e1cc5a3ca1	Clean up trailing whitespace; no functional change MFC after: 1 week Sponsored by: EMC / Isilon Storage Division	2015-12-23 03:29:37 +00:00
ngie	9273c09a18	Fold lim_shared into lim_copy to mute a -Wunused compiler warning from clang when the kernel is compiled without INVARIANTS Differential Revision: https://reviews.freebsd.org/D4683 Reviewed by: kib, jhb MFC after: 1 week Sponsored by: EMC / Isilon Storage Division	2015-12-22 21:07:33 +00:00
kib	bcb048ba0c	If we annoy user with the terminal output due to failed load of interpreter, also show the actual error code instead of some interpretation. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-12-22 20:12:52 +00:00
jtl	94d8d1452b	Add a safety net to reclaim mbufs when one of the mbuf zones becomes exhausted. It is possible for a bug in the code (or, theoretically, even unusual network conditions) to exhaust all possible mbufs or mbuf clusters. When this occurs, things can grind to a halt fairly quickly. However, we currently do not call mb_reclaim() unless the entire system is experiencing a low-memory condition. While it is best to try to prevent exhaustion of one of the mbuf zones, it would also be useful to have a mechanism to attempt to recover from these situations by freeing "expendable" mbufs. This patch makes two changes: a) The patch adds a generic API to the UMA zone allocator to set a function that should be called when an allocation fails because the zone limit has been reached. Because of the way this function can be called, it really should do minimal work. b) The patch uses this API to try to free mbufs when an allocation fails from one of the mbuf zones because the zone limit has been reached. The function schedules a callout to run mb_reclaim(). Differential Revision: https://reviews.freebsd.org/D3864 Reviewed by: gnn Comments by: rrs, glebius MFC after: 2 weeks Sponsored by: Juniper Networks	2015-12-20 02:05:33 +00:00
mjg	e70da8e2e9	proc: fix a race which could result in dereference of bad p_pgrp pointer on fork During fork p_startcopy - p_endcopy area of a process is populated with bcopy with only proc lock held. Another forking thread can find such a process and proceed to access p_pgrp included in said area. Fix the problem by moving the field outside. It is being properly assigned later. Reviewed by: kib Diagnosed by: kib Tested by: Fabian Keil <freebsd-listen fabiankeil.de> MFC after: 10 days	2015-12-18 16:33:15 +00:00
adrian	a3e51ff0e6	[intrng] Migrate the intrng code from sys/arm/arm to sys/kern/subr_intr.c. The ci20 port (by kan@) is going to reuse almost all of the intrng code since the SoC in question looks suspiciously like someone took an ARM SoC design and replaced the ARM core with a MIPS core. * migrate out the code; * rename ARM_ -> INTR_; * rename arm_ -> intr_; * move the interrupt flush routine from intr.c / intrng.c into arm/machdep_intr.c - removing the code duplication and removing the ARM specific bits from here. Thanks to the Star Wars: The Force Awakens premiere line for allowing me a couple hours of quiet time to finish the universe builds. Tested: * make universe TODO: * The structure definitions in subr_intr.c still includes machine/intr.h which requires one duplicates all of the intrng definitions in the platform code (which kan has done, and I think we don't have to.) Instead I should break out the generic things (function declarations, common intr structures, etc) into a separate header. * Kan has requested I make the PIC based IPI stuff optional.	2015-12-18 05:43:59 +00:00
markj	338746b90e	Support an arbitrary number of arguments to DTrace syscall probes. Rather than pushing all eight possible arguments into dtrace_probe()'s stack frame, make the syscall_args struct for the current syscall available via the current thread. Using a custom getargval method for the systrace provider, this allows any syscall argument to be fetched, even in kernels that have modified the maximum number of system call arguments. Sponsored by: EMC / Isilon Storage Division	2015-12-17 00:00:27 +00:00
markj	fa1b8e9a4f	Fix style issues around existing SDT probes. - Use SDT_PROBE<N>() instead of SDT_PROBE(). This has no functional effect at the moment, but will be needed for some future changes. - Don't hardcode the module component of the probe identifier. This is set automatically by the SDT framework. MFC after: 1 week	2015-12-16 23:39:27 +00:00
glebius	63cd1c131a	A change to KPI of vm_pager_get_pages() and underlying VOP_GETPAGES(). o With new KPI consumers can request contiguous ranges of pages, and unlike before, all pages will be kept busied on return, like it was done before with the 'reqpage' only. Now the reqpage goes away. With new interface it is easier to implement code protected from race conditions. Such arrayed requests for now should be preceded by a call to vm_pager_haspage() to make sure that request is possible. This could be improved later, making vm_pager_haspage() obsolete. Strengthening the promises on the business of the array of pages allows us to remove such hacks as swp_pager_free_nrpage() and vm_pager_free_nonreq(). o New KPI accepts two integer pointers that may optionally point at values for read ahead and read behind, that a pager may do, if it can. These pages are completely owned by pager, and not controlled by the caller. This shifts the UFS-specific readahead logic from vm_fault.c, which should be file system agnostic, into vnode_pager.c. It also removes one VOP_BMAP() request per hard fault. Discussed with: kib, alc, jeff, scottl Sponsored by: Nginx, Inc. Sponsored by: Netflix	2015-12-16 21:30:45 +00:00
kib	b5160b0280	Optimize vop_stdadvise(POSIX_FADV_DONTNEED). Instead of looking up a buffer for each block number in the range with gbincore(), look up the next instantiated buffer with the logical block number which is greater or equal to the next lblkno. This significantly speeds up the iteration for sparsely populated range. Move the iteration into new helper bnoreuselist(), which is structured similarly to flushbuflist(). Reported and tested by: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation	2015-12-16 08:48:37 +00:00
kib	764a2409cb	Simplify the loop step in the flushbuflist() and make it independent of the type stability of the buffers memory. Instead of memoizing pointer to the next buffer and validating it, remember the next logical block number in the bo list and re-lookup. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation	2015-12-16 08:39:51 +00:00
adrian	64b681fbd5	Don't call wakeup if we're just returning reserved space; just return the reservation and wait for more space to appear. Submitted by: jeff Reviewed by: kib	2015-12-16 00:13:16 +00:00
jamie	b78d6a91e2	Fix jail name checking that disallowed anything that starts with '0'. The intention was to just limit leading zeroes on numeric names. That check is now improved to also catch the leading spaces and '+' that strtoul can pass through. PR: 204897 MFC after: 3 days	2015-12-15 17:25:00 +00:00
trasz	9d2d111f78	Tweak comments. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2015-12-13 11:30:36 +00:00
trasz	6751d261c4	Actually make the 'amount' argument to racct_adjust_resource() signed, as it was always supposed to be. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2015-12-13 11:21:13 +00:00
trasz	3430b87794	Avoid useless relocking. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2015-12-13 11:08:29 +00:00
markj	98fd4878e0	Don't make assertions about td_critnest when the scheduler is stopped. A panicking thread always executes with a critical section held, so any attempt to allocate or free memory while dumping will otherwise cause a second panic. This can occur, for example, if xpt_polled_action() completes non-dump I/O that was pending at the time of the panic. The fact that this can occur is itself a bug, but asserting in this case does little but reduce the reliability of kernel dumps. Suggested by: kib Reported by: pho	2015-12-11 20:05:07 +00:00
imp	b4d51d26ba	Create the MDT_PNP_INFO metadata record to communicate PNP info about modules. External agents may use this data to automatically load those modules. Differential Review: https://reviews.freebsd.org/D3461	2015-12-11 05:27:53 +00:00

14846 Commits