freebsd-skq

Author	SHA1	Message	Date
kib	03d3355a71	Arm and arm64 both have fueword() implemented for some time. Correct the comment. Sponsored by: The FreeBSD Foundation	2016-04-20 17:28:21 +00:00
pfg	32dcf3933a	Indentation issues. Contract some lines leftover from r298310. Mea culpa.	2016-04-20 16:19:44 +00:00
cem	81f4ce8db7	kern_rctl: Fix resource leak in error path Ordinarily, rctl_write_outbuf frees 'sb'. However, if we are in low memory conditions we skip past the rctl_write_outbuf. In that case, free 'sb'. Reported by: Coverity CID: 1338539 Sponsored by: EMC / Isilon Storage Division	2016-04-20 02:09:38 +00:00
pfg	a7d40a88c9	kernel: use our nitems() macro when it is available through param.h. No functional change, only trivial cases are done in this sweep, Discussed in: freebsd-current	2016-04-19 23:48:27 +00:00
trasz	8298725669	Fix debugging printf. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2016-04-19 13:36:31 +00:00
kib	b69824cf4d	Fix umtx lock/trylock for compat32. Sponsored by: The FreeBSD Foundation MFC after: 3 days	2016-04-19 11:37:43 +00:00
markj	5af0054863	Use a loop instead of a goto in sysctl_kern_proc_kstack(). MFC after: 3 days	2016-04-17 23:22:32 +00:00
kib	f16910a47e	The struct thread td_estcpu member is only used by the 4BSD scheduler. Move it to the struct td_sched for 4BSD, removing always present field, otherwise unused for ULE. New scheduler method sched_estcpu() returns the estimation for kinfo_proc consumption. As before, it always returns 0 for ULE. Remove sched_tick() scheduler method, unused both by 4BSD and ULE. Update locking comment for the 4BSD struct td_sched, copying it from the same comment for ULE. Spell MAXPRI as PRI_MAX_TIMESHARE in the 4BSD comment. Based on some notes from, and reviewed by: bde Sponsored by: The FreeBSD Foundation	2016-04-17 11:04:27 +00:00
cem	98188ed5c2	Add 4Kn kernel dump support (And 4Kn minidump support, but only for amd64.) Make sure all I/O to the dump device is of the native sector size. To that end, we keep a native sector sized buffer associated with dump devices (di->blockbuf) and use it to pad smaller objects as needed (e.g. kerneldumpheader). Add dump_write_pad() as a convenience API to dump smaller objects with zero padding. (Rather than pull in NPM leftpad, we wrote our own.) Savecore(1) has been updated to deal with these dumps. The format for 512-byte sector dumps should remain backwards compatible. Minidumps for other architectures are left as an exercise for the reader. PR: 194279 Submitted by: ambrisko@ Reviewed by: cem (earlier version), rpokala Tested by: rpokala (4Kn/512 except 512 fulldump), cem (512 fulldump) Relnotes: yes Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D5848	2016-04-15 17:45:12 +00:00
pfg	5b3421712d	kern: for pointers replace 0 with NULL. These are mostly cosmetic, no functional change. Found with devel/coccinelle.	2016-04-15 16:10:11 +00:00
trasz	d4ed08909e	Allocate RACCT/RCTL zones without UMA_ZONE_NOFREE; no idea why it was there in the first place. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2016-04-15 13:34:59 +00:00
trasz	d690bf0a4e	Sort variable declarations. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2016-04-15 11:55:29 +00:00
imp	edc06f5852	Create wrappers for uint64_t and int64_t for the tunables. While not strictly necessary, it is more convenient.	2016-04-15 03:09:55 +00:00
jamie	e3a9ee4ccf	Clean up some style(9) violations.	2016-04-14 17:07:26 +00:00
jamie	3c6ae3fb05	Separate POSIX mqueue objects in jails; actually, separate them by the jail's root, so jails that don't have their own filesystem directory also won't have their own mqueue namespace. PR: 208082	2016-04-13 20:15:49 +00:00
jamie	384be5b5ff	Separate POSIX sem/shm objects in jails, by prepending the jail's path name to the object's "path". While the objects don't have real path names, it's a filesystem-like namespace, which allows jails to be kept to their own space, but still allows the system / jail parent to access a jail's IPC. PR: 208082	2016-04-13 20:14:13 +00:00
trasz	611644daf2	Fix overflow checking. There are some other potential problems related to overflowing racct counters; I'll revisit those later. Submitted by: Pieter de Goeje (earlier version) Reviewed by: emaste@ MFC after: 1 month Sponsored by: The FreeBSD Foundation	2016-04-12 18:13:24 +00:00
pfg	b63211eed5	Cleanup unnecessary semicolons from the kernel. Found with devel/coccinelle.	2016-04-10 23:07:00 +00:00
jhb	454f6ff2fd	Add a function to lookup a device_t object by name. This just walks the global list of devices looking for one with the requested name. The one use case outside of devctl2's implementation is for DDB commands that wish to lookup devices by name.	2016-04-10 05:05:02 +00:00
jhb	6beb82443a	Add more fine-grained kernel options for NUMA support. VM_NUMA_ALLOC is used to enable use of domain-aware memory allocation in the virtual memory system. DEVICE_NUMA is used to enable affinity reporting for devices such as bus_get_domain(). MAXMEMDOM must still be set to a value greater than 1 for any NUMA support to be effective. Note that 'cpuset -gd' always works if MAXMEMDOM is enabled and the system supports NUMA. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D5782	2016-04-09 13:58:04 +00:00
bz	9c35ad4130	Make the KASSERT message in hash destroy more informative. While the pointer might not be too helpful, the malloc type might at least give a good hint about which hashtbl we are talking. MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Reviewed by: gnn, emaste Differential Revision: https://reviews.freebsd.org/D5802	2016-04-09 09:24:05 +00:00
trasz	fd767bd07b	Make it possible to tweak RCTL throttling sysctls at runtime. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2016-04-08 18:15:31 +00:00
avg	fd12934507	topo_set_pu_id: turn a check into an assertion The new id must not be present in any cpu set in any topology element. MFC after: 30 days	2016-04-08 11:59:11 +00:00
kib	7c6ea895ee	Use the ABI-prescribed name for SHT_X86_64_UNWIND in the loader and kernel linker, after the r297686. Sponsored by: The FreeBSD Foundation	2016-04-08 10:23:48 +00:00
skra	b67819482f	Fix intr_irq_shuffle(). After r297539, ISRCs doing IPI may be also registered into global interrupt table. Thus, they must be filtered out like per-cpu interrupts. Fortunately, it does not influence anything on interrupt controllers which already use INTRNG.	2016-04-07 15:16:33 +00:00
skra	1e6a6a2cd5	Implement intr_isrc_init_on_cpu() and use it to replace very same code implemented in every interrupt controller driver running SMP. This function returns true, if provided ISRC should be enabled on given cpu.	2016-04-07 15:00:25 +00:00
trasz	825d80e01c	Add four new RCTL resources - readbps, readiops, writebps and writeiops, for limiting disk (actually filesystem) IO. Note that in some cases these limits are not quite precise. It's ok, as long as it's within some reasonable bounds. Testing - and review of the code, in particular the VFS and VM parts - is very welcome. MFC after: 1 month Relnotes: yes Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D5080	2016-04-07 04:23:25 +00:00
skra	b96ba003d0	Fix PIC lookup by device and xref. There was not taken into account the situation that someone has a pointer to device but not its xref. This situation is regular now, after r297539.	2016-04-06 12:48:45 +00:00
trasz	cd0a56084a	Use proper locking macros in RACCT in RCTL. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2016-04-05 11:30:52 +00:00
avg	b2f81dcbd7	x86 topo: add some comments, descriptions and references to documentation Plus a minor cosmetic change. MFC after: 1 month	2016-04-05 10:36:40 +00:00
avg	14341afcfc	new x86 smp topology detection code Previously, the code determined a topology of processing units (hardware threads, cores, packages) and then deduced a cache topology using certain assumptions. The new code builds a topology that includes both processing units and caches using the information provided by the hardware. At the moment, the discovered full topology is used only to create a scheduling topology for SCHED_ULE. There is no KPI for other kernel uses. Summary: - based on APIC ID derivation rules for Intel and AMD CPUs - can handle non-uniform topologies - requires homogeneous APIC ID assignment (same bit widths for ID components) - topology for dual-node AMD CPUs may not be optimal - topology for latest AMD CPU models may not be optimal as the code is several years old - supports only thread/package/core/cache nodes Todo: - AMD dual-node processors - latest AMD processors - NUMA nodes - checking for homogeneity of the APIC ID assignment across packages - more flexible cache placement within topology - expose topology to userland, e.g., via sysctl nodes Long term todo: - KPI for CPU sharing and affinity with respect to various resources (e.g., two logical processors may share the same FPU, etc) Reviewed by: mav Tested by: mav MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D2728	2016-04-04 16:09:29 +00:00
andrew	813a9f775b	Include sys/rman.h directly rather than relying on header pollution. Obtained from: ABT Systems Ltd Sponsored by: The FreeBSD Foundation	2016-04-04 10:52:43 +00:00
skra	11cdd44a03	Remove FDT specific parts from INTRNG. Change its interface to make it universal. (1) New struct intr_map_data is defined as a container for arbitrary description of an interrupt used by a device. Typically, an interrupt number and configuration relevant to an interrupt controller is encoded in such description. However, any additional information may be encoded too like a set of cpus on which an interrupt should be enabled or vendor specific data needed for setup of an interrupt in controller. The struct intr_map_data itself is meant to be opaque for INTRNG. (2) An intr_map_irq() function is created which takes an interrupt controller identification and struct intr_map_data as arguments and returns global interrupt number which identifies an interrupt. (3) A set of functions to be used by bus drivers is created as well as a corresponding set of methods for interrupt controller drivers. These sets take both struct resource and struct intr_map_data as one of the arguments. There is a goal to keep struct intr_map_data in struct resource, however, this way a final solution is not limited to that. (4) Other small changes are done to reflect new situation. This is only first step aiming to create stable interface for interrupt controller drivers. Thus, some temporary solution is taken. Interrupt descriptions for devices are stored in INTRNG and two specific mapping function are created to be temporary used by bus drivers. That's why the struct intr_map_data is not opaque for INTRNG now. This temporary solution will be replaced by final one in next step. Differential Revision: https://reviews.freebsd.org/D5730	2016-04-04 09:15:25 +00:00
trasz	e0e88029f1	Add configurable rate limit for "log" and "devctl" actions. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2016-04-02 09:11:52 +00:00
trasz	530be67ef4	Fix mismerge. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2016-04-01 18:45:04 +00:00
trasz	a36f880ad3	Drop the 'resource' argument to racct_decay(); it wouldn't make sense to iterate separately for each resource. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2016-04-01 18:36:10 +00:00
jhb	a0ef1d0d15	Cap IOSIZE_MAX to INT_MAX for 32-bit processes. Previously, freebsd32 binaries could submit read/write requests with lengths greater than INT_MAX that a native kernel would have rejected. Reviewed by: kib Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D5788	2016-04-01 18:29:38 +00:00
trasz	02eee7b047	Call rctl_enforce() in all cases the resource usage goes up, even when called from racct_*_force() functions. It makes the "log" and "devctl" actions work in those cases. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2016-04-01 17:28:55 +00:00
trasz	4f9a63f201	Reorder the functions; no functional changes. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2016-04-01 17:21:55 +00:00
trasz	4aa469be24	Reduce code duplication. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2016-04-01 17:17:32 +00:00
trasz	5f827ef346	Reduce code duplication. There should be no (intended) functional changes. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2016-04-01 17:05:46 +00:00
sbruno	e9ae7cf74d	Repair an overflow condition where a user could submit a string that was not getting a proper bounds check. Thanks to CTurt for pointing at this with a big red blinking neon sign. PR: 206761 Submitted by: sson Reviewed by: cturt@hardenedbsd.org MFC after: 3 days	2016-04-01 16:16:26 +00:00
jhb	b4f65d818d	Rework handling of thread sleeps before timers are working. Previously, calls to sleep() and cv_wait*() immediately returned during early boot. Instead, permit threads that request a sleep without a timeout to sleep as wakeup() works during early boot. Sleeps with timeouts are harder to emulate without working timers, so just punt and panic explicitly if any thread tries to use those before timers are working. Any threads that depend on timeouts should either wait until SI_SUB_KICK_SCHEDULER to start or they should use DELAY() until timers are available. Until APs are started earlier this should be a no-op as other kthreads shouldn't get a chance to start running until after timers are working regardless of when they were created. Reviewed by: kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D5724	2016-03-31 18:10:29 +00:00
trasz	fe839a29fe	Refactor; no functional changes. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2016-03-31 17:32:28 +00:00
jhb	df609e0cf4	Tidy up the unmapped I/O code in qphysio. - Move some blocks around to reduce the number of 'if (unmap)' checks. - Use 'pbuf == NULL' instead of 'unmap'. - Use nitems. - Pull an assignment out of an if expression. Reviewed by: kib Sponsored by: Chelsio Communications	2016-03-31 17:27:30 +00:00
trasz	1191e72322	Fix overflows, making it impossible to add negative amounts using rctl(8). MFC after: 1 month Sponsored by: The FreeBSD Foundation	2016-03-31 17:00:47 +00:00
jamie	3461615130	Add osd_reserve() and osd_set_reserved(), which allow M_WAITOK allocation of an OSD array,	2016-03-30 16:57:28 +00:00
glebius	e05176a63d	The sendfile(2) allows sending extra data from userspace before the file data (headers). Historically the size of the headers was not checked against the socket buffer space. Application could easily overcommit the socket buffer space. With the new sendfile (r293439) the problem remained, but a KASSERT was inserted that checked that amount of data written to the socket matches its space. In case when size of headers is bigger than socket space, KASSERT fires. Without INVARIANTS the new sendfile won't panic, but would report incorrect amount of bytes sent. o With this change, the headers copyin is moved down into the cycle, after the sbspace() check. The uio size is trimmed by socket space there, which fixes the overcommit problem and its consequences. o The compatibility handling for FreeBSD 4 sendfile headers API is pushed up the stack to syscall wrappers. This required a copy and paste of the code, but in turn this allowed to remove extra stack carried parameter from fo_sendfile_t, and embrace entire compat code into #ifdef. If in future we got more fo_sendfile_t function, the copy and paste level would even reduce. Reviewed by: emax, gallatin, Maxim Dounin <mdounin mdounin.ru> Tested by: Vitalij Satanivskij <satan ukr.net> Sponsored by: Netflix	2016-03-29 19:57:11 +00:00
trasz	ca92bb3067	Remove some NULL checks for M_WAITOK allocations. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2016-03-29 13:56:59 +00:00
jamie	63095618a2	Move the various per-type arrays of OSD data into a single structure array.	2016-03-28 22:18:37 +00:00
imp	6441be832c	Move pccard_safe_quote() up to subr_bus.c and rename to devctl_safe_quote() so it can be used more generally.	2016-03-28 20:16:29 +00:00
np	0b3b29f07b	Plug leak in m_unshare. m_unshare passes on the source mbuf's flags as-is to m_getcl and this results in a leak if the flags include M_NOFREE. The fix is to clear the bits not listed in M_COPYALL before calling m_getcl. M_RDONLY should probably be filtered out too but that's outside the scope of this fix. Add assertions in the zone_mbuf and zone_pack ctors to catch similar bugs. Update netmap_get_mbuf to not pass M_NOFREE to m_getcl. It's not clear what the original code was trying to do but it's likely incorrect. Updated code is no different functionally but it avoids the newly added assertions. Reviewed by: gnn@ Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D5698	2016-03-26 23:39:53 +00:00
cem	96eb07ae2b	Add td_swinvoltick to track last involuntary context switch Expose in DDB via "show thread." Reviewed by: markj Sponsored by: EMC / Isilon Storage Division	2016-03-25 19:35:29 +00:00
glebius	fa5cb70a2d	Space and style(9) corrections for recent mbuf changes.	2016-03-24 20:06:52 +00:00
skra	2683d49bfb	Generalize IPI support for ARM intrng and use it for interrupt controller IPI provider. New struct intr_ipi is defined which keeps all info about an IPI: its name, counter, send and dispatch methods. Generic intr_ipi_setup(), intr_ipi_send() and intr_ipi_dispatch() functions are implemented. An IPI provider must implement two functions: (1) an intr_ipi_send_t function which is able to send an IPI, (2) a setup function which initializes itself for an IPI and calls intr_ipi_setup() with appropriate arguments. Differential Revision: https://reviews.freebsd.org/D5700	2016-03-24 09:55:11 +00:00
gnn	77a2ccb155	Move mbuf provider under SDT to indicate that it is FreeBSD specific and not a stable interface. Reviewed by: markj MFC after: 2 weeks Sponsored by: Rubicon Communications (Netgate) Differential Revision: https://reviews.freebsd.org/D5716	2016-03-24 08:26:06 +00:00
bdrewery	a9f48f4d56	Pass the expected struct radix_node_head * to vfs_free_netcred. No functional change. struct radix_node_head's first element is rh so this was already referring to the same address. It was likely an unintended s/rnh/&rnh->rh/ change from r294706 as all other rnh_walktree() callers pass the expected struct radix_node_head * rather than obscurely passing the address of their first element. Sponsored by: EMC / Isilon Storage Division	2016-03-24 04:40:07 +00:00
bdrewery	91bc17507f	Fix M_RTABLE memory leak from r274118 (11/2014). Replace free(M_RTABLE) with rn_detachhead() to match rn_inithead(). This would trigger when reloading NFS exports and was similar to problems with pf reload [1]. PR: 194078 [1] Sponsored by: EMC / Isilon Storage Division	2016-03-24 03:08:39 +00:00
trasz	1c2e36026b	Wait for root mount tokens before showing the root mount prompt. This restores the pre-r290196 behaviour, eliminating the need to manually press '.' a couple of times to get USB to finish probing. Note that there's still something wrong with the console (character echoing doesn't quite work), and there's also a reported problem with BHyVe, but those two don't seem related to the problem above. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2016-03-22 13:46:01 +00:00
gnn	e4786b4992	Add an mbuf provider to DTrace. The mbuf provider is made up of a set of Statically Defined Tracepoints which help us look into mbufs as they are allocated and freed. This can be used to inspect the buffers or for a simplified mbuf leak detector. New tracepoints are: mbuf:::m-init mbuf:::m-gethdr mbuf:::m-get mbuf:::m-getcl mbuf:::m-clget mbuf:::m-cljget mbuf:::m-cljset mbuf:::m-free mbuf:::m-freem There is also a translator for mbufs which gives some visibility into the structure, see mbuf.d for more details. Reviewed by: bz, markj MFC after: 2 weeks Sponsored by: Rubicon Communications (Netgate) Differential Revision: https://reviews.freebsd.org/D5682	2016-03-22 13:16:52 +00:00
jhb	ccef35d926	Regen.	2016-03-21 21:38:35 +00:00
jhb	6f8f2fe586	Fully handle size_t lengths in AIO requests. First, update the return types of aio_return() and aio_waitcomplete() to ssize_t. POSIX requires aio_return() to return a ssize_t so that it can represent all return values from read() and write(). aio_waitcomplete() should use ssize_t for the same reason. aio_return() has used ssize_t in <aio.h> since r31620 but the manpage and system call entry were not updated. aio_waitcomplete() has always returned int. Note that this does not require new system call stubs as this is effectively only an API change in how the compiler interprets the return value. Second, allow aio_nbytes values up to IOSIZE_MAX instead of just INT_MAX. aio_read/write should now honor the same length limits as normal read/write. Third, use longs instead of ints in the aio_return() and aio_waitcomplete() system call functions so that the 64-bit size_t in the in-kernel aiocb isn't truncated to 32-bits before being copied out to userland or being returned. Finally, a simple test has been added to verify the bounds checking on the maximum read size from a file.	2016-03-21 21:37:33 +00:00
maxim	86d336784d	o "avaliable" -> "available". PR: 208141 Submitted by: Tyler Littlefield	2016-03-21 08:03:50 +00:00
pfg	eea894a1e2	aio_qphysio(): Avoid uninitialized pointer read on error. For the !unmap case it may happen that pbuf gets called unreferenced when vm_fault_quick_hold_pages() fails. Initialize it so it doesn't cause trouble. CID: 1352776 Reviewed by: jhb MFC after: 1 week	2016-03-18 19:04:01 +00:00
jhibbits	720f47c9ed	Use uintmax_t (typedef'd to rman_res_t type) for rman ranges. On some architectures, u_long isn't large enough for resource definitions. Particularly, powerpc and arm allow 36-bit (or larger) physical addresses, but type `long' is only 32-bit. This extends rman's resources to uintmax_t. With this change, any resource can feasibly be placed anywhere in physical memory (within the constraints of the driver). Why uintmax_t and not something machine dependent, or uint64_t? Though it's possible for uintmax_t to grow, it's highly unlikely it will become 128-bit on 32-bit architectures. 64-bit architectures should have plenty of RAM to absorb the increase on resource sizes if and when this occurs, and the number of resources on memory-constrained systems should be sufficiently small as to not pose a drastic overhead. That being said, uintmax_t was chosen for source clarity. If it's specified as uint64_t, all printf()-like calls would either need casts to uintmax_t, or be littered with PRI64 macros. Casts to uintmax_t aren't horrible, but it would also bake into the API for resource_list_print_type() either a hidden assumption that entries get cast to uintmax_t for printing, or these calls would need the PRI64 macros. Since source code is meant to be read more often than written, I chose the clearest path of simply using uintmax_t. Tested on a PowerPC p5020-based board, which places all device resources in 0xfxxxxxxxx, and has 8GB RAM. Regression tested on qemu-system-i386 Regression tested on qemu-system-mips (malta profile) Tested PAE and devinfo on virtualbox (live CD) Special thanks to bz for his testing on ARM. Reviewed By: bz, jhb (previous) Relnotes: Yes Sponsored by: Alex Perez/Inertial Computing Differential Revision: https://reviews.freebsd.org/D4544	2016-03-18 01:28:41 +00:00
cem	6b40e40026	fail(9): Only gather/print stacks if STACK is enabled This is a follow-up fix to the earlier r296927. Reported by: bz Sponsored by: EMC / Isilon Storage Division	2016-03-17 01:05:53 +00:00
cem	1cab282ecb	fail(9): Upstreaming some fail point enhancements This is several years' worth of fail point upgrades done at EMC Isilon. They are interdependent enough that it makes sense to put a single diff up for them. Primarily, we added: - Changing all mainline execution paths to be lockless, which lets us use fail points in more sleep-sensitive areas, and allows more parallel execution - A number of additional commands, including 'pause' that lets us do some interesting deterministic repros of race conditions - The ability to dump the stacks of all threads sleeping on a fail point - A number of other API changes to allow marking up the fail point's context in the code, and firing callbacks before and after execution - A man page update Submitted by: Matthew Bryan <matthew.bryan@isilon.com> Reviewed by: cem (earlier version), jhb, kib, pho With feedback from: bdrewery Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D5427	2016-03-16 04:22:32 +00:00
glebius	743cf42b4f	Free the temporary buffer in sysctl_handle_counter_u64_array(). Submitted by: mjg	2016-03-15 00:21:32 +00:00
glebius	0cafb74055	Provide sysctl(9) macro to deal with array of counter(9).	2016-03-15 00:05:00 +00:00
gibbs	453fdfd69a	Provide high precision conversion from ns,us,ms -> sbintime in kevent In timer2sbintime(), calculate the second and fractional second portions of the sbintime separately. When calculating the fractional second portion, use a 64bit multiply to prevent excess truncation. This avoids the ~7% error in the original conversion for ns, and smaller errors of the same type for us and ms. PR: 198139 Reviewed by: jhb MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D5397	2016-03-12 23:02:53 +00:00
jhb	419b17235a	Do not include system call wrappers in libc for old FreeBSD system calls. The base system libc is only used to run binaries built on FreeBSD 7.0 and later. It does not need to include system call wrappers for system calls only used by FreeBSD binaries built on versions older than 7.0. This was already true for "COMPAT" system calls, but now wrappers for system calls used on FreeBSD 4 and 6 are excluded as well. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D5597	2016-03-12 22:53:46 +00:00
trasz	8804916675	Refactor the way we restore cn_lkflags; no functional changes. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2016-03-12 09:05:43 +00:00
trasz	beb648d9cc	Remove cn_consume from 'struct componentname'. It was never set to anything other than 0. Reviewed by: kib@ MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D5611	2016-03-12 08:50:38 +00:00
trasz	faec271eeb	Fix autofs triggering problem. Assume you have an NFS server, 192.168.1.1, with share "share". This commit fixes a problem where "mkdir /net/192.168.1.1/share/meh" would return spurious error instead of creating the directory if the target filesystem wasn't mounted yet; subsequent attempts would work correctly. The failure scenario is kind of complicated to explain, but it all boils down to calling VOP_MKDIR() for the target filesystem (NFS) with wrong dvp - the autofs vnode instead of the filesystem root mounted over it. Reviewed by: kib@ MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D5442	2016-03-12 07:54:42 +00:00
jhb	374ffbde40	Use SI_SUB_LAST instead of SI_SUB_SMP as the "catch-all" subsystem. Reviewed by: kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D5515	2016-03-11 23:18:06 +00:00
jhb	96e88fd872	Regen.	2016-03-09 19:06:46 +00:00
jhb	1b87e4306e	Simplify AIO initialization now that it is standard. - Mark AIO system calls as STD and remove the helpers to dynamically register them. - Use COMPAT6 for the old system calls with the older sigevent instead of an 'o' prefix. - Simplify the POSIX configuration to note that AIO is always available. - Handle AIO in the default VOP_PATHCONF instead of special casing it in the pathconf() system call. fpathconf() is still hackish. - Remove freebsd32_aio_cancel() as it just called the native one directly. Reviewed by: kib Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D5589	2016-03-09 19:05:11 +00:00
kib	1e048a7127	Convert all panics from the link_elf_obj kernel linker for object files format into printfs and errors to caller. Some leaks of resources are there, but the same leaks are present in other error paths. With the change, the kernel at least boots even when module with unexpected or corrupted ELF structure is preloaded. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-03-07 18:44:06 +00:00
kib	ffbdd975f0	In the link_elf_obj.c, handle sections of type SHT_AMD64_UNWIND same as SHT_PROGBITS. This is needed after the clang 3.8 import, which generates that type for .eh_frame section, which had SHT_PROGBITS type before. Reported by: Nikolai Lifanov <lifanov@mail.lifanov.com> PR: 207729 Tested by: dim (previous version) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-03-06 00:31:11 +00:00
jhibbits	70aaabfeac	Replace all resource occurrences of '0UL/~0UL' with '0/~0'. Summary: The idea behind this is '~0ul' is well-defined, and casting to uintmax_t, on a 32-bit platform, will leave the upper 32 bits as 0. The maximum range of a resource is 0xFFF.... (all bits of the full type set). By dropping the 'ul' suffix, C type promotion rules apply, and the sign extension of ~0 on 32 bit platforms gets it to a type-independent 'unsigned max'. Reviewed By: cem Sponsored by: Alex Perez/Inertial Computing Differential Revision: https://reviews.freebsd.org/D5255	2016-03-03 05:07:35 +00:00
kib	486320aac4	If callout_stop_safe() noted that the callout is currently executing, but next invocation is cancelled while migrating, sleepq_check_timeout() needs to be informed that the callout is stopped. Otherwise the thread switches off CPU and never becomes runnable, since running callout could have already raced with us, while the migrating and cancelled callout could be one which is expected to set TDP_TIMOFAIL flag for us. This contradicts the expected behaviour of callout_stop() for other callers, which e.g. decrement references from the callout callbacks. Add a new flag CS_MIGRBLOCK requesting report of the situation as 'successfully stopped'. Reviewed by: jhb (previous version) Tested by: cognet, pho PR: 200992 Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D5221	2016-03-02 18:46:17 +00:00
glebius	a6bfbefef5	Fix regression in r296242 affecting several drivers. For EXT_NET_DRV, EXT_MOD_TYPE, EXT_DISPOSABLE types we should first execute the free callback, then free the mbuf, otherwise we will dereference memory that was just freed. Reported and tested: jhibbits	2016-03-02 02:12:01 +00:00
bdrewery	199dda9a01	Correct a comment.	2016-03-01 23:58:53 +00:00
jhb	58823d0b49	Use SCHEDULER_STOPPED() in cv_wait() instead of checking panicstr. Reviewed by: kib MFC after: 1 month Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D5516	2016-03-01 22:51:44 +00:00
jhb	be47bc68fb	Refactor the AIO subsystem to permit file-type-specific handling and improve cancellation robustness. Introduce a new file operation, fo_aio_queue, which is responsible for queueing and completing an asynchronous I/O request for a given file. The AIO subsystem now exports a library of routines to manipulate AIO requests as well as the ability to run a handler function in the "default" pool of AIO daemons to service a request. A default implementation for file types which do not include an fo_aio_queue method queues requests to the "default" pool invoking the fo_read or fo_write methods as before. The AIO subsystem permits file types to install a private "cancel" routine when a request is queued to permit safe dequeueing and cleanup of cancelled requests. Sockets now use their own pool of AIO daemons and service per-socket requests in FIFO order. Socket requests will not block indefinitely permitting timely cancellation of all requests. Due to the now-tight coupling of the AIO subsystem with file types, the AIO subsystem is now a standard part of all kernels. The VFS_AIO kernel option and aio.ko module are gone. Many file types may block indefinitely in their fo_read or fo_write callbacks resulting in a hung AIO daemon. This can result in hung user processes (when processes attempt to cancel all outstanding requests during exit) or a hung system. To protect against this, AIO requests are only permitted for known "safe" files by default. AIO requests for all file types can be enabled by setting the new vfs.aio.enable_unsafe sysctl to a non-zero value. The AIO tests have been updated to skip operations on unsafe file types if the sysctl is zero. Currently, AIO requests on sockets and raw disks are considered safe and are enabled by default. aio_mlock() is also enabled by default. Reviewed by: cem, jilles Discussed with: kib (earlier version) Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D5289	2016-03-01 18:12:14 +00:00
jhb	15b2caff0f	Remove taskqueue_enqueue_fast(). taskqueue_enqueue() was changed to support both fast and non-fast taskqueues 10 years ago in r154167. It has been a compat shim ever since. It's time for the compat shim to go. Submitted by: Howard Su <howard0su@gmail.com> Reviewed by: sephe Differential Revision: https://reviews.freebsd.org/D5131	2016-03-01 17:47:32 +00:00
skra	1cb2a75f09	Remove an alternative way for dealing with root interrupt controller which is not complete. Likely, it was forgotten and not removed before committing.	2016-03-01 11:27:58 +00:00
skra	caf6d91084	Mark other parts of interrupt framework as INTR_SOLO option specific. Note that isrc_arg member of struct intr_irqsrc is used only for INTR_SOLO and IPI filter. This should be remembered if IPI filters and their arguments will be stored on another place. This option could be unusable very soon, if interrupt controllers implementations will not be implemented considering it.	2016-03-01 10:57:29 +00:00
glebius	163857deb4	New way to manage reference counting of mbuf external storage. The m_ext.ext_cnt pointer becomes a union. It can now hold the refcount value itself. To tell that m_ext.ext_flags flag EXT_FLAG_EMBREF is used. The first mbuf to attach a cluster stores the refcount. The further mbufs to reference the cluster point at refcount in the first mbuf. The first mbuf is freed only when the last reference is freed. The benefit over refcounts stored in separate slabs is that now refcounts of different, unrelated mbufs do not share a cache line. For EXT_EXTREF mbufs the zone_ext_refcnt is no longer needed, and m_extadd() becomes void, making widely used M_EXTADD macro safe. For EXT_SFBUF mbufs the sf_ext_ref() is removed, which was an optimization exactly against the cache aliasing problem with regular refcounting. Discussed with: rrs, rwatson, gnn, hiren, sbruno, np Reviewed by: rrs Differential Revision: https://reviews.freebsd.org/D5396 Sponsored by: Netflix	2016-03-01 00:17:14 +00:00
kib	e76eb4255b	Implement process-shared locks support for libthr.so.3, without breaking the ABI. Special value is stored in the lock pointer to indicate shared lock, and offline page in the shared memory is allocated to store the actual lock. Reviewed by: vangyzen (previous version) Discussed with: deischen, emaste, jhb, rwatson, Martin Simmons <martin@lispworks.com> Tested by: pho Sponsored by: The FreeBSD Foundation	2016-02-28 17:52:33 +00:00
skra	27bb203f7c	Move IPI related parts back to (ARM) machine specific file now, when the interrupt framework is also going to be used by another (MIPS) architecture. IPI implementations may vary much across different architectures. An IPI implementation should still define INTR_IPI_COUNT and use intr_ipi_setup_counters() to setup IPI counters which are inside of intrcnt[] and intrnames[] arrays. Those are used for sysctl and ddb. Then, intr_ipi_increment_count() should be used to increment obtained counter. Reviewed by: imp Differential Revision: https://reviews.freebsd.org/D5459	2016-02-27 12:03:07 +00:00
ed	4a923c8cd0	Remove the errno argument from unp_drop(). While there, add a comment to clarify that ECONNRESET should always be returned for POSIX conformance. Suggested by: Steven Hartland	2016-02-26 12:46:34 +00:00
markj	9abb1836d9	Improve error handling for posix_fallocate(2) and posix_fadvise(2). - Set td_errno so that ktrace and dtrace can obtain the syscall error number in the usual way. - Pass negative error numbers directly to the syscall layer, as they're not intended to be returned to userland. Reviewed by: kib Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D5425	2016-02-25 19:58:23 +00:00
ed	0151167359	Make asynchronous connection failures on UNIX sockets fail with ECONNRESET. While making CloudABI work well on Linux, I discovered that I had a FreeBSD-ism in one of my unit tests. The test did the following: - Create UNIX socket 1, bind it, make it listen. - Create UNIX socket 2, connect it to UNIX socket 1. - Close UNIX socket 1. - Obtain SO_ERROR from socket 2. On FreeBSD this returns ECONNABORTED, while on Linux it returns ECONNRESET. I dug through some of the relevant specifications[1] and it looks like Linux is all right here. ECONNABORTED should only be returned when the local connection (socket 2) is aborted; not the peer (socket 1). It is of course slightly misleading: the function in which we set this error is called uipc_abort(), but keep in mind that we're aborting the peer, thus resetting the local socket. [1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/connect.html Reviewed by: cem Sponsored by: Nuxi, the Netherlands Differential Revision: https://reviews.freebsd.org/D5419	2016-02-24 17:10:32 +00:00
kib	392fea70cc	Provide more correct sizing of the KVA consumed by a vnode, used by the virtvnodes calculation. Include the size of fs-specific v_data as the nfs nclnode inline, the NFS nclnode is bigger than either ZFS znode or UFS inode. Include the size of namecache_ts and short cache path element, multiplied by the name cache population factor, again inline. Inline defines are used to avoid pollution of the vnode.h with the subsystem-private objects. Non-significant unsynchronized changes of the definitions are fine, we do not care about that precision, and e.g. ZFS consumes much malloced memory per vnode for reasons unaccounted in the formula. Lower the partition of kmem dedicated to vnodes, from 1/7 to 1/10. The measures reduce vnode cache pressure on kmem and bring the vnode cache memory use below some apparent thresholds that were exceeded by r291244 due to more robust vnode reuse. Reported and tested by: marius (i386, previous version) Reviewed by: bde Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-02-24 15:15:46 +00:00
bdrewery	8a44a735c3	Fix build after r295934.	2016-02-23 23:37:10 +00:00
oshogbo	5256441af8	According to the sys/kern/capabilities.conf, gethostid(3) should be allowed. Pointed out by: Milosz Kaniewski <m.kaniewski@wheelsystems.com> Approved by: pjd (mentor) MFC after: 3 days Sponsored by: Wheel Systems, http://wheelsystems.com	2016-02-23 22:02:25 +00:00
ian	c764d7edd2	Allow a dynamic env to override a compiled-in static env by passing in the override indication in the env data. Submitted by: bde	2016-02-21 18:35:01 +00:00
jhibbits	f8385663ee	Introduce a RMAN_IS_DEFAULT_RANGE() macro, and use it. This simplifies checking for default resource range for bus_alloc_resource(), and improves readability. This is part of, and related to, the migration of rman_res_t from u_long to uintmax_t. Discussed with: jhb Suggested by: marcel	2016-02-20 01:32:58 +00:00
markj	32d1c3375a	Ensure that we test the event condition when a disabled kevent is enabled. r274560 modified kqueue_register() to only test the event condition if the corresponding knote is not disabled. However, this check takes place before the EV_ENABLE flag is used to clear the KN_DISABLED flag on the knote, so enabling a previously-disabled kevent would not result in a notification for a triggered event. This change fixes the problem by testing for EV_ENABLE before possibly checking the event condition. This change also updates a kqueue regression test to exercise this case. PR: 206368 Reviewed by: kib Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D5307	2016-02-19 01:49:33 +00:00


14846 Commits