freebsd-skq

Author	SHA1	Message	Date
imp	bfc21ccd43	Create deprecation management functions. gone_in(majar, msg); If we're running in FreeBSD major, tell the user this code may be deleted soon. If we're running in FreeBSD major - 1, the the user is deprecated and will be gone in major. Otherwise say nothing. gone_in_dev(dev, major, msg) Just like gone_in, except use device_printf. New tunable / sysctl debug.oboslete_panic: 0 - don't panic, 1 - panic in major or newer , 2 - panic in major - 1 or newer default: 0 if NO_OBSOLETE_CODE is defined, then both of these turn into compile time errors when building for major. Add options NO_OBSOLETE_CODE to kernel build system. This lets us tag code that's going away so users know it will be gone, as well as automatically manage things. Differential Review: https://reviews.freebsd.org/D13818	2018-01-29 00:14:39 +00:00
imp	3506c70696	Add the DF_SUSPENDED flag to flags that are printed.	2018-01-28 05:13:17 +00:00
mckusick	b60c21e66e	For many years the message "fsync: giving up on dirty" has occationally appeared on UFS/FFS filesystems. In some cases it was promptly followed by a panic of "softdep_deallocate_dependencies: dangling deps". This fix should eliminate both of these occurences. Submitted by: Andreas Longwitz <longwitz at incore.de> Reviewed by: kib Tested by: Peter Holm (pho) PR: 225423 MFC after: 1 week	2018-01-26 18:17:11 +00:00
lwhsu	344189d5c0	Fix build for architectures where size_t is not unsigned long Reviewed by: cem Differential Revision: https://reviews.freebsd.org/D14045	2018-01-25 06:37:14 +00:00
cem	2b1bc6707d	malloc(9): Change nominal size to size_t to match standard C No functional change -- size_t matches unsigned long on all platforms. Reported by: bde Discussed with: jhb Sponsored by: Dell EMC Isilon	2018-01-24 19:37:18 +00:00
wma	86a31a6bb4	Reverting r328320	2018-01-24 13:57:01 +00:00
wma	d8d083c4f2	ULE: provide defaults to ts_cpu Fix a bug when the system has no CPU 0. When created, threads were implicitly assigned to CPU 0. This had no practical effect since a real CPU was chosen immediately by the scheduler. However, on systems without a CPU 0, sched_ule attempted to access the scheduler queue of the "old" CPU when assigned the initial choice of the old one. This caused an attempt to use illegal memory and a crash (or, more usually, a deadlock). Fix this by assigned new threads to the BSP explicitly and add some asserts to see that this problem does not recur. Authored by: Nathan Whitehorn <nwhitehorn@freebsd.org> Submitted by: Wojciech Macek <wma@semihalf.com> Obtained from: Semihalf Differential revision: https://reviews.freebsd.org/D13932	2018-01-24 07:54:05 +00:00
pfg	6f66677652	Forgot to sort here in r328238.	2018-01-22 02:26:10 +00:00
pfg	f0c6025eb6	Unsign some values related to allocation. When allocating memory through malloc(9), we always expect the amount of memory requested to be unsigned as a negative value would either stand for an error or an overflow. Unsign some values, found when considering the use of mallocarray(9), to avoid unnecessary casting. Also consider that indexes should be of at least the same size/type as the upper limit they pretend to index. MFC after: 3 weeks	2018-01-22 02:08:10 +00:00
pfg	ced875130d	Revert r327828, r327949, r327953, r328016-r328026, r328041: Uses of mallocarray(9). The use of mallocarray(9) has rocketed the required swap to build FreeBSD. This is likely caused by the allocation size attributes which put extra pressure on the compiler. Given that most of these checks are superfluous we have to choose better where to use mallocarray(9). We still have more uses of mallocarray(9) but hopefully this is enough to bring swap usage to a reasonable level. Reported by: wosch PR: 225197	2018-01-21 15:42:36 +00:00
nwhitehorn	e79f2b9178	Remove SFBUF_OPTIONAL_DIRECT_MAP and such hacks, replacing them across the kernel by PHYS_TO_DMAP() as previously present on amd64, arm64, riscv, and powerpc64. This introduces a new MI macro (PMAP_HAS_DMAP) that can be evaluated at runtime to determine if the architecture has a direct map; if it does not (or does) unconditionally and PMAP_HAS_DMAP is either 0 or 1, the compiler can remove the conditional logic. As part of this, implement PHYS_TO_DMAP() on sparc64 and mips64, which had similar things but spelled differently. 32-bit MIPS has a partial direct-map that maps poorly to this concept and is unchanged. Reviewed by: kib Suggestions from: marius, alc, kib Runtime tested on: amd64, powerpc64, powerpc, mips64	2018-01-19 17:46:31 +00:00
avg	d57a913b89	correct read-ahead calculations in vfs_bio_getpages Previously the calculations were done as if the requested region ended at the start of the last requested page, not its end. The problem as actually quite minor as it affected only stats and page prefaulting, not the actual page data, and only with specific parameters. Reviewed by: kib (previous version) MFC after: 2 weeks	2018-01-18 12:59:04 +00:00
wma	e93d0fe0cf	KDB: restart only CPUs stopped by KDB There is a case when not all CPUs went online. In that situation, restart only APs which were operational before entering KDB. Created by: Wojciech Macek <wma@semihalf.com> Obtained from: Semihalf Reviewed by: nwhitehorn Differential revision: https://reviews.freebsd.org/D13949 Sponsored by: QCM Technologies	2018-01-18 07:38:54 +00:00
jhb	1248834695	Require the SHF_ALLOC flag for program sections from kernel object modules. ELF object files can contain program sections which are not supposed to be loaded into memory (e.g. .comment). Normally the static linker uses these flags to decide which sections are allocated to loadable program segments in ELF binaries and shared objects (including kernels on all architectures and kernel modules on architectures other than amd64). Mapping ELF object files (such as amd64 kernel modules) into memory directly is a bit of a grey area. ELF object files are intended to be used as inputs to the static linker. As a result, there is not a standardized definition for what the memory layout of an ELF object should be (none of the section headers have valid virtual memory addresses for example). The kernel and loader were not checking the SHF_ALLOC flag but loading any program sections with certain types such as SHT_PROGBITS. As a result, the kernel and loader would load into RAM some sections that weren't marked with SHF_ALLOC such as .comment that are not loaded into RAM for kernel modules on other architectures (which are implemented as ELF shared objects). Aside from possibly requiring slightly more RAM to hold a kernel module this does not affect runtime correctness as the kernel relocates symbols based on the layout it uses. Debuggers such as gdb and lldb do not extract symbol tables from a running process or kernel. Instead, they replicate the memory layout of ELF executables and shared objects and use that to construct their own symbol tables. For executables and shared objects this works fine. For ELF objects the current logic in kgdb (and probably lldb based on a simple reading) assumes that only sections with SHF_ALLOC are memory resident when constructing a memory layout. If the debugger constructs a different memory layout than the kernel, then it will compute different addresses for symbols causing symbols in the debugger to appear to have the wrong values (though the kernel itself is working fine). The current port of mdb does not check SHF_ALLOC as it replicates the kernel's logic in its existing kernel support. The bfd linker sorts the sections in ELF object files such that all of the allocated sections (sections with SHF_ALLOCATED) are placed first followed by unallocated sections. As a result, when kgdb composed a memory layout using only the allocated sections, this layout happened to match the layout used by the kernel and loader. The lld linker does not sort the sections in ELF object files and mixed allocated and unallocated sections. This resulted in kgdb composing a different memory layout than the kernel and loader. We could either patch kgdb (and possibly in the future lldb) to use custom handling when generating memory layouts for kernel modules that are ELF objects, or we could change the kernel and loader to check SHF_ALLOCATED. I chose the latter as I feel we shouldn't be loading things into RAM that the module won't use. This should mostly be a NOP when linking with bfd but will allow the existing kgdb to work with amd64 kernel modules linked with lld. Note that we only require SHF_ALLOC for "program" sections for types like SHT_PROGBITS and SHT_NOBITS. Other section types such as symbol tables, string tables, and relocations must also be loaded and are not marked with SHF_ALLOC. Reported by: np Reviewed by: kib, emaste MFC after: 1 month Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D13926	2018-01-17 22:51:59 +00:00
jhb	0aaf564f94	Use long for the last argument to VOP_PATHCONF rather than a register_t. pathconf(2) and fpathconf(2) both return a long. The kern_[f]pathconf() functions now accept a pointer to a long value rather than modifying td_retval directly. Instead, the system calls explicitly store the returned long value in td_retval[0]. Requested by: bde Reviewed by: kib Sponsored by: Chelsio Communications	2018-01-17 22:36:58 +00:00
pfg	50d0301ca5	kern: make some use of mallocarray(9). Focus on code where we are doing multiplications within malloc(9). None of these ire likely to overflow, however the change is still useful as some static checkers can benefit from the allocation attributes we use for mallocarray. This initial sweep only covers malloc(9) calls with M_NOWAIT. No good reason but I started doing the changes before r327796 and at that time it was convenient to make sure the sorrounding code could handle NULL values. X-Differential revision: https://reviews.freebsd.org/D13837	2018-01-15 21:18:04 +00:00
ian	9c7cac637a	Add RTC clock conversions for BCD values, with non-panic validation. RTC clock hardware frequently uses BCD numbers. Currently the low-level bcd2bin() and bin2bcd() functions will KASSERT if given out-of-range BCD values. Every RTC driver must implement its own code for validating the unreliable data coming from the hardware to avoid a potential kernel panic. This change introduces two new functions, clock_bcd_to_ts() and clock_ts_to_bcd(). The former validates its inputs and returns EINVAL if any values are out of range. The latter guarantees the returned data will be valid BCD in a known format (4-digit years, etc). A new bcd_clocktime structure is used with the new functions. It is similar to the original clocktime structure, but defines the fields holding BCD values as uint8_t (uint16_t for year), and adds a PM flag for handling hours using AM/PM mode. PR: 224813 Differential Revision: https://reviews.freebsd.org/D13730 (no reviewers)	2018-01-14 17:01:37 +00:00
bz	8d499a89c5	Remove trailing whitespace. No functional change.	2018-01-14 15:01:25 +00:00
kib	78904c70bb	Add sysctl debug.kdb.stack_overflow to conveniently test kernel handling of the kstack overflow. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2018-01-13 11:59:49 +00:00
mjg	00f02d857e	sx: retry hard shared unlock just like in r327905 for rwlocks	2018-01-13 09:26:24 +00:00
mjg	6f29c630d9	rwlock: try regular read unlock even in the hard path Saves on turnstile trips if the lock got more readers.	2018-01-13 00:05:31 +00:00
jeff	f375b4dd66	Implement NUMA support in uma(9) and malloc(9). Allocations from specific domains can be done by the _domain() API variants. UMA also supports a first-touch policy via the NUMA zone flag. The slab layer is now segregated by VM domains and is precise. It handles iteration for round-robin directly. The per-cpu cache layer remains a mix of domains according to where memory is allocated and freed. Well behaved clients can achieve perfect locality with no performance penalty. The direct domain allocation functions have to visit the slab layer and so require per-zone locks which come at some expense. Reviewed by: Attilio (a slightly older version) Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon	2018-01-12 23:25:05 +00:00
jeff	e7c9f84113	Implement NUMA policy for kmem_*(9). This maintains compatibility with reservations by giving each memory domain its own KVA space in vmem that is naturally aligned on superpage boundaries. Reviewed by: alc, markj, kib (some objections) Sponsored by: Netflix, Dell/EMC Isilon Tested by; pho Differential Revision: https://reviews.freebsd.org/D13289	2018-01-12 23:13:55 +00:00
jeff	058d03378a	Regenerate auto-generated files	2018-01-12 23:06:35 +00:00
jeff	94c7af8ca2	Implement 'domainset', a cpuset based NUMA policy mechanism. This allows userspace to control NUMA policy administratively and programmatically. Implement domainset based iterators in the page layer. Remove the now legacy numa_* syscalls. Cleanup some header polution created by having seq.h in proc.h. Reviewed by: markj, kib Discussed with: alc Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon Differential Revision: https://reviews.freebsd.org/D13403	2018-01-12 22:48:23 +00:00
mjg	ddfd5797d3	mtx: use fcmpset to cover setting MTX_CONTESTED	2018-01-12 13:40:50 +00:00
mjg	749ff5c610	vfs: tidy up vdrop Skip vfs_refcount_release_if_not_last if the interlock is held and just go straight to refcount_release. While here do cosmetic rearrangement of _vhold to better show it contains equivalent behaviour.	2018-01-12 13:39:02 +00:00
tuexen	131d6b30b7	Ensure that the vnet is set when calling pru_sockaddr() and pru_peeraddr(). This is already true when called via kern_getsockname() and kern_getpeername(). This patch sets it also, when they arecalled via soo_fill_kinfo(). This is necessary, since the corresponding functions for SCTP require the vnet to be set. Without this, if a process having an wildcard bound SCTP socket is terminated and a core is written, the kernel panics. Reviewed by: bz Differential Revision: https://reviews.freebsd.org/D13652	2018-01-11 20:26:17 +00:00
cem	2c9ae2323b	mallocarray(9): panic if the requested allocation would overflow Additionally, move the overflow check logic out to WOULD_OVERFLOW() for consumers to have a common means of testing for overflowing allocations. WOULD_OVERFLOW() should be a secondary check -- on 64-bit platforms, just because an allocation won't overflow size_t does not mean it is a sane size to request. Callers should be imposing reasonable allocation limits far, far, below overflow. Discussed with: emaste, jhb, kp Sponsored by: Dell EMC Isilon	2018-01-10 21:49:45 +00:00
jhb	db85554676	Don't store shadow copies of per-process AIO limits. Previously the AIO subsystem would save a snapshot of the currently configured per-process limits the first time a process used AIO. The process would continue to use the snapshotted limits ignoring any changes to the global limits during the rest of its lifetime. This change removes the snapshotted values and changes the AIO code to always check the global values which can be toggled at runtime. This means an administrator can now change the effective limits of existing processes. This is more consistent with how other limits configured via sysctl work in FreeBSD. Reviewed by: asomers, kib MFC after: 2 weeks Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D13819	2018-01-10 21:18:46 +00:00
jhb	0e2f42609d	Allow the fast-path for disk AIO requests to fail requests. - If aio_qphysio() returns a non-zero error code, fail the request rather than queueing it to the AIO kproc pool to be retried via the slow path. Currently this means that if vm_fault_quick_hold_pages() reports an error, EFAULT is returned from the fast-path rather than retrying the request in the slow path where it will still fail with EFAULT. - If aio_qphysio() wishes to use the fast path for a device that doesn't support unmapped I/O but there are already the maximum number of such requests in flight, fail with EAGAIN as we do for other AIO resource limits rather than queueing the request to the AIO kproc pool. - Move the opcode check for aio_qphysio() out of the caller and into aio_qphysio() to simplify some logic and remove two goto's while here. It also uses a whitelist (only supported for LIO_READ / LIO_WRITE) rather than a blacklist (skipped for LIO_SYNC). PR: 217261 Submitted by: jkim (an earlier version) MFC after: 2 weeks Sponsored by: Chelsio Communications	2018-01-10 00:18:47 +00:00
jhb	0fe959494b	Simplify some logic by merging an if test with a subsequent switch. Specifically, in aio_queue_file() the code was doing this: if (opcode == LIO_SYNC) { ... } switch (opcode) { ... case LIO_SYNC: ... } This moves the body of the if statement into the LIO_SYNC case of the switch statement. MFC after: 2 weeks Sponsored by: Chelsio Communications	2018-01-10 00:02:06 +00:00
jhb	fb5e317b48	Add a counter to track in-flight AIO requests using unmapped I/O. MFC after: 2 weeks Sponsored by: Chelsio Communications	2018-01-09 23:57:29 +00:00
markj	8b210be68e	Generalize the gzio API. We currently use a set of subroutines in kern_gzio.c to perform compression of user and kernel core dumps. In the interest of adding support for other compression algorithms (zstd) in this role without complicating the API consumers, add a simple compressor API which can be used to select an algorithm. Also change the (non-default) GZIO kernel option to not enable compressed user cores by default. It's not clear that such a default would be desirable with support for multiple algorithms implemented, and it's inconsistent in that it isn't applied to kernel dumps. Reviewed by: cem Differential Revision: https://reviews.freebsd.org/D13632	2018-01-08 21:27:41 +00:00
ian	f47ac564b6	Use EVENTHANDLER_DIRECT_INVOKE for [un]mount events, for better performance.	2018-01-07 18:07:22 +00:00
ian	bdbecc7f3f	Use EVENTHANDLER_DIRECT_INVOKE() for device events, for better performance.	2018-01-07 18:06:30 +00:00
kp	69291826ac	Introduce mallocarray() in the kernel Similar to calloc() the mallocarray() function checks for integer overflows before allocating memory. It does not zero memory, unless the M_ZERO flag is set. Reviewed by: pfg, vangyzen (previous version), imp (previous version) Obtained from: OpenBSD Differential Revision: https://reviews.freebsd.org/D13766	2018-01-07 13:21:01 +00:00
glebius	6947b91466	In sendfile_iodone() both pru_abort and sorele need to be executed with proper VNET context set. Reported by: sbruno MFC after: 2 weeks	2018-01-05 20:21:46 +00:00
jhb	3cc4339e5b	Always use atomic_fetchadd() when updating per-user accounting values. This avoids re-reading a variable after it has been updated via an atomic op. It is just a cosmetic cleanup as the read value was only used to control a diagnostic printf that should rarely occur (if ever). Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D13768	2018-01-04 22:07:58 +00:00
jhb	a5135f50cf	Report offset relative to the backing object for kinfo_vmentry structures. For the pathname reported in kinfo_vmentry structures (kve_path), the sysctl handlers walk the object chain to find the bottom-most VM object. This permits a COW mapping of a file with dirty pages to report the pathname of the originally mapped file. Do the same for the object offset (kve_offset) computing a cumulative offset during the same object walk so that the reported offset is relative to the reported pathname. Note that ptrace(PT_VM_ENTRY) already returns a cumulative offset rather than the raw offset of the VM map entry. Note also that this does not affect procstat -v output (even structured output) since that output does not include the kve_offset field. Reviewed by: kib MFC after: 2 weeks Sponsored by: DARPA / AFRL Differential Revision: https://reviews.freebsd.org/D13767	2018-01-04 21:59:34 +00:00
karels	95cd76bc3b	make SW_WATCHDOG dynamic Enable the hardclock-based watchdog previously conditional on the SW_WATCHDOG option whenever hardware watchdogs are not found, and watchdogd attempts to enable the watchdog. The SW_WATCHDOG option still causes the sofware watchdog to be enabled even if there is a hardware watchdog. This does not change the other software-based watchdog enabled by the --softtimeout option to watchdogd. Note that the code to reprime the watchdog during kernel core dumps is no longer conditional on SW_WATCHDOG. I think this was previously a bug. Reviewed by: imp alfred bjk MFC after: 1 week Relnotes: yes Differential Revision: https://reviews.freebsd.org/D13713	2018-01-03 00:56:30 +00:00
antoine	2d68146a96	sysctl_kern_proc_args: do not take the fast path if p_args is NULL In this case it falls back to reading ps_strings	2018-01-01 21:25:01 +00:00
cperciva	55fe5887ff	Use the TSLOG framework to record entry/exit timestamps for DELAY and _vprintf; these functions are called in many places and can contribute meaningfully to the total time spent booting.	2017-12-31 09:24:41 +00:00
cperciva	d913278fc1	Instrument thread creations for the the benefit of the TSLOG framework. This assists in tracking time spent while the boot is being "held" waiting for something to happen.	2017-12-31 09:24:11 +00:00
cperciva	95c6ba719c	Instrument "boot holds" for the benefit of the TSLOG framework. These are places where the "main thread" of the booting kernel (either the thread which later becomes swapper or the thread which later becomes init) has to stop and wait for action to take place in another thread before continuing. There are currently three such holds: 1. The intr_config_hooks SYSINIT waits for hooks registered via the config_intrhook_establish function; this allows (typically) devices which need interrupts enabled to complete their initialization to do so before root is mounted. 2. The g_waitidle function waits for the GEOM event queue to be empty; this ensures that all of the disks which have been attached have been tasted before we attempt to mount root. 3. The vfs_mountroot_wait function (in addition to calling g_waitidle) waits for holds registered via root_mount_hold; among other things, this is used by the USB subsystem to ensure that we don't fail to mount root if it's located on a USB disk which takes a while to probe.	2017-12-31 09:23:52 +00:00
cperciva	a8789eddcf	Teach makeobjops.awk to accept PROLOG and EPILOG blocks before METHOD and STATICMETHOD declarations; that code will be inserted into the dispatch function before and after the method call. Use this functionality and the TSLOG framework to record DEVICE_ATTACH and DEVICE_PROBE entry/exit timestamps.	2017-12-31 09:23:19 +00:00
cperciva	c5a93994dc	Use the TSLOG framework to record entry/exit timestamps for machine independent functions with important roles in the early boot process: mi_startup (with the "exit" recorded when it becomes swapper), start_init (with the "exit" recorded when the thread is about to "return" into the newly created init process), vfs_mountroot, and vfs_mountroot_wait.	2017-12-31 09:22:31 +00:00
cperciva	e9be0de69c	Code for recording timestamps of events, especially function entries/exits. This is a very primitive system, intended for use in measuring performance during the early system boot, before more sophisticated tools like DTrace or infrastructure like kernel memory allocation and mutexes are available. Because this code records pointers to strings rather than copying strings (in order to keep the memory usage more manageable), if a kernel module is unloaded after logging an event, Bad Things can happen. Users are advised to not do that. Since cycle counts from the early kernel boot are used as an initial entropy source, publishing this information to userland could result in inadequate entropy being kept private to the kernel RNG. Users are advised to not enable this on systems with untrusted users. Discussed on: freebsd-current	2017-12-31 09:21:01 +00:00
pfg	81e471c032	sysv_{ipc\|shm}: update the NetBSD VCS tags to match nearer our files. Both files originated in NetBSD: sysv_ipc.c CVS 1.9: Most of their changes don't apply to us as we already have similar changes. This is a better reference for future merges. sysv_shm.c CVS 1.39: Most of their changes don't apply to our code but interestingly this revision merged our changes and is a better point for reference. Move the VCS tags to the position recommended in our committers guide (section 8), No functional change.	2017-12-31 03:34:00 +00:00
mjg	ea6bb8f729	locks: adjust loop limit check when waiting for readers The check was for the exact value, but since the counter started being incremented by the number of readers it could have jumped over.	2017-12-31 02:31:01 +00:00

1 2 3 4 5 ...

15891 Commits