freebsd-nq

Author	SHA1	Message	Date
Hiroki Sato	ed063112f4	Fix a panic which occurs in a VIMAGE-enabled kernel after r270158, and separate socket_hhook_register() part and put it into VNET_SYS{,UN}INIT() handler. Discussed with: marcel	2014-08-22 05:03:30 +00:00
Marcel Moolenaar	4ec7371233	For vendors like Juniper, extensibility for sockets is important. A good example is socket options that aren't necessarily generic. To this end, OSD is added to the socket structure and hooks are defined for key operations on sockets. These are: o soalloc() and sodealloc() o Get and set socket options o Socket related kevent filters. One aspect about hhook that appears to be not fully baked is the return semantics (the return value from the hook is ignored in hhook_run_hooks() at the time of commit). To support return values, the socket_hhook_data structure contains a 'status' field to hold return values. Submitted by: Anuranjan Shukla <anshukla@juniper.net> Obtained from: Juniper Networks, Inc.	2014-08-18 23:45:40 +00:00
Warner Losh	817dc00433	Expand the elf brandelf infrastructure to give access to the whole ELF header (Elf_Ehdr) to determine if a particular interpreter wants to accept it or not. Use this mechanism to filter EABI arm on OABI arm kernels, and vice versa. This method could also be used to implement OABI on EABI arm kernels, if desired, or to allow a single mips kernel to run o32, n32 and n64 binaries. Differential Revision: https://reviews.freebsd.org/D609	2014-08-18 02:44:56 +00:00
Edward Tomasz Napierala	3914ddf8a7	Bring in the new automounter, similar to what's provided in most other UNIX systems, e.g. MacOS X and Solaris. It uses Sun-compatible map format, has proper kernel support, and LDAP integration. There are still a few outstanding problems; they will be fixed shortly. Reviewed by: allanjude@, emaste@, kib@, wblock@ (earlier versions) Phabric: D523 MFC after: 2 weeks Relnotes: yes Sponsored by: The FreeBSD Foundation	2014-08-17 09:44:42 +00:00
Mark Johnston	ba78d6b7a1	Correct the order of arguments passed to LIST_INSERT_AFTER(). Reviewed by: kib X-MFC-With: r269656	2014-08-15 15:42:58 +00:00
Xin LI	7001d850bb	Add a new loader tunable, vm.kmem_zmax which allows a system administrator to limit the maximum allocation size that malloc(9) would consider using the UMA cache allocator as backend. Suggested by: alfred MFC after: 2 weeks	2014-08-14 05:31:39 +00:00
Xin LI	bda06553fd	Re-instate UMA cached backend for 4K - 64K allocations. New consumers like geli(4) use malloc(9) to allocate temporary buffers that get freed shortly, causing frequent TLB shootdowns as observed in an hwpmc supported flame graph. Discussed with: jeff, alfred MFC after: 1 week	2014-08-14 05:13:24 +00:00
Konstantin Belousov	70978c93b8	If vm_page_grab() allocates a new page, the page is not inserted into a page queue even when the allocation is not wired. It is the responsibility of the vm_page_grab() caller to ensure that the page does not end up on the vm_object queue but not on the pagedaemon queue, which would effectively create an unpageable unwired page. In exec_map_first_page() and vm_imgact_hold_page(), activate the page immediately after unbusying it, to avoid a leak. In uiomove_object_page(), deactivate the page before the object is unlocked. There is no leak, since the page is deactivated after uiomove_fromphys() has finished. But allowing a non-queued non-wired page in the unlocked object queue makes it impossible to assert that a leak does not happen in other places. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-08-13 05:44:08 +00:00
Gleb Smirnoff	cd1692fa5d	Move KASSERT into locked region. Submitted by: kib	2014-08-11 15:06:07 +00:00
Gleb Smirnoff	eaf78ad3f7	Use M_WAITOK in sf_buf_init(). Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-08-11 13:12:18 +00:00
Gleb Smirnoff	818d40d033	Provide sf_buf_ref() to optimize refcounting of already allocated sendfile(2) buffers. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-08-11 12:59:55 +00:00
Bjoern A. Zeeb	e346f8c452	Split up sys_ktimer_getoverrun() into a sys_ and a kern_ variant and export the kern_ version needed by an upcoming linuxolator change. MFC after: 3 days Sponsored by: DARPA,AFRL	2014-08-07 16:49:50 +00:00
Andrey V. Elsukov	3e40097976	Temporarily revert r269661, it looks like the patch isn't complete.	2014-08-07 14:32:28 +00:00
Andrey V. Elsukov	c60e497af9	Use cpuset_setithread() to apply cpu mask to taskq threads. Sponsored by: Yandex LLC	2014-08-07 10:23:50 +00:00
Konstantin Belousov	d735998057	Correct the problems with ptrace(2) making the debuggee an orphan. One problem is inferior(9) looping due to the process tree becoming a graph instead of a tree if the parent is traced by the child. Another issue is due to the use of p_oppid to restore the original parent/child relationship, because the real parent could have already exited and its pid been reused (noted by mjg). Add the function proc_realparent(9), which calculates the parent for a given process. It uses the flag P_TREE_FIRST_ORPHAN to detect the head element of the p_orphan list and then steps back to its container to find the parent process. If the parent has already exited, init(8) is returned. Move P_ORPHAN and the new helper flag from p_flag* to the new p_treeflag field of struct proc, which is protected by the proctree lock instead of the proc lock, since the orphans relationship is managed under the proctree_lock already. The remaining uses of p_oppid in ptrace(PT_DETACH) and process reaping are replaced by proc_realparent(9). Phabric: D417 Reviewed by: jhb Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-08-07 05:47:53 +00:00
Gleb Smirnoff	c8d2ffd6a7	Merge all MD sf_buf allocators into one MI, residing in kern/subr_sfbuf.c. The MD allocators were very common, however there were some minor differences. These differences were all consolidated in the MI allocator, under ifdefs. The defines from machine/vmparam.h turn on features required for a particular machine. For details look in the comment in sys/sf_buf.h. As a result, no MD code is left in sys///vm_machdep.c. Some arches still have machine/sf_buf.h, which is usually quite small. Tested by: glebius (i386), tuexen (arm32), kevlo (arm32) Reviewed by: kib Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-08-05 09:44:10 +00:00
Kirk McKusick	5f9500c358	Add support for multi-threading of soft updates. Replace a single soft updates thread with a thread per FFS-filesystem mount point. The threads are associated with the bufdaemon process. Reviewed by: kib Tested by: Peter Holm and Scott Long MFC after: 2 weeks Sponsored by: Netflix	2014-08-04 22:03:58 +00:00
Davide Italiano	4295aa9240	Fix an overflow in getsockopt(). optval isn't big enough to hold sbintime_t. Re-introduce r255030 behaviour capping socket timeouts to INT_32 if they're too large. CR: https://phabric.freebsd.org/D433 Reported by: demon Reviewed by: bde [1], jhb [2] MFC after: 2 weeks	2014-08-04 05:40:51 +00:00
Peter Wemm	6dde7ecb5d	Partial revert of r262867. r262867 was described as fixing socket buffer checks for SOCK_SEQPACKET, but also changed one of the SOCK_DGRAM code paths to use the new sbappendaddr_nospacecheck_locked() function. This led to SOCK_DGRAM bypassing socket buffer limits.	2014-08-03 22:37:21 +00:00
Sergey Kandaurov	bcdd3bceb6	vn_path_to_global_path: update comment.	2014-08-03 07:59:19 +00:00
Warner Losh	146cbf6fa2	Make the witness lock limit an option.	2014-08-03 05:00:43 +00:00
Konstantin Belousov	168f4ee0a8	Remove Giant acquisition from the mount and unmount paths. It could be claimed that two things were reasonably protected by Giant. One is the vfsconf list links, which are converted to the new dedicated sx vfsconf_sx. Another is vfsconf.vfc_refcount, which is now updated with atomics. Note that vfc_refcount still has the same races now as it had under Giant: the unload of filesystem modules can happen while the module is still in use. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-08-03 03:27:54 +00:00
Rui Paulo	551a78956c	In the shm_open() and shm_unlink() syscalls, export the path to KTR. MFC after: 1 week	2014-08-01 23:29:04 +00:00
Konstantin Belousov	634012b917	Remove one-time use macros which check for the vnode lifecycle. Moreover, some parts of the checks are in fact redundant in the surrounding code, and it is clearer what the conditions are by direct testing of the flags. Two of the three macros were only used in assertions. In vnlru_free(), all relevant parts of vholdl() were already inlined, except the increment of v_holdcnt itself. Do not call vholdl() to do the increment as well; this allows making the assertions in vholdl()/vhold() more strict. In v_incr_usecount(), call vholdl() before incrementing other ref counters. The change is a no-op, but it makes it less surprising to see the vnode state in the debugger if interrupted inside v_incr_usecount(). Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-07-29 16:42:34 +00:00
Konstantin Belousov	d3a3b8b038	Simplify the expression, by removing a redundant calculation. Noted by: "O'Connor, Daniel" <Daniel.O'Connor@emc.com> MFC after: 3 days	2014-07-29 01:46:31 +00:00
Konstantin Belousov	5d9b4508fd	For md(4), posix shm(3) and tmpfs(5), free swap space used by paged in dirty page, which is written by the process. Reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-07-28 14:27:05 +00:00
Pietro Cerutti	adecd05bf0	Unbreak the ABI by reverting r268494 until the compat shims are provided	2014-07-28 07:20:22 +00:00
Marcel Moolenaar	1e0a021e3d	The accept filter code is not specific to the FreeBSD IPv4 network stack, so it really should not be under "optional inet". The fact that uipc_accf.c lives under kern/ lends some weight to making it a "standard" file. Moving kern/uipc_accf.c from "optional inet" to "standard" eliminates the need for #ifdef INET in kern/uipc_socket.c. Also, this meant the net.inet.accf.unloadable sysctl needed to move, as net.inet does not exist without networking compiled in (as it lives in netinet/in_proto.c.) The new sysctl has been named net.accf.unloadable. In order to support existing accept filter sysctls, the net.inet.accf node has been added netinet/in_proto.c. Submitted by: Steve Kiernan <stevek@juniper.net> Obtained from: Juniper Networks, Inc.	2014-07-26 19:27:34 +00:00
Marcel Moolenaar	be836fab6c	Don't return ERESTART when the device is gone. In ttydev_leave() ERESTART is the indication that draining got interrupted due to a revoke(2) and that tty_drain() is to be called again for draining to complete. If the device is flagged as gone, then waiting/draining is not possible. Only return ERESTART when waiting is still possible. Obtained from: Juniper Networks, Inc.	2014-07-26 15:46:41 +00:00
Gavin Atkinson	f6b4f5ca21	Add error return to dumpsys(), and use it in doadump(). This commit does not add error returns to minidumpsys() or textdump_dumpsys(); those can also be added later. Submitted by: Conrad Meyer (EMC / Isilon storage division)	2014-07-25 23:52:53 +00:00
Daniel Eischen	66d8df9dfc	Insert new threads at the end of the thread list in the process instead of at the beginning. This allows an intra process signal to be sent to the oldest thread with the signal unmasked - which, if it still exists, is the main thread. This mimics behavior found in Linux and Solaris.	2014-07-25 20:21:02 +00:00
Mateusz Guzik	a1bf811596	Prepare fget_unlocked for reading fd table only once. Some capsicum functions accept fdp + fd and lookup fde based on that. Add variants which accept fde. Reviewed by: pjd MFC after: 1 week	2014-07-23 19:33:49 +00:00
Mateusz Guzik	6a1cf96b4a	Cosmetic changes to unp_internalize Don't throw away the result of fget_unlocked. Move fdp increment to for loop to make it consistent with similar code elsewhere. MFC after: 1 week	2014-07-23 18:04:52 +00:00
Gleb Smirnoff	c71b4037ff	Use assignment instead of bcopy. Submitted by: jmg	2014-07-18 14:59:35 +00:00
Baptiste Daroussin	42e62eca52	Extend kqueue's EVFILT_TIMER by adding precision unit flags support Define the precision macros as bits sets to conform with XNU equivalent. Test fflags passed for EVFILT_TIMER and return EINVAL in case an invalid flag is passed. Phabric: https://phabric.freebsd.org/D421 Reviewed by: kib	2014-07-18 14:27:04 +00:00
Kevin Lo	c29a33213b	Deprecate m_act. Use m_nextpkt always.	2014-07-17 05:21:16 +00:00
Don Lewis	d3a6879421	Nuke the never-used RF_TIMESHARE feature, reducing the complexity of the code. The consensus on arch@ is that this feature might have been useful in the distant past, but is now just unnecessary bloat. The int_rman_activate_resource() and int_rman_deactivate_resource() functions become trivial, so manually inline them. The special deferred handling of RF_ACTIVE is no longer needed in reserve_resource_bound(), so eliminate the associated code at the end of the function. These changes reduce the object file size by more than 500 bytes on i386. Update the rman.9 man page to reflect the removal of the RF_TIMESHARE feature. MFC after: 2 weeks	2014-07-16 22:18:19 +00:00
Konstantin Belousov	65589a29f4	Check for the cross-device cross-link attempt in the VFS, instead of forcing filesystem VOP_LINK() methods to repeat the code. In tmpfs_link(), remove the redundant check for the type of the source, already done by VFS. Note that the NFS server already performs this check before calling VOP_LINK(). Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-07-16 14:04:46 +00:00
Konstantin Belousov	a62eb1398a	Followup to r268466. - Move the code to calculate resident count into separate function. It reduces the indent level and makes the operation of vmmap_skip_res_cnt tunable more clear. - Optimize the calculation of the resident page count for map entry. Skip directly to the next lowest available index and page among the whole shadow chain. - Restore the use of pmap_incore(9), only to verify that current mapping is indeed superpage. - Note the issue with the invalid pages. Suggested and reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-07-15 19:57:03 +00:00
Konstantin Belousov	3760e341ca	Change the calculation of the kinfo_vmentry field kve_private_resident to reflect its name. Noted and reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-07-15 19:49:00 +00:00
Mateusz Guzik	965d08605f	Plug p_pptr null test in do_execve. It is always true.	2014-07-14 22:40:46 +00:00
Mateusz Guzik	c959c23740	Manage struct sigacts refcnt with atomics instead of a mutex. MFC after: 1 week	2014-07-14 21:12:59 +00:00
Konstantin Belousov	895b3782c6	Extract the code to put a filesystem into the suspended state (at the unmount time) in the helper vfs_write_suspend_umnt(). Use it instead of two inline copies in FFS. Fix the bug in the FFS unmount, when suspension failed, the ufs extattrs were not reinitialized. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-07-14 09:10:00 +00:00
Konstantin Belousov	57ef02ff0f	In kern_linkat(), avoid passing doomed vnode to the VOP. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-07-14 08:41:13 +00:00
Konstantin Belousov	a69452162a	Generalize vn_get_ino() to allow filesystems to use custom vnode producer, instead of hard-coding VFS_VGET(). New function, which takes callback, is called vn_get_ino_gen(), standard callback for vn_get_ino() is provided. Convert inline copies of vn_get_ino() in msdosfs and cd9660 into the uses of vn_get_ino_gen(). Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-07-14 08:34:54 +00:00
Kevin Lo	cb7df69b7e	Make bind(2) and connect(2) return EAFNOSUPPORT for AF_UNIX on wrong address family. See https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=191586 for the original discussion. Reviewed by: terry	2014-07-14 06:00:01 +00:00
Mateusz Guzik	8bedd5d782	Clear nonblock and async on devctl close instead of open. This is a purely cosmetic change.	2014-07-12 15:35:04 +00:00
Gleb Smirnoff	1fbe6a82f4	Improve reference counting of EXT_SFBUF pages attached to mbufs. o Do not use UMA refcount zone. The problem with this zone is that several refcounting words (16 on amd64) share the same cache line, and issuing atomic(9) updates on them creates cache line contention. Also, allocating and freeing them is extra CPU cycles. Instead, refcount the page directly via vm_page_wire() and the sfbuf via sf_buf_alloc(sf_buf_page(sf)) [1]. o Call refcounting/freeing function for EXT_SFBUF via direct function call, instead of function pointer. This removes barrier for CPU branch predictor. o Do not cleanup the mbuf to be freed in mb_free_ext(), merely to satisfy assertion in mb_dtor_mbuf(). Remove the assertion from mb_dtor_mbuf(). Use bcopy() instead of manual assignments to copy m_ext in mb_dupcl(). [1] This has some problems for now. Using sf_buf_alloc() merely to increase refcount is expensive, and is broken on sparc64. To be fixed. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-07-11 19:40:50 +00:00
Gleb Smirnoff	fcc34a238c	Fix style bug: rename the refcount field of m_ext to ext_cnt, to match other members. Sponsored by: Nginx, Inc.	2014-07-11 14:34:29 +00:00
Gleb Smirnoff	15c28f87b8	All mbuf external free functions never fail, so let them be void. Sponsored by: Nginx, Inc.	2014-07-11 13:58:48 +00:00
Mateusz Guzik	88f98985aa	Eliminate plim and vtmp local vars in exit1. No functional changes. MFC after: 1 week	2014-07-10 22:54:38 +00:00
Mateusz Guzik	30d58d6b39	Don't make a temporary copy of fixed sysctl strings.	2014-07-10 21:46:57 +00:00
Mateusz Guzik	b23c40d7b1	Don't zero fd_nfiles during fdp destruction. Code trying to take a look has to check fd_refcnt and it is 0 by that time. This is a follow up to r268505, without this the code would leak memory for tables bigger than the default. MFC after: 1 week	2014-07-10 21:05:45 +00:00
Mateusz Guzik	e518baf8f9	Avoid relocking filedesc lock when closing fds during fdp destruction. Don't call bzero nor fdunused from fdfree for such cases. It would do unnecessary work and complain that the lock is not taken. MFC after: 1 week	2014-07-10 20:59:54 +00:00
Pietro Cerutti	7150b86bfe	Implement Short/Small String Optimization in SBUF(9) and change lengths and positions in the API from ssize_t and int to size_t. CR: D388 Approved by: des, bapt	2014-07-10 13:08:51 +00:00
Konstantin Belousov	479fcb4e32	Unconditionally initialize addr to handle the case of changed map timestamp while the map is unlocked. Reported by: bz Sponsored by: The FreeBSD Foundation MFC after: 6 days	2014-07-10 11:20:24 +00:00
Konstantin Belousov	a91831a261	Current code in sysctl proc.vmmap, whose intent is to calculate the amount of resident pages, in fact calculates the amount of installed pte entries in the region. Resident pages which were not soft-faulted yet are not counted. Calculate the amount of resident pages by looking in the objects chain backing the region. Add a knob to disable the residency calculation entirely. For large sparse regions, either the previous or the updated algorithm runs for too long, while several introspection tools do not need the (advisory) RSS value at all. PR: kern/188911 Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-07-09 19:11:57 +00:00
Xin LI	2827952eb4	Don't leave the padding between the msg header and the cmsg data, and the padding after the cmsg data un-initialized. Submitted by: tuexen Security: CVE-2014-3952 Security: FreeBSD-SA-14:17.kmem	2014-07-08 21:54:23 +00:00
Konstantin Belousov	3bcc218f46	Correct the problem reported by test16 from tools/regression/file/flock/flock.c, which completes the fix in r192685. When the lock was stolen from us, retry the whole lock sequence in kernel, instead of returning EINTR to usermode and hoping that application would handle it correctly by restarting the lock acquire. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-07-08 08:10:15 +00:00
Don Lewis	626a79752f	Declaration whitespace changes for style(9). MFC after: 1 week	2014-07-07 22:02:39 +00:00
Mateusz Guzik	5e2554b7f8	Don't call crdup nor uifind under vnode lock. A locked vnode can get in the way of satisfying malloc with M_WAITOK. This is a fixup to r268087. Suggested by: kib MFC after: 1 week	2014-07-07 14:03:30 +00:00
Marcel Moolenaar	e7d939bda2	Remove ia64. This includes: o All directories named ia64 o All files named ia64 o All ia64-specific code guarded by __ia64__ o All ia64-specific makefile logic o Mention of ia64 in comments and documentation This excludes: o Everything under contrib/ o Everything under crypto/ o sys/xen/interface o sys/sys/elf_common.h Discussed at: BSDcan	2014-07-07 00:27:09 +00:00
Hans Petter Selasky	604bf9d37e	When getting the initial value of numeric tunables use the getenv_xxx() functions instead of strtoq(), because the getenv_xxx() functions include wrappers for various postfixes like G/M/K, which strtoq() doesn't do.	2014-07-05 06:12:48 +00:00
Konstantin Belousov	2499a5ccef	Micro-manage clang to get the expected inlining for cpu_search(). Mark cpu_search_lowest/cpu_search_highest/cpu_search_both as noinline, while cpu_search() gets always_inline. With the attributes set, cpu_search() is inlined in wrappers, and if()s with constant conditionals are optimized. On some tests on many-core machine, the hwpmc reported samples for cpu_search*() are reduced from 25% total to 9%. Submitted by: "Rang, Anton" <anton.rang@isilon.com> MFC after: 1 week	2014-07-03 11:06:27 +00:00
Marcel Moolenaar	054b57a740	Drop KTR records when we're in the debugger so that the debugger isn't changing or overwriting the trace buffer. When KTR is enabled for things like traps or pmap functions, the amount of logging can be substantial.	2014-07-02 22:13:07 +00:00
Ed Maste	969d3cc28b	Fix typos in VTY constant names from r268158	2014-07-02 14:47:48 +00:00
Ed Maste	018147eef9	Prefer vt(4) for UEFI boot The UEFI framebuffer driver vt_efifb requires vt(4), so add a mechanism for the startup routine to set the preferred console. This change is ugly because console init happens very early in the boot, making a cleaner interface difficult. This change is intended only to facilitate the sc(4) / vt(4) transition, and can be reverted once vt(4) is the default.	2014-07-02 13:24:21 +00:00
Mateusz Guzik	a6bad85e8e	Plug gcc warning after r268074 about uninitialized newsigacts Reported by: Gary Jennejohn <gljennjohn gmail.com>	2014-07-02 05:45:40 +00:00
Mateusz Guzik	350d51816e	Don't call crcopysafe or uifind unnecessarily in execve. MFC after: 1 week	2014-07-01 09:21:32 +00:00
Mateusz Guzik	d00c8ea429	Perform a lockless check in sigacts_shared. It is used only during execve (i.e. singlethreaded), so there is no fear of returning 'not shared' which soon becomes 'shared'. While here reorganize the code a little to avoid proc lock/unlock in shared case. MFC after: 1 week	2014-07-01 06:29:15 +00:00
Adrian Chadd	c445c3c7f6	If we're doing RSS then ensure that the callwheel swi's are CPU pinned.	2014-06-30 04:25:51 +00:00
Hans Petter Selasky	4813ad54f8	Compile fixes: Remove duplicate "debug_ktr.mask" sysctl definition. Remove now unused variable from "kern_ktr.c". This fixes build of "ktr" which was broken by r267961. Let the default value for "vm_kmem_size_scale" be zero. It is setup after that the sysctl has been initialized from "getenv()" in the "kmeminit()" function to equal the "VM_KMEM_SIZE_MAX" value, if zero. On Sparc64 the "VM_KMEM_SIZE_MAX" macro is not a constant. This fixes build of Sparc64 which was broken by r267961. Add a special macro to dynamically create SYSCTL root nodes, because root nodes have a special parent. This fixes build of existing OFED module and CANBUS module for pc98 which was broken by r267961. Add missing "sysctl.h" includes to get the needed sysctl header file declarations. This is needed after r267961. MFC after: 2 weeks	2014-06-28 17:36:18 +00:00
Mateusz Guzik	b0bc0cadbe	Call fdcloseexec right after fdunshare. No functional changes. MFC after: 1 week	2014-06-28 05:51:45 +00:00
Mateusz Guzik	b9d32c36fa	Make fdunshare accept only td parameter. Proc had to match the thread anyway and 2 parameters were inconsistent with the rest. MFC after: 1 week	2014-06-28 05:41:53 +00:00
Mateusz Guzik	35778d7aa9	Make sure to always clear p_fd for process getting rid of its filetable. Filetable can be shared with other processes. Previous code failed to clear the pointer for all but the last process getting rid of the table. This is mostly cosmetic. Get rid of 'This should happen earlier' comment. Clearing the pointer in this place is fine as consumers can reliably check for files availability by inspecting fd_refcnt and vnodes availability by NULL-checking them. MFC after: 1 week	2014-06-28 05:18:03 +00:00
Hans Petter Selasky	6a3287f889	Fix regression issue after r267961. Handle special string case for SYSCTLs like previously. MFC after: 2 weeks Reported by: several people	2014-06-28 03:59:04 +00:00
Hans Petter Selasky	af3b2549c4	Pull in r267961 and r267973 again. Fix for issues reported will follow.	2014-06-28 03:56:17 +00:00
Glen Barber	37a107a407	Revert r267961, r267973: These changes prevent sysctl(8) from returning proper output, such as: 1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory	2014-06-27 22:05:21 +00:00
Marius Strobl	7344ee184b	In order to get vt(4) a bit closer to the feature set provided by sc(4), implement options TERMINAL_{KERN,NORM}_ATTR. These are aliased to SC_{KERNEL_CONS,NORM}_ATTR and like these latter, allow to change the default colors of normal and kernel text respectively. Note on the naming: Although affecting the output of vt(4), technically kern/subr_terminal.c is primarily concerned with changing default colors so it would be inconsistent to term these options VT_{KERN,NORM}_ATTR. Actually, if the architecture and abstraction of terminal+teken+vt would be perfect, dev/vt/* wouldn't be touched by this commit at all. Reviewed by: emaste MFC after: 3 days Sponsored by: Bally Wulff Games & Entertainment GmbH	2014-06-27 19:57:57 +00:00
Ed Maste	6ac6c9d5f4	Add CTLFLAG_NOFETCH flag; console vty code runs before tunable fetch Also remove redundant "" assignment for string in BSS. Submitted by: hselasky@	2014-06-27 19:07:35 +00:00
Ed Maste	59644098f8	Use a common tunable to choose between vt(4)/sc(4) With this change and previous work from ray@ it will be possible to put both in GENERIC, and have one enabled by default, but allow the other to be selected via the loader. (The previous implementation had separate kern.vt.disable and hw.syscons.disable tunables, and would panic if both drivers were compiled in and neither was explicitly disabled.) MFC after: 1 week Sponsored by: The FreeBSD Foundation	2014-06-27 17:50:33 +00:00
Hans Petter Selasky	3da1cf1e88	Extend the meaning of the CTLFLAG_TUN flag to automatically check if there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating children of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, since some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading kludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel. Other changes: - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask". - Removed redundant TUNABLE statements throughout the kernel. - Some minor code rewrites in connection to removing not needed TUNABLE statements. - Added a missing SYSCTL_DECL(). - Wrapped two very long lines. - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, since malloc()/free() is not ready when sysctls from the sysctl dataset are registered. - Bumped FreeBSD version to indicate SYSCTL API change. MFC after: 2 weeks Sponsored by: Mellanox Technologies	2014-06-27 16:33:43 +00:00
Mateusz Guzik	de966666a2	Check lower bound of cmsg_len. If passed cm->cmsg_len was below cmsghdr size the expression: datalen = (caddr_t)cm + cm->cmsg_len - (caddr_t)data; would give negative result. However, in practice it would not result in a crash because the kernel would try to obtain garbage fds for given process and would error out with EBADF. PR: 124908 Submitted by: campbell mumble.net (modified a little) MFC after: 1 week	2014-06-27 05:04:36 +00:00
Pawel Jakub Dawidek	e16406c7ba	Remove duplicated includes. Submitted by: Mariusz Zaborski <oshogbo@FreeBSD.org>	2014-06-26 13:57:44 +00:00
Attilio Rao	e989086b1d	sysctl subsystem uses sxlocks so avoid to setup dynamic sysctl nodes before sleepinit() has been fully executed in the SLEEPQUEUE_PROFILING case. Sponsored by: EMC / Isilon storage division	2014-06-24 15:16:55 +00:00
Mateusz Guzik	450570a55e	Tidy up fd-related functions called by do_execve o assert in each one that fdp is not shared o remove unnecessary NULL checks - all userspace processes have fdtables and kernel processes cannot execve o remove comments about the danger of fd_ofiles getting reallocated - fdtable is not shared and fd_ofiles could be only reallocated if new fd was about to be added, but if that was possible the code would already be buggy as setugidsafety work could be undone MFC after: 1 week	2014-06-23 01:28:18 +00:00
Mateusz Guzik	158627616c	Don't take filedesc lock in fdunshare(). We can read refcnt safely and only care if it is equal to 1. If it could suddenly change from 1 to something bigger the code would be buggy even in the previous form and transitions from > 1 to 1 are equally racy and harmless (we copy even though there is no need). MFC after: 1 week	2014-06-22 21:37:27 +00:00
Alexander V. Chernikov	811985398d	Permit changing cpu mask for cpu set 1 in presence of drivers binding their threads to particular CPU. Changing ithread cpu mask is now performed by special cpuset_setithread(). It creates additional cpuset root group on first bind invocation. No objection: jhb Tested by: hiren MFC after: 2 weeks Sponsored by: Yandex LLC	2014-06-22 11:32:23 +00:00
Mateusz Guzik	adf87ab01c	fd: replace fd_nfiles with fd_lastfile where appropriate fd_lastfile is guaranteed to be the biggest open fd, so when the intent is to iterate over active fds or lookup one, there is no point in looking beyond that limit. Few places are left unpatched for now. MFC after: 1 week	2014-06-22 01:31:55 +00:00
Mateusz Guzik	0f0b852c73	do_dup: plug redundant adjustment of fd_lastfile By that time it was already set by fdalloc, or was there in the first place if fd is replaced. MFC after: 1 week	2014-06-22 00:53:33 +00:00
Konstantin Belousov	7b81a399a4	In msdosfs_setattr(), add a check for result of the utimes(2) permissions test, forgotten in r164033. Refactor the permission checks for utimes(2) into vnode helper function vn_utimes_perm(9), and simplify its code comparing with the UFS origin, by writing the call to VOP_ACCESSX only once. Use the helper for UFS(5), tmpfs(5), devfs(5) and msdosfs(5). Reported by: bde Reviewed by: bde, trasz Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-06-17 07:11:00 +00:00
Dmitry Chagin	2dedc1281a	Revert r266925 as it can lead to instant panic at fexecve(): To allow to run the interpreter itself add a new ELF branding type. Pointed out by: kib, mjg	2014-06-17 05:29:18 +00:00
Attilio Rao	3ae10f7477	- Modify vm_page_unwire() and vm_page_enqueue() to directly accept the queue where to enqueue pages that are going to be unwired. - Add stronger checks to the enqueue/dequeue for the pagequeues when adding and removing pages to them. Of course, for unmanaged pages the queue parameter of vm_page_unwire() will be ignored, just as the active parameter today. This makes adding new pagequeues quicker. This change effectively modifies the KPI. __FreeBSD_version will be, however, bumped just when the full cache of free pages will be evicted. Sponsored by: EMC / Isilon storage division Reviewed by: alc Tested by: pho	2014-06-16 18:15:27 +00:00
Konstantin Belousov	2e501b0a9e	Use vn_io_fault for the writes from core dumping code. Recursing into VM due to copyin(9) faulting while VFS locks are held is deadlock-prone there in the same way as for the write(2) syscall. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-06-15 04:51:53 +00:00
Alexander Motin	781c93d405	Implement simple direct-mapped cache for popular filesystem identifiers to avoid congestion on global mountlist_mtx mutex in vfs_busyfs(), while traversing through the list of mount points. This change significantly improves NFS server scalability, since it had to do this translation for every request, and the global lock becomes quite congested. This code is more optimized for relatively small number of mount points. On systems with hundreds of active mount points this simple cache may have many collisions. But the original traversal code in that case should also behave much worse, so we are not losing much. Reviewed by: attilio MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2014-06-12 12:43:48 +00:00
Alexander Motin	4f655310bf	Remove unneeded mountlist_mtx acquisition from sync_fsync(). All struct mount fields accessed by sync_fsync() are protected by MNT_MTX.	2014-06-11 12:56:49 +00:00
Alexander Motin	eb6d6216c4	Move root_mount_hold() functionality to separate mutex. It has nothing to share with mutex protecting list of mounted file systems.	2014-06-11 08:14:08 +00:00
Konstantin Belousov	a19c5d3716	Devolatile as needed. Sponsored by: The FreeBSD Foundation MFC after: 13 days	2014-06-09 09:10:31 +00:00
Konstantin Belousov	7f82c6c17f	Change the nblock mutex, protecting the needsbuffer buffer deficit flags, to rwlock. Lock it in read mode when used from subroutines called from buffer release code paths. The needsbuffer is now updated using atomics, while read lock of nblock prevents losing the wakeups from bufspacewakeup() and bufcountadd() in getnewbuf_bufd_help(). In several interesting loads, needsbuffer flags are never set, while buffers are reused quickly. This causes brelse() and bqrelse() from different threads to contend on the nblock. Now they take nblock in read mode, together with needsbuffer not needing an update, allowing higher parallelism. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-06-09 03:38:03 +00:00
Alan Cox	78960940fe	Refresh a comment. The VM_STACK option was eliminated in r43209. Sponsored by: EMC / Isilon Storage Division	2014-06-09 00:15:16 +00:00
Alexander Motin	3345d73ca8	Remove extra branching from r267232. MFC after: 2 weeks	2014-06-08 19:01:37 +00:00
Alexander Motin	590d636321	Use atomics to modify numvnodes variable. This allows to mostly avoid lock usage in getnewvnode_[drop_]reserve(), that reduces number of global vnode_free_list_mtx mutex acquisitions from 4 to 2 per NFS request on ZFS, improving SMP scalability. Reviewed by: kib MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2014-06-08 15:38:40 +00:00
Konstantin Belousov	a288c757d4	Remove write-only local variable. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-06-08 10:56:25 +00:00
Konstantin Belousov	23f6698fbd	Initialize the pbuf counter for directio using SYSINIT, instead of using a direct hook called from kern_vfs_bio_buffer_alloc(). Mark ffs_rawread.c as requiring both ffs and directio options to be compiled into the kernel. Add ffs_rawread.c to the list of ufs.ko module' sources. In addition to stopping breaking the layering violation, it also allows to link kernel when FFS is configured as module and DIRECTIO is enabled. One consequence of the change is that ffs_rawread.o is always linked into the module regardless of the DIRECTIO option. This is similar to the option QUOTA and ufs_quota.c. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-06-08 10:55:06 +00:00
Jilles Tjoelker	093e059c7d	ktrace: Use designated initializers for the data_lengths array. In the .o file, this only changes some line numbers (head amd64) because element 0 is no longer explicitly initialized. This should make bugs like FreeBSD-SA-14:12.ktrace less likely. Discussed with: des MFC after: 1 week	2014-06-06 14:49:00 +00:00
Davide Italiano	e392e44c27	Convert functions to the new-style format. Submitted by: Vijay Singh <vijju.singh@gmail.com> via -hackers	2014-06-05 03:46:46 +00:00
Marcel Moolenaar	62d76917b8	Introduce a procedural interface to the ifnet structure. The new interface allows the ifnet structure to be defined as an opaque type in NIC drivers. This then allows the ifnet structure to be changed without a need to change or recompile NIC drivers. Put differently, NIC drivers can be written and compiled once and be used with different network stack implementations, provided of course that those network stack implementations have an API and ABI compatible interface. This commit introduces the 'if_t' type to replace 'struct ifnet *' as the type of a network interface. The 'if_t' type is defined as 'void *' to enable the compiler to perform type conversion to 'struct ifnet *' and vice versa where needed and without warnings. The functions that implement the API are the only functions that need to have an explicit cast. The MII code has been converted to use the driver API to avoid unnecessary code churn. Code churn comes from having to work with both converted and unconverted drivers in correlation with having callback functions that take an interface. By converting the MII code first, the callback functions can be defined so that the compiler will perform the typecasts automatically. As soon as all drivers have been converted, the if_t type can be redefined as needed and the API functions can be fixed to not need an explicit cast. The immediate beneficiaries of this change are: 1. Juniper Networks - The network stack implementation in Junos is entirely different from FreeBSD's one and this change allows Juniper to build "stock" NIC drivers that can be used in combination with both the FreeBSD and Junos stacks. 2. FreeBSD - This change opens the door towards changing ifnet and implementing new features and optimizations in the network stack without it requiring a change in the many NIC drivers FreeBSD has. Submitted by: Anuranjan Shukla <anshukla@juniper.net> Reviewed by: glebius@ Obtained from: Juniper Networks, Inc.	2014-06-02 17:54:39 +00:00
Adrian Chadd	924aaf69ff	Pin the right thread. This _was_ right, a last minute suggestion and not enough testing makes Adrian a bad boy. Tested: * igb(4) with RSS patches, by hand verifying each igb(4) taskqueue tid from procstat -ka using cpuset -g -t <tid>.	2014-06-01 04:11:05 +00:00
Dmitry Chagin	5f56da1891	To allow to run the interpreter itself add a new ELF branding type. Allow Linux ABI to run ELF interpreter. MFC after: 3 days	2014-05-31 15:01:51 +00:00
Gleb Smirnoff	c46713e636	Whitespace only.	2014-05-30 08:22:58 +00:00
Mark Johnston	f2789bd5c7	Commit the rest of the changes that were intended to be part of r266826. X-MFC-with: r266826	2014-05-29 01:42:22 +00:00
Don Lewis	5b892e7363	Initialize r_flags the same way in all cases using a sanitized copy of flags that has several bits cleared. The RF_WANTED and RF_FIRSTSHARE bits are invalid in this context, and we want to defer setting RF_ACTIVE in r_flags until later. This should make rman_get_flags() return the correct answer in all cases. Add a KASSERT() to catch callers which incorrectly pass the RF_WANTED or RF_FIRSTSHARE flags. Do a strict equality check on the share type bits of flags. In particular, do an equality check on RF_PREFETCHABLE. The previous code would allow one type of mismatch of RF_PREFETCHABLE but disallow the other type of mismatch. Also, ignore the RF_ALIGNMENT_MASK bits since alignment validity should be handled by the amask check. This field contains an integer value, but previous code did a strange bitwise comparison on it. Leave the original value of flags unmolested as a minor debug aid. Change the start+amask overflow check to a KASSERT() since it is just meant to catch a highly unlikely programming error in the caller. Reviewed by: jhb MFC after: 1 month	2014-05-28 16:57:17 +00:00
Adrian Chadd	5a6f0eee47	Add a new taskqueue setup method that takes a cpuid to pin the taskqueue worker thread(s) to. For now it isn't a taskqueue/taskthread error to fail to pin to the given cpuid. Thanks to rpaulo@, kib@ and jhb@ for feedback. Tested: * igb(4), with local RSS patches to pin taskqueues. TODO: * ask the doc team for help in documenting the new API call. * add a taskqueue_start_threads_cpuset() method which takes a cpuset_t - but this may require a bunch of surgery to bring cpuset_t into scope.	2014-05-24 20:37:15 +00:00
Benjamin Kaduk	bf09eca2cb	Check for mismatched vref()/vdrop() Assert that the hold count has not fallen below the use count, a situation that would only happen when a vref() (or similar) is erroneously paired with a vdrop(). This situation has not been observed in the wild, but could be helpful for someone implementing a new filesystem. Reviewed by: kib Approved by: hrs (mentor)	2014-05-21 03:11:27 +00:00
Konstantin Belousov	7032434e98	When exec_new_vmspace() decides that current vmspace cannot be reused on execve(2), it calls vmspace_exec(), which frees the current vmspace. The thread executing an exec syscall gets new vmspace assigned, and old vmspace is freed if only referenced by the current process. The free operation includes pmap_release(), which de-constructs the paging structures used by hardware. If the calling process is multithreaded, other threads are suspended in the thread_suspend_check(), and need to be unsuspended and run to be able to exit on successful exec. Now, since the old vmspace is destroyed, paging structures are invalid, threads are resumed on the non-existent pmaps (page tables), which leads to triple fault on x86. To fix, postpone the free of old vmspace until the threads are resumed and exited. To avoid modifications to all image activators all of which use exec_new_vmspace(), memoize the current (old) vmspace in kern_execve(), and notify it about the need to call vmspace_free() with a thread-private flag TDP_EXECVMSPC. http://bugs.debian.org/743141 Reported by: Ivo De Decker <ivo.dedecker@ugent.be> through secteam Sponsored by: The FreeBSD Foundation MFC after: 3 days	2014-05-20 09:19:35 +00:00
Don Lewis	c201b03fc3	Slightly restructure the final loop in rman_reserve_resource_bound(). Replace with the existing loop termination test with a similar condition from the nested "if" that may terminate the loop a bit sooner, but still not too early. This condition can then be removed from the nested "if". Relocate an operator to be style(9) compliant. MFC after: 3 days	2014-05-19 04:44:27 +00:00
Edward Tomasz Napierala	fbaadda60b	Initialize loginclass mutex using MTX_SYSINIT instead of using SI_SUB_CPU. Suggested by: rwatson@ MFC after: 1 month Sponsored by: The FreeBSD Foundation	2014-05-14 09:03:02 +00:00
Don Lewis	11e104c50f	Be even more paranoid about overflow. Requested by: ache	2014-05-12 20:22:42 +00:00
Don Lewis	11ada7013a	Nuke a couple of unnecessary assignments. Nothing uses the values of rstart and rend after this point. MFC after: 1 week	2014-05-12 17:56:52 +00:00
Jilles Tjoelker	857ce8a246	accept(),accept4(): Don't set addrlen = 0 on [ECONNABORTED]. If the underlying protocol reported an error (e.g. because a connection was closed while waiting in the queue), this error was also indicated by returning a zero-length address. For all other kinds of errors (e.g. [EAGAIN], [ENFILE], [EMFILE]), addrlen is unmodified and there are successful cases where a zero-length address is returned (e.g. a connection from an unbound Unix-domain socket), so this error indication is not reliable. As reported in Austin Group bug #836, modifying addrlen on error may cause subtle bugs if applications retry the call without resetting addrlen.	2014-05-11 21:21:14 +00:00
Colin Percival	760f4dec67	In cf_get_method, when we don't already know what clock speed the CPU is running at, guess the nearest value instead of looking for a value within 25 MHz of the observed frequency. Prior to this change, if a system booted with Intel Turbo Boost enabled, the dev.cpu.0.freq sysctl is nonfunctional, since the ACPI-reported frequency for Turbo Boost states does not match the actual clock frequency (and thus no levels are within 25 MHz of the observed frequency) and the current performance level is read before a new level is set. MFC after: 3 days Relnotes: Bug fix in power management on CPUs with Intel Turbo Boost	2014-05-11 10:32:58 +00:00
Adrian Chadd	ac75ee9fa3	Add in support to optionally pin the swi threads. Under enough load, the swi's can actually be preempted and migrated to other currently free cores. When doing RSS experiments, this led to the per-CPU TCP timers not lining up any more with the RX CPU said flows were ending up on, leading to increased lock contention. Since there was a little pushback on flipping them on by default, I've left the default at "don't pin." The other less obvious problem here is that the default swi is also the same as the destination swi for CPU #0. So if one pins the swi on CPU #0, there's no default floating swi. A nice future project would be to create a separate swi for the "default" floating swi, as well as per-CPU swis that are (optionally) pinned. Tested: * parallel TCP tests (2 x 1g unfortunately for now); CPU: Intel(R) Xeon(R) CPU E5-2650 Note: This is based on some initial investigation into RSS/TCP stack lock contention on FreeBSD-HEAD whilst at Netflix in January 2014.	2014-05-10 00:53:36 +00:00
Don Lewis	1237b6d9ed	Avoid unsigned integer overflow which can cause rman_reserve_resource_bound() to return incorrect results. Continue the initial search until the first viable region is found. Add a comment to explain the search termination test. PR: kern/188534 Reviewed by: jhb (previous version) MFC after: 1 week	2014-05-05 15:59:31 +00:00
Mateusz Guzik	f2b1eaec33	Request a non-exiting process in sysctl_kern_proc_{o,}filedesc This fixes a race with exit1 freeing p_textvp. Suggested by: kib MFC after: 1 week	2014-05-02 21:55:09 +00:00
Christian Brueffer	ed472910ba	Free resources in an error case. CID: 1018947 Found with: Coverity Prevent(tm) MFC after: 1 week	2014-05-02 21:34:17 +00:00
Robert Watson	a2496f6e01	Garbage collect mtxpool_lockbuilder, the mutex pool historically used for lockmgr and sx interlocks, but unused since optimised versions of those sleep locks were introduced. This will save a (quite) small amount of memory in all kernel configurations. The sleep mutex pool is retained as it is used for 'struct bio' and several other consumers. Discussed with: jhb MFC after: 3 days	2014-05-02 07:57:40 +00:00
Mateusz Guzik	183870cf75	Ignore the error from pipespace_new when creating a pipe. It can fail if pipe map is exhausted (as a result of too many pipes created), but it is not fatal and could be provoked by unprivileged users. The only consequence is worse performance with given pipe. Reported by: ivoras Suggested by: kib MFC after: 1 week	2014-05-02 00:52:13 +00:00
Brooks Davis	ee9bc59982	Fix a 2038 bug. If time_t is 64-bit (i.e. isn't 32-bit) allow any value of year, not just years less than 2038. Don't bother fixing the underflow in the case of years before 1903. MFC after: 1 week Sponsored by: DARPA, AFRL	2014-05-01 22:28:14 +00:00
Marius Strobl	0d13d5fce2	Given that as of r258002 the last external user is gone, make sched_lock static.	2014-04-29 20:51:57 +00:00
Peter Grehan	d6cd193e5e	Bump WITNESS_PENDLIST by MAXCPU to account for the pmap pvlist locks which are scaled by MAXCPU. This allows an amd64 system to boot with MAXCPU set to 256, which is currently FreeBSD's hard limit without x2apic support. Compile-tested for other arch's. PR: 185831 Discussed with: jhb MFC after: 3 weeks	2014-04-29 17:22:29 +00:00
Brooks Davis	a3fe2bc59e	Revert r263754, re-adding support for hw.bus.devctl_disable. Breaking old devd's and thus hosts that get IP addresses from DHCP was too much of a POLA violation. The sysctl may be removed again after r263758 has been merged to at least stable/9 and stable/10, and releases have been cut from those branches. Discussed with: mjg Reported by: theraven, rwatson	2014-04-28 20:38:08 +00:00
Scott Long	60ad8150c7	Retire smp_active. It was racey and caused demonstrated problems with the cpufreq code. Replace its use with smp_started. There's at least one userland tool that still looks at the kern.smp.active sysctl, so preserve it but point it to smp_started as well. Discussed with: peter, jhb MFC after: 3 days Obtained from: Netflix	2014-04-26 20:27:54 +00:00
Bryan Drewery	2809a6dfa4	Fix grammar error and trailing newline. Submitted by: danfe MFC after: 3 days	2014-04-23 02:21:17 +00:00
Ian Lepore	6afc723819	Fix a comment typo; conversion tables are for leap years, not leap seconds.	2014-04-20 13:37:22 +00:00
Konstantin Belousov	beb4f781a5	Fix typo. MFC after: 3 days	2014-04-17 18:13:23 +00:00
Navdeep Parhar	c7a3775adf	Do not set M_BESTFIT if a strategy has already been provided. This fixes problems when using M_FIRSTFIT. Reviewed by: jeff@ MFC after: 1 week	2014-04-16 21:39:43 +00:00
Alexander Motin	d10a1df8d7	Fix VIRTUAL and PROF interval timers for short intervals, broken at r247903. Due to the way those timers are implemented, we can't handle very short intervals. In addition to that mentioned patch caused math overflows for short intervals. To avoid that round those intervals to 1 tick. PR: kern/187668 MFC after: 1 week	2014-04-16 18:37:46 +00:00
Christian Brueffer	83a396ce95	Refine r264422: set buf to NULL only when we don't allocate memory, and free buf unconditionally. Requested by: kib MFC after: 1 week	2014-04-14 21:02:20 +00:00
Christian Brueffer	a1761d7335	Free buf after usage. CID: 1199377 Found with: Coverity Prevent(tm) MFC after: 1 week	2014-04-13 21:23:15 +00:00
Davide Italiano	4bc38a5ab0	Hide internal details of sbintime_t implementation wrapping INT64_MAX into SBT_MAX, to make it more robust in case internal type representation will change in the future. All the consumers were migrated to SBT_MAX and every new consumer (if any) should from now use this interface. Requested by: bapt, jmg, Ryan Lortie (implicitly) Reviewed by: mav, bde	2014-04-12 23:29:29 +00:00
Bryan Drewery	97c0df733f	Use proper MFSNAMELEN for fs type. MFC after: 2 weeks Reviewed by: rodrigc Also spotted by: ambrisko	2014-04-12 21:39:17 +00:00
David Xu	7d62aec6fe	Add kqueue support for devctl. Reviewed by: kib,mjg	2014-04-10 02:30:51 +00:00
Sean Bruno	b888dae4c8	sys/kern/imgact_binmisc.c -- free the right pointer mask vs magic sys/sys/imagact_binmisc.h -- cleanup white space tabs vs spaces -- remove stray " in comment Submitted by: jmallett@	2014-04-08 22:12:01 +00:00
Sean Bruno	6d75644981	Add Stacey Son's binary activation patches that allow remapping of execution to an emulation program via parsing of ELF header information. With this kernel module and userland tool, poudriere is able to build ports packages via the QEMU userland tools (or another emulator program) in a different architecture chroot, e.g. TARGET=mips TARGET_ARCH=mips I'm not connecting this to GENERIC for obvious reasons, but this should allow the kernel module to be built by default and enable the building of the userland tool (which automatically loads the kernel module). Submitted by: sson@ Reviewed by: jhb@	2014-04-08 20:10:22 +00:00
Aleksandr Rybalko	19fbe1ea90	Do not fill screen, while muted. Sponsored by: The FreeBSD Foundation	2014-04-07 22:37:13 +00:00
Ed Schouten	8f5b107b84	Thinko: don't forget to apply 'howto' in case init(8) isn't running.	2014-04-07 21:18:12 +00:00
Ed Schouten	912d59378b	Clean up shutdown_nice(). Just send the right signal to init(8). Right now, init(8) cannot distinguish between an ACPI power button press or a Ctrl+Alt+Del sequence on the keyboard. This is because shutdown_nice() sends SIGINT to init(8) unconditionally, but later modifies the arguments to reboot(2) to force a certain behaviour. Instead of doing this, patch up the code to just forward the appropriate signal to userspace. SIGUSR1 and SIGUSR2 can already be used to halt the system. While there, move waittime to the function where it's used; kern_reboot().	2014-04-07 21:11:29 +00:00
Ed Schouten	38219d6acd	Implement kqueue(2) for procdesc(4). kqueue(2) already supports EVFILT_PROC. Add an EVFILT_PROCDESC that behaves the same, but operates on a procdesc(4) instead. Only implement NOTE_EXIT for now. The nice thing about NOTE_EXIT is that it also returns the exit status of the process, meaning that we can now obtain this value, even if pdwait4(2) is still unimplemented. Notes: - Simply reuse EVFILT_NETDEV for EVFILT_PROCDESC. As both of these will be used on totally different descriptor types, this should not clash. - Let procdesc_kqops_event() reuse the same structure as filt_proc(). The only difference is that procdesc_kqops_event() should also be able to deal with the case where the process was already terminated after registration. Simply test this when hint == 0. - Fix some style(9) issues in filt_proc() to keep it consistent with the newly added procdesc_kqops_event(). - Save the exit status of the process in pd->pd_xstat, as we cannot pick up the proctree_lock from within procdesc_kqops_event(). Discussed on: arch@ Reviewed by: kib@	2014-04-07 18:10:49 +00:00
Ed Schouten	d7a39436e5	Fix a typo. The function name is pdfork; not pfork.	2014-04-06 20:20:07 +00:00
Ed Schouten	a90feb39a2	Nit: fix locking of p->p_state in procdesc_close(). According to <sys/proc.h>, this field needs to be locked with either the p_mtx or the p_slock. In this case the damage was quite small. Instead of being reaped, the process would just be reparented to init, so it could be reaped from there.	2014-04-06 20:00:42 +00:00
Konstantin Belousov	14fcb4b4f8	Use realloc(9) instead of doing the reallocation inline. Submitted by: bde MFC after: 1 week	2014-04-05 20:44:52 +00:00
Dmitry Chagin	6b57eff4c0	Prevent alq from panic when the invalid alq_file path specified. MFC after: 1 week	2014-04-05 16:54:47 +00:00
Konstantin Belousov	1a5edcf8ea	When KN_INFLUX is set on the knote due to kqueue_register() or kqueue_scan() unlocking the kqueue to call f_event, knote() or knote_fork() should not skip the knote. The knote is not going to disappear during the influx time, and the mutual exclusion between scan and knote() is ensured by both code paths taking knlist lock. The race appears since knlist lock is before kq lock, so KN_INFLUX must be set, kq lock must be dropped and only then knlist lock can be taken. The window between kq unlock and knlist lock causes lost events. Add a flag KN_SCAN to indicate that KN_INFLUX is set in a manner safe for the knote(), and check for it to ignore KN_INFLUX in the knote*() as needed. Also, in knote(), remove the lockless check for the KN_INFLUX flag, which could also result in the lost notification. Reported and tested by: Kohji Okuno <okuno.kohji@jp.panasonic.com> Discussed with: jmg Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-04-05 14:09:16 +00:00
Ed Maste	b7bd677fe1	Initialise m_pkthdr via bzero instead of explicitly zeroing each member Sponsored by: The FreeBSD Foundation	2014-04-04 21:09:06 +00:00
David Xu	5055c92801	Fix SIGIO delivery. Use fsetown() to handle file descriptor owner ioctl and use pgsigio() to send SIGIO. Submitted by: truckman Reviewed by: mjg	2014-04-04 12:31:13 +00:00
Mateusz Guzik	210a5d1689	Garbage collect fdavail. It rarely returns an error and fdallocn handles the failure of fdalloc just fine.	2014-04-04 05:07:36 +00:00
Ian Lepore	9e24f23880	Fix build breakage. Apparently all ARM configs build kern_et.c, but only a few of them also build kern_clocksource.c. That strikes me as insane, but maybe there's a good reason for it. Until I figure that out, un-break the build by not referencing functions in kern_clocksource if NO_EVENTTIMERS is defined.	2014-04-02 17:34:17 +00:00
Ian Lepore	cfc4b56b57	Add support for event timers whose clock frequency can change while running.	2014-04-02 15:56:11 +00:00
Mateusz Guzik	0ab7a1f396	Document a known problem with handling the process intended to receive SIGIO in /dev/devctl. Suggested by: adrian MFC after: 6 days	2014-03-25 23:30:35 +00:00
Mateusz Guzik	88b7c833d2	Remove long obsolete sysctl hw.bus.devctl_disable. Suggested by: imp Relnotes: yes	2014-03-25 23:19:45 +00:00
Mateusz Guzik	6abaea7d58	Remove lockless check in devopen, while correct it does not make much sense. Suggested by: imp MFC after: 6 days	2014-03-25 23:13:46 +00:00
Mateusz Guzik	37dbba2a44	Make /dev/devctl mpsafe. MFC after: 1 week	2014-03-25 03:28:58 +00:00
Maksim Yevmenkin	b646225a13	change default permissions on /dev/devstat. while i'm here remove D_NEEDGIANT flag Submitted by: jhb Reviewed by: jhb, scottl, rwatson, delphij, phk MFC after: 1 week	2014-03-24 18:13:41 +00:00
Neel Natu	d6543c678c	Don't lose track of the KTR entries copied from 'ktr_buf_init[]' to the dynamically allocated 'ktr_buf[]'. The memcpy arranges 'ktr_buf[]' such that the latest KTR entry is at 'KTR_BOOT_ENTRIES - 1'.	2014-03-22 22:35:57 +00:00
Bryan Drewery	44f1c91610	Rename global cnt to vm_cnt to avoid shadowing. To reduce the diff struct pcu.cnt field was not renamed, so PCPU_OP(cnt.field) is still used. pc_cnt and pcpu are also used in kvm(3) and vmstat(8). The goal was to not affect externally used KPI. Bump __FreeBSD_version_ in case some out-of-tree module/code relies on the global cnt variable. Exp-run revealed no ports using it directly. No objection from: arch@ Sponsored by: EMC / Isilon Storage Division	2014-03-22 10:26:09 +00:00
Mateusz Guzik	f804336026	Mark the following sysctls as MPSAFE: kern.file kern.proc.filedesc kern.proc.ofiledesc MFC after: 7 days	2014-03-21 19:12:05 +00:00
Konstantin Belousov	52f3c44efe	Fix two issues with /dev/mem access on amd64, both causing kernel page faults. First, for accesses to direct map region should check for the limit by which direct map is instantiated. Second, for accesses to the kernel map, success returned from the kernacc(9) does not guarantee that consequent attempt to read or write to the checked address succeed, since other thread might invalidate the address meantime. Add a new thread private flag TDP_DEVMEMIO, which instructs vm_fault() to return error when fault happens on the MAP_ENTRY_NOFAULT entry, instead of panicking. The trap handler would then see a page fault from access, and recover in normal way, making /dev/mem access safer. Remove GIANT_REQUIRED from the amd64 memrw(), since it is not needed and having Giant locked does not solve issues for amd64. Note that at least the second issue exists on other architectures, and requires similar patching for md code. Reported and tested by: clusteradm (gjb, sbruno) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-03-21 14:25:09 +00:00
Mateusz Guzik	4c73e705a5	Take filedesc lock only for reading when allocating new fdtable. Code populating the table does this already. MFC after: 1 week	2014-03-21 01:34:19 +00:00
Attilio Rao	3198603edd	Fix comments. Sponsored by: EMC / Isilon Storage Division	2014-03-19 12:45:40 +00:00
Konstantin Belousov	88b124cede	Make the array pointed to by AT_PAGESIZES auxv properly aligned. Also, remove the expression which calculated the location of the strings for a new image and grown over the time to be non-comprehensible. Instead, calculate the offsets by steps, which also makes fixing the alignments much cleaner. Reported and reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-03-19 12:35:04 +00:00
Attilio Rao	c149e542a5	Fix GENERIC build.	2014-03-19 00:38:27 +00:00
Attilio Rao	4f11a684ff	Regen per r263318. Sponsored by: EMC / Isilon storage division	2014-03-18 21:34:11 +00:00
Attilio Rao	ce42e79310	Remove dead code from umtx support: - Retire long time unused (basically always unused) sys__umtx_lock() and sys__umtx_unlock() syscalls - struct umtx and their supporting definitions - UMUTEX_ERROR_CHECK flag - Retire UMTX_OP_LOCK/UMTX_OP_UNLOCK from _umtx_op() syscall __FreeBSD_version is not bumped yet because it is expected that further breakages to the umtx interface will follow up in the next days. However there will be a final bump when necessary. Sponsored by: EMC / Isilon storage division Reviewed by: jhb	2014-03-18 21:32:03 +00:00
Robert Watson	4a14441044	Update kernel inclusions of capability.h to use capsicum.h instead; some further refinement is required as some device drivers intended to be portable over FreeBSD versions rely on __FreeBSD_version to decide whether to include capability.h. MFC after: 3 weeks	2014-03-16 10:55:57 +00:00
John-Mark Gurney	6f2b769cac	change td_retval into a union w/ off_t, with defines to mask the change... This eliminates a cast, and also forces td_retval (often 2 32-bit registers) to be aligned so that off_t's can be stored there on arches with strict alignment requirements like armeb (AVILA)... On i386, this doesn't change alignment, and on amd64 it doesn't either, as register_t is already 64bits... This will also prevent future breakage due to people adding additional fields to the struct... This gets AVILA booting a bit farther... Reviewed by: bde	2014-03-16 00:53:40 +00:00
Gleb Smirnoff	45c203fce2	Remove AppleTalk support. AppleTalk was a network transport protocol for Apple Macintosh devices in 80s and then 90s. Starting with Mac OS X in 2000 the AppleTalk was a legacy protocol and primary networking protocol is TCP/IP. The last Mac OS X release to support AppleTalk happened in 2009. The same year routing equipment vendors (namely Cisco) end their support. Thus, AppleTalk won't be supported in FreeBSD 11.0-RELEASE.	2014-03-14 06:29:43 +00:00
Gleb Smirnoff	2c284d9395	Remove IPX support. IPX was a network transport protocol in Novell's NetWare network operating system from late 80s and then 90s. The NetWare itself switched to TCP/IP as default transport in 1998. Later, in this century the Novell Open Enterprise Server became successor of Novell NetWare. The last release that claimed to still support IPX was OES 2 in 2007. Routing equipment vendors (e.g. Cisco) discontinued support for IPX in 2011. Thus, IPX won't be supported in FreeBSD 11.0-RELEASE.	2014-03-14 02:58:48 +00:00
Bryan Drewery	ae8959dd57	Combine similar code from vprintf(9) and log(9). MFC after: 2 weeks	2014-03-14 01:17:11 +00:00
Alan Somers	c2090e73d7	Replace 4.4BSD Lite's unix domain socket backpressure hack with a cleaner mechanism, based on the new SB_STOP sockbuf flag. The old hack dynamically changed the sending sockbuf's high water mark whenever adding or removing data from the receiving sockbuf. It worked for stream sockets, but it never worked for SOCK_SEQPACKET sockets because of their atomic nature. If the sockbuf was partially full, it might return EMSGSIZE instead of blocking. The new solution is based on DragonFlyBSD's fix from commit 3a6117bbe0ed6a87605c1e43e12a1438d8844380 on 2008-05-27. It adds an SB_STOP flag to sockbufs. Whenever uipc_send surpasses the socket's size limit, it sets SB_STOP on the sending sockbuf. sbspace() will then return 0 for that sockbuf, causing sosend_generic and friends to block. uipc_rcvd will likewise clear SB_STOP. There are two fringe benefits: uipc_{send,rcvd} no longer need to call chgsbsize() on every send and receive because they don't change the sockbuf's high water mark. Also, uipc_sense no longer needs to acquire the UIPC linkage lock, because it's simpler to compute the st_blksizes. There is one drawback: since sbspace() will only ever return 0 or the maximum, sosend_generic will allow the sockbuf to exceed its nominal maximum size by at most one packet of size less than the max. I don't think that's a serious problem. In fact, I'm not even positive that FreeBSD guarantees a socket will always stay within its nominal size limit. sys/sys/sockbuf.h Add the SB_STOP flag and adjust sbspace() sys/sys/unpcb.h Delete the obsolete unp_cc and unp_mbcnt fields from struct unpcb. sys/kern/uipc_usrreq.c Adjust uipc_rcvd, uipc_send, and uipc_sense to use the SB_STOP backpressure mechanism. Removing obsolete unpcb fields from db_show_unpcb. tests/sys/kern/unix_seqpacket_test.c Clear expected failures from ATF. Obtained from: DragonFly BSD PR: kern/185812 Reviewed by: silence from freebsd-net@ and rwatson@ MFC after: 3 weeks Sponsored by: Spectra Logic Corporation	2014-03-13 18:42:12 +00:00
Konstantin Belousov	cee9542d51	Use correct types for sizeof() in the calculations for the malloc(9) sizes [1]. While there, remove unneeded checks for failed allocations with M_WAITOK flag. Submitted by: Conrad Meyer <cemeyer@uw.edu> [1] MFC after: 1 week	2014-03-12 10:25:26 +00:00
Konstantin Belousov	9d2437a6f5	The auio structure is only initialized when the vnode is symlink, avoid reading from it otherwise. Submitted by: Conrad Meyer <cemeyer@uw.edu> MFC after: 1 week	2014-03-12 10:23:51 +00:00
Jeff Roberson	8bc713f6c5	- Make runq_steal_from more aggressive. Previously it would examine only a single priority queue. If that queue had a thread or threads which could not be migrated we would fail to steal load. This could cause starvation in situations where cores are idle. Submitted by: Doug Kilpatrick <dkilpatrick@isilon.com> Tested by: pho Reviewed by: mav Sponsored by: EMC / Isilon Storage Division	2014-03-08 00:35:06 +00:00
Alan Somers	74107e870a	Partial revert of change 262914. I screwed up subversion syntax with perforce syntax and committed some unrelated files. Only devd files should've been committed. Reported by: imp Pointy hat to: asomers MFC after: 3 weeks X-MFC-With: r262914	2014-03-07 23:40:36 +00:00
Alan Somers	6a2ae0eb16	sbin/devd/devd.8 sbin/devd/devd.cc Add a -q flag to devd that will suppress syslog logging at LOG_NOTICE or below. Requested by: ian@ and imp@ MFC after: 3 weeks Sponsored by: Spectra Logic Corporation	2014-03-07 23:30:48 +00:00
Alan Somers	8de34a88de	Fix PR kern/185813 "SOCK_SEQPACKET AF_UNIX sockets with asymmetrical buffers drop packets". It was caused by a check for the space available in a sockbuf, but it was checking the wrong sockbuf. sys/sys/sockbuf.h sys/kern/uipc_sockbuf.c Add sbappendaddr_nospacecheck_locked(), which is just like sbappendaddr_locked but doesn't validate the receiving socket's space. Factor out common code into sbappendaddr_locked_internal(). We shouldn't simply make sbappendaddr_locked check the space and then call sbappendaddr_nospacecheck_locked, because that would cause the O(n) function m_length to be called twice. sys/kern/uipc_usrreq.c Use sbappendaddr_nospacecheck_locked for SOCK_SEQPACKET sockets, because the receiving sockbuf's size limit is irrelevant. tests/sys/kern/unix_seqpacket_test.c Now that 185813 is fixed, pipe_128k_8k fails intermittently due to 185812. Make it fail every time by adding a usleep after starting the writer thread and before starting the reader thread in test_pipe. That gives the writer time to fill up its send buffer. Also, clear the expected failure message due to 185813. It actually said "185812", but that was a typo. PR: kern/185813 Reviewed by: silence from freebsd-net@ and rwatson@ MFC after: 3 weeks Sponsored by: Spectra Logic Corporation	2014-03-06 20:24:15 +00:00
Dimitry Andric	892620150f	Merge from head up to r262415.	2014-02-23 23:33:11 +00:00
Dimitry Andric	f9d498ad60	On sparc64, VM_KMEM_SIZE_SCALE is not a constant expression, so it cannot be tested in a CTASSERT().	2014-02-23 17:37:24 +00:00
Bryan Drewery	63d8fe5531	Fix style of comment blocks. Reported by: peter Approved by: bapt (mentor, implicit) X-MFC with: r262006	2014-02-22 04:28:49 +00:00
Mark Johnston	9e9ea73715	Print a backtrace if the SDT(9) stub gets called so that there's at least some hope of figuring out how it happened. Suggested by: rstone MFC after: 1 week	2014-02-22 01:41:45 +00:00
Mateusz Guzik	1f9e8f8ad9	Fix a race between kern_proc_{o,}filedesc_out and fdescfree leading to use-after-free. fdescfree proceeds to free file pointers once fd_refcnt reaches 0, but kern_proc_{o,}filedesc_out only checked for hold count. MFC after: 3 days	2014-02-21 22:29:09 +00:00
Bryan Drewery	70f82cfbaf	Fix M_FILEDESC leak in fdgrowtable() introduced in r244510. fdgrowtable() now only reallocates fd_map when necessary. This fixes fdgrowtable() to use the same logic as fdescfree() for when to free the fd_map. The logic in fdescfree() is intended to not free the initial static allocation, however the fd_map grows at a slower rate than the table does. The table is intended to hold 20 fd, but its initial map has many more slots than 20. The slot sizing causes NDSLOTS(20) through NDSLOTS(63) to be 1 which matches NDSLOTS(20), so fdescfree() was assuming that the fd_map was still the initial allocation and not freeing it. This partially reverts r244510 by reintroducing some of the logic it removed in fdgrowtable(). Reviewed by: mjg Approved by: bapt (mentor) MFC after: 2 weeks	2014-02-17 00:00:39 +00:00
Bryan Drewery	88812f91aa	Remove redundant memcpy of fd_ofiles in fdgrowtable() added in r247602 Discussed with: mjg Approved by: bapt (mentor) MFC after: 2 weeks	2014-02-16 23:10:46 +00:00
Adrian Chadd	f44e2a4c0f	Include the CPU id in the per-CPU timer swi thread descriptions. Original patch by: jhb	2014-02-14 23:19:51 +00:00
Sergey Kandaurov	54bb553005	Preserve one character space for a trailing '\0'. Found by: Ivan Klymenko via cppcheck Discussed with: ae MFC after: 1 week	2014-02-14 20:54:03 +00:00
Christian Brueffer	53c4471833	Fix a bug in be_uuid_dec(); it called le16dec() instead of be16dec(), probably due to copy+pasting le_uuid_dec(). PR: 146588 Submitted by: Erwin Rol <erwin at erwinrol.com> Reviewed by: marcel MFC after: 1 week	2014-02-13 22:24:36 +00:00
Ian Lepore	42c8459bed	Rework the EARLY_PRINTF mechanism. Instead of defining a special eprintf() routine, now a platform can provide a pointer to an early_putc() routine which is used instead of cn_putc(). Control can be handed off from early printf support to standard console support by NULLing out the pointer during standard console init. This leverages all the existing error reporting that uses printf calls, such as panic() which can now be usefully employed even in early platform init code (useful at least to those who maintain that code and build kernels with EARLY_PRINTF defined). Reviewed by: imp, eadler	2014-02-12 00:53:38 +00:00
John Baldwin	2db08c03f0	Expose OBJT_MGTDEVICE VM objects used for GEM/TTM with drm2 as an explicit object type. Reviewed by: kib MFC after: 1 week	2014-02-11 21:57:37 +00:00
Gleb Smirnoff	49fef6a202	Create two public UMA_ZONE_PCPU zones: 64 bit sized and pointer sized. Sponsored by: Nginx, Inc.	2014-02-10 19:59:46 +00:00
Gleb Smirnoff	b5c32cf481	Remove identical vnet sysctl handlers, and handle CTLFLAG_VNET in the sysctl_root(). Note: SYSCTL_VNET_* macros can be removed as well. All that is needed to virtualize a sysctl oid is to set CTLFLAG_VNET on it. But for now keep macros in place to avoid large code churn. Sponsored by: Nginx, Inc.	2014-02-07 13:47:33 +00:00
John Baldwin	e432d5f6a7	Drop the 3rd clause from all 3 clause BSD licenses where I am the sole holder to convert them to 2 clause BSD licenses. MFC after: 1 week	2014-02-05 18:13:27 +00:00

13941 Commits