freebsd-dev

Author	SHA1	Message	Date
Sean Bruno	6d75644981	Add Stacey Son's binary activation patches that allow remapping of execution to an emulation program via parsing of ELF header information. With this kernel module and userland tool, poudriere is able to build ports packages via the QEMU userland tools (or another emulator program) in a different architecture chroot, e.g. TARGET=mips TARGET_ARCH=mips. I'm not connecting this to GENERIC for obvious reasons, but this should allow the kernel module to be built by default and enable the building of the userland tool (which automatically loads the kernel module). Submitted by: sson@ Reviewed by: jhb@	2014-04-08 20:10:22 +00:00
Aleksandr Rybalko	19fbe1ea90	Do not fill screen, while muted. Sponsored by: The FreeBSD Foundation	2014-04-07 22:37:13 +00:00
Ed Schouten	8f5b107b84	Thinko: don't forget to apply 'howto' in case init(8) isn't running.	2014-04-07 21:18:12 +00:00
Ed Schouten	912d59378b	Clean up shutdown_nice(). Just send the right signal to init(8). Right now, init(8) cannot distinguish between an ACPI power button press or a Ctrl+Alt+Del sequence on the keyboard. This is because shutdown_nice() sends SIGINT to init(8) unconditionally, but later modifies the arguments to reboot(2) to force a certain behaviour. Instead of doing this, patch up the code to just forward the appropriate signal to userspace. SIGUSR1 and SIGUSR2 can already be used to halt the system. While there, move waittime to the function where it's used; kern_reboot().	2014-04-07 21:11:29 +00:00
Ed Schouten	38219d6acd	Implement kqueue(2) for procdesc(4). kqueue(2) already supports EVFILT_PROC. Add an EVFILT_PROCDESC that behaves the same, but operates on a procdesc(4) instead. Only implement NOTE_EXIT for now. The nice thing about NOTE_EXIT is that it also returns the exit status of the process, meaning that we can now obtain this value, even if pdwait4(2) is still unimplemented. Notes: - Simply reuse EVFILT_NETDEV for EVFILT_PROCDESC. As both of these will be used on totally different descriptor types, this should not clash. - Let procdesc_kqops_event() reuse the same structure as filt_proc(). The only difference is that procdesc_kqops_event() should also be able to deal with the case where the process was already terminated after registration. Simply test this when hint == 0. - Fix some style(9) issues in filt_proc() to keep it consistent with the newly added procdesc_kqops_event(). - Save the exit status of the process in pd->pd_xstat, as we cannot pick up the proctree_lock from within procdesc_kqops_event(). Discussed on: arch@ Reviewed by: kib@	2014-04-07 18:10:49 +00:00
Ed Schouten	d7a39436e5	Fix a typo. The function name is pdfork; not pfork.	2014-04-06 20:20:07 +00:00
Ed Schouten	a90feb39a2	Nit: fix locking of p->p_state in procdesc_close(). According to <sys/proc.h>, this field needs to be locked with either the p_mtx or the p_slock. In this case the damage was quite small. Instead of being reaped, the process would just be reparented to init, so it could be reaped from there.	2014-04-06 20:00:42 +00:00
Konstantin Belousov	14fcb4b4f8	Use realloc(9) instead of doing the reallocation inline. Submitted by: bde MFC after: 1 week	2014-04-05 20:44:52 +00:00
Dmitry Chagin	6b57eff4c0	Prevent alq from panic when the invalid alq_file path specified. MFC after: 1 week	2014-04-05 16:54:47 +00:00
Konstantin Belousov	1a5edcf8ea	When KN_INFLUX is set on the knote due to kqueue_register() or kqueue_scan() unlocking the kqueue to call f_event, knote() or knote_fork() should not skip the knote. The knote is not going to disappear during the influx time, and the mutual exclusion between scan and knote() is ensured by both code paths taking knlist lock. The race appears since knlist lock is before kq lock, so KN_INFLUX must be set, kq lock must be dropped and only then knlist lock can be taken. The window between kq unlock and knlist lock causes lost events. Add a flag KN_SCAN to indicate that KN_INFLUX is set in a manner safe for the knote(), and check for it to ignore KN_INFLUX in the knote*() as needed. Also, in knote(), remove the lockless check for the KN_INFLUX flag, which could also result in the lost notification. Reported and tested by: Kohji Okuno <okuno.kohji@jp.panasonic.com> Discussed with: jmg Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-04-05 14:09:16 +00:00
Ed Maste	b7bd677fe1	Initialise m_pkthdr via bzero instead of explicitly zeroing each member Sponsored by: The FreeBSD Foundation	2014-04-04 21:09:06 +00:00
David Xu	5055c92801	Fix SIGIO delivery. Use fsetown() to handle file descriptor owner ioctl and use pgsigio() to send SIGIO. Submitted by: truckman Reviewed by: mjg	2014-04-04 12:31:13 +00:00
Mateusz Guzik	210a5d1689	Garbage collect fdavail. It rarely returns an error and fdallocn handles the failure of fdalloc just fine.	2014-04-04 05:07:36 +00:00
Ian Lepore	9e24f23880	Fix build breakage. Apparently all ARM configs build kern_et.c, but only a few of them also build kern_clocksource.c. That strikes me as insane, but maybe there's a good reason for it. Until I figure that out, un-break the build by not referencing functions in kern_clocksource if NO_EVENTTIMERS is defined.	2014-04-02 17:34:17 +00:00
Ian Lepore	cfc4b56b57	Add support for event timers whose clock frequency can change while running.	2014-04-02 15:56:11 +00:00
Mateusz Guzik	0ab7a1f396	Document a known problem with handling the process intended to receive SIGIO in /dev/devctl. Suggested by: adrian MFC after: 6 days	2014-03-25 23:30:35 +00:00
Mateusz Guzik	88b7c833d2	Remove long obsolete sysctl hw.bus.devctl_disable. Suggested by: imp Relnotes: yes	2014-03-25 23:19:45 +00:00
Mateusz Guzik	6abaea7d58	Remove lockless check in devopen; while correct, it does not make much sense. Suggested by: imp MFC after: 6 days	2014-03-25 23:13:46 +00:00
Mateusz Guzik	37dbba2a44	Make /dev/devctl mpsafe. MFC after: 1 week	2014-03-25 03:28:58 +00:00
Maksim Yevmenkin	b646225a13	Change default permissions on /dev/devstat. While I'm here, remove D_NEEDGIANT flag. Submitted by: jhb Reviewed by: jhb, scottl, rwatson, delphij, phk MFC after: 1 week	2014-03-24 18:13:41 +00:00
Neel Natu	d6543c678c	Don't lose track of the KTR entries copied from 'ktr_buf_init[]' to the dynamically allocated 'ktr_buf[]'. The memcpy arranges 'ktr_buf[]' such that the latest KTR entry is at 'KTR_BOOT_ENTRIES - 1'.	2014-03-22 22:35:57 +00:00
Bryan Drewery	44f1c91610	Rename global cnt to vm_cnt to avoid shadowing. To reduce the diff, struct pcpu.cnt field was not renamed, so PCPU_OP(cnt.field) is still used. pc_cnt and pcpu are also used in kvm(3) and vmstat(8). The goal was to not affect externally used KPI. Bump __FreeBSD_version in case some out-of-tree module/code relies on the global cnt variable. Exp-run revealed no ports using it directly. No objection from: arch@ Sponsored by: EMC / Isilon Storage Division	2014-03-22 10:26:09 +00:00
Mateusz Guzik	f804336026	Mark the following sysctls as MPSAFE: kern.file kern.proc.filedesc kern.proc.ofiledesc MFC after: 7 days	2014-03-21 19:12:05 +00:00
Konstantin Belousov	52f3c44efe	Fix two issues with /dev/mem access on amd64, both causing kernel page faults. First, accesses to the direct map region should check the limit up to which the direct map is instantiated. Second, for accesses to the kernel map, success returned from kernacc(9) does not guarantee that a subsequent attempt to read or write the checked address succeeds, since another thread might invalidate the address in the meantime. Add a new thread private flag TDP_DEVMEMIO, which instructs vm_fault() to return an error when a fault happens on a MAP_ENTRY_NOFAULT entry, instead of panicking. The trap handler would then see a page fault from the access and recover in the normal way, making /dev/mem access safer. Remove GIANT_REQUIRED from the amd64 memrw(), since it is not needed and having Giant locked does not solve issues for amd64. Note that at least the second issue exists on other architectures, and requires similar patching for md code. Reported and tested by: clusteradm (gjb, sbruno) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-03-21 14:25:09 +00:00
Mateusz Guzik	4c73e705a5	Take filedesc lock only for reading when allocating new fdtable. Code populating the table does this already. MFC after: 1 week	2014-03-21 01:34:19 +00:00
Attilio Rao	3198603edd	Fix comments. Sponsored by: EMC / Isilon Storage Division	2014-03-19 12:45:40 +00:00
Konstantin Belousov	88b124cede	Make the array pointed to by AT_PAGESIZES auxv properly aligned. Also, remove the expression which calculated the location of the strings for a new image and had grown over time to be incomprehensible. Instead, calculate the offsets by steps, which also makes fixing the alignments much cleaner. Reported and reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-03-19 12:35:04 +00:00
Attilio Rao	c149e542a5	Fix GENERIC build.	2014-03-19 00:38:27 +00:00
Attilio Rao	4f11a684ff	Regen per r263318. Sponsored by: EMC / Isilon storage division	2014-03-18 21:34:11 +00:00
Attilio Rao	ce42e79310	Remove dead code from umtx support: - Retire long time unused (basically always unused) sys__umtx_lock() and sys__umtx_unlock() syscalls - struct umtx and their supporting definitions - UMUTEX_ERROR_CHECK flag - Retire UMTX_OP_LOCK/UMTX_OP_UNLOCK from _umtx_op() syscall __FreeBSD_version is not bumped yet because it is expected that further breakages to the umtx interface will follow up in the next days. However there will be a final bump when necessary. Sponsored by: EMC / Isilon storage division Reviewed by: jhb	2014-03-18 21:32:03 +00:00
Robert Watson	4a14441044	Update kernel inclusions of capability.h to use capsicum.h instead; some further refinement is required as some device drivers intended to be portable over FreeBSD versions rely on __FreeBSD_version to decide whether to include capability.h. MFC after: 3 weeks	2014-03-16 10:55:57 +00:00
John-Mark Gurney	6f2b769cac	change td_retval into a union w/ off_t, with defines to mask the change... This eliminates a cast, and also forces td_retval (often 2 32-bit registers) to be aligned so that off_t's can be stored there on arches with strict alignment requirements like armeb (AVILA)... On i386, this doesn't change alignment, and on amd64 it doesn't either, as register_t is already 64bits... This will also prevent future breakage due to people adding additional fields to the struct... This gets AVILA booting a bit farther... Reviewed by: bde	2014-03-16 00:53:40 +00:00
Gleb Smirnoff	45c203fce2	Remove AppleTalk support. AppleTalk was a network transport protocol for Apple Macintosh devices in the 80s and 90s. Starting with Mac OS X in 2000, AppleTalk was a legacy protocol and the primary networking protocol was TCP/IP. The last Mac OS X release to support AppleTalk happened in 2009. The same year routing equipment vendors (namely Cisco) ended their support. Thus, AppleTalk won't be supported in FreeBSD 11.0-RELEASE.	2014-03-14 06:29:43 +00:00
Gleb Smirnoff	2c284d9395	Remove IPX support. IPX was a network transport protocol in Novell's NetWare network operating system from the late 80s through the 90s. NetWare itself switched to TCP/IP as its default transport in 1998. Later, in this century, Novell Open Enterprise Server became the successor of Novell NetWare. The last release that claimed to still support IPX was OES 2 in 2007. Routing equipment vendors (e.g. Cisco) discontinued support for IPX in 2011. Thus, IPX won't be supported in FreeBSD 11.0-RELEASE.	2014-03-14 02:58:48 +00:00
Bryan Drewery	ae8959dd57	Combine similar code from vprintf(9) and log(9). MFC after: 2 weeks	2014-03-14 01:17:11 +00:00
Alan Somers	c2090e73d7	Replace 4.4BSD Lite's unix domain socket backpressure hack with a cleaner mechanism, based on the new SB_STOP sockbuf flag. The old hack dynamically changed the sending sockbuf's high water mark whenever adding or removing data from the receiving sockbuf. It worked for stream sockets, but it never worked for SOCK_SEQPACKET sockets because of their atomic nature. If the sockbuf was partially full, it might return EMSGSIZE instead of blocking. The new solution is based on DragonFlyBSD's fix from commit 3a6117bbe0ed6a87605c1e43e12a1438d8844380 on 2008-05-27. It adds an SB_STOP flag to sockbufs. Whenever uipc_send surpasses the socket's size limit, it sets SB_STOP on the sending sockbuf. sbspace() will then return 0 for that sockbuf, causing sosend_generic and friends to block. uipc_rcvd will likewise clear SB_STOP. There are two fringe benefits: uipc_{send,rcvd} no longer need to call chgsbsize() on every send and receive because they don't change the sockbuf's high water mark. Also, uipc_sense no longer needs to acquire the UIPC linkage lock, because it's simpler to compute the st_blksizes. There is one drawback: since sbspace() will only ever return 0 or the maximum, sosend_generic will allow the sockbuf to exceed its nominal maximum size by at most one packet of size less than the max. I don't think that's a serious problem. In fact, I'm not even positive that FreeBSD guarantees a socket will always stay within its nominal size limit. sys/sys/sockbuf.h Add the SB_STOP flag and adjust sbspace() sys/sys/unpcb.h Delete the obsolete unp_cc and unp_mbcnt fields from struct unpcb. sys/kern/uipc_usrreq.c Adjust uipc_rcvd, uipc_send, and uipc_sense to use the SB_STOP backpressure mechanism. Removing obsolete unpcb fields from db_show_unpcb. tests/sys/kern/unix_seqpacket_test.c Clear expected failures from ATF. Obtained from: DragonFly BSD PR: kern/185812 Reviewed by: silence from freebsd-net@ and rwatson@ MFC after: 3 weeks Sponsored by: Spectra Logic Corporation	2014-03-13 18:42:12 +00:00
Konstantin Belousov	cee9542d51	Use correct types for sizeof() in the calculations for the malloc(9) sizes [1]. While there, remove unneeded checks for failed allocations with M_WAITOK flag. Submitted by: Conrad Meyer <cemeyer@uw.edu> [1] MFC after: 1 week	2014-03-12 10:25:26 +00:00
Konstantin Belousov	9d2437a6f5	The auio structure is only initialized when the vnode is symlink, avoid reading from it otherwise. Submitted by: Conrad Meyer <cemeyer@uw.edu> MFC after: 1 week	2014-03-12 10:23:51 +00:00
Jeff Roberson	8bc713f6c5	- Make runq_steal_from more aggressive. Previously it would examine only a single priority queue. If that queue had a thread or threads which could not be migrated we would fail to steal load. This could cause starvation in situations where cores are idle. Submitted by: Doug Kilpatrick <dkilpatrick@isilon.com> Tested by: pho Reviewed by: mav Sponsored by: EMC / Isilon Storage Division	2014-03-08 00:35:06 +00:00
Alan Somers	74107e870a	Partial revert of change 262914. I screwed up subversion syntax with perforce syntax and committed some unrelated files. Only devd files should've been committed. Reported by: imp Pointy hat to: asomers MFC after: 3 weeks X-MFC-With: r262914	2014-03-07 23:40:36 +00:00
Alan Somers	6a2ae0eb16	sbin/devd/devd.8 sbin/devd/devd.cc Add a -q flag to devd that will suppress syslog logging at LOG_NOTICE or below. Requested by: ian@ and imp@ MFC after: 3 weeks Sponsored by: Spectra Logic Corporation	2014-03-07 23:30:48 +00:00
Alan Somers	8de34a88de	Fix PR kern/185813 "SOCK_SEQPACKET AF_UNIX sockets with asymmetrical buffers drop packets". It was caused by a check for the space available in a sockbuf, but it was checking the wrong sockbuf. sys/sys/sockbuf.h sys/kern/uipc_sockbuf.c Add sbappendaddr_nospacecheck_locked(), which is just like sbappendaddr_locked but doesn't validate the receiving socket's space. Factor out common code into sbappendaddr_locked_internal(). We shouldn't simply make sbappendaddr_locked check the space and then call sbappendaddr_nospacecheck_locked, because that would cause the O(n) function m_length to be called twice. sys/kern/uipc_usrreq.c Use sbappendaddr_nospacecheck_locked for SOCK_SEQPACKET sockets, because the receiving sockbuf's size limit is irrelevant. tests/sys/kern/unix_seqpacket_test.c Now that 185813 is fixed, pipe_128k_8k fails intermittently due to 185812. Make it fail every time by adding a usleep after starting the writer thread and before starting the reader thread in test_pipe. That gives the writer time to fill up its send buffer. Also, clear the expected failure message due to 185813. It actually said "185812", but that was a typo. PR: kern/185813 Reviewed by: silence from freebsd-net@ and rwatson@ MFC after: 3 weeks Sponsored by: Spectra Logic Corporation	2014-03-06 20:24:15 +00:00
Dimitry Andric	892620150f	Merge from head up to r262415.	2014-02-23 23:33:11 +00:00
Dimitry Andric	f9d498ad60	On sparc64, VM_KMEM_SIZE_SCALE is not a constant expression, so it cannot be tested in a CTASSERT().	2014-02-23 17:37:24 +00:00
Bryan Drewery	63d8fe5531	Fix style of comment blocks. Reported by: peter Approved by: bapt (mentor, implicit) X-MFC with: r262006	2014-02-22 04:28:49 +00:00
Mark Johnston	9e9ea73715	Print a backtrace if the SDT(9) stub gets called so that there's at least some hope of figuring out how it happened. Suggested by: rstone MFC after: 1 week	2014-02-22 01:41:45 +00:00
Mateusz Guzik	1f9e8f8ad9	Fix a race between kern_proc_{o,}filedesc_out and fdescfree leading to use-after-free. fdescfree proceeds to free file pointers once fd_refcnt reaches 0, but kern_proc_{o,}filedesc_out only checked for hold count. MFC after: 3 days	2014-02-21 22:29:09 +00:00
Bryan Drewery	70f82cfbaf	Fix M_FILEDESC leak in fdgrowtable() introduced in r244510. fdgrowtable() now only reallocates fd_map when necessary. This fixes fdgrowtable() to use the same logic as fdescfree() for when to free the fd_map. The logic in fdescfree() is intended to not free the initial static allocation, however the fd_map grows at a slower rate than the table does. The table is intended to hold 20 fd, but its initial map has many more slots than 20. The slot sizing causes NDSLOTS(20) through NDSLOTS(63) to be 1 which matches NDSLOTS(20), so fdescfree() was assuming that the fd_map was still the initial allocation and not freeing it. This partially reverts r244510 by reintroducing some of the logic it removed in fdgrowtable(). Reviewed by: mjg Approved by: bapt (mentor) MFC after: 2 weeks	2014-02-17 00:00:39 +00:00
Bryan Drewery	88812f91aa	Remove redundant memcpy of fd_ofiles in fdgrowtable() added in r247602 Discussed with: mjg Approved by: bapt (mentor) MFC after: 2 weeks	2014-02-16 23:10:46 +00:00
Adrian Chadd	f44e2a4c0f	Include the CPU id in the per-CPU timer swi thread descriptions. Original patch by: jhb	2014-02-14 23:19:51 +00:00
Sergey Kandaurov	54bb553005	Preserve one character space for a trailing '\0'. Found by: Ivan Klymenko via cppcheck Discussed with: ae MFC after: 1 week	2014-02-14 20:54:03 +00:00
Christian Brueffer	53c4471833	Fix a bug in be_uuid_dec(); it called le16dec() instead of be16dec(), probably due to copy+pasting le_uuid_dec(). PR: 146588 Submitted by: Erwin Rol <erwin at erwinrol.com> Reviewed by: marcel MFC after: 1 week	2014-02-13 22:24:36 +00:00
Ian Lepore	42c8459bed	Rework the EARLY_PRINTF mechanism. Instead of defining a special eprintf() routine, now a platform can provide a pointer to an early_putc() routine which is used instead of cn_putc(). Control can be handed off from early printf support to standard console support by NULLing out the pointer during standard console init. This leverages all the existing error reporting that uses printf calls, such as panic() which can now be usefully employed even in early platform init code (useful at least to those who maintain that code and build kernels with EARLY_PRINTF defined). Reviewed by: imp, eadler	2014-02-12 00:53:38 +00:00
John Baldwin	2db08c03f0	Expose OBJT_MGTDEVICE VM objects used for GEM/TTM with drm2 as an explicit object type. Reviewed by: kib MFC after: 1 week	2014-02-11 21:57:37 +00:00
Gleb Smirnoff	49fef6a202	Create two public UMA_ZONE_PCPU zones: 64 bit sized and pointer sized. Sponsored by: Nginx, Inc.	2014-02-10 19:59:46 +00:00
Gleb Smirnoff	b5c32cf481	Remove identical vnet sysctl handlers, and handle CTLFLAG_VNET in the sysctl_root(). Note: SYSCTL_VNET_* macros can be removed as well. All that is needed to virtualize a sysctl oid is to set CTLFLAG_VNET on it. But for now keep macros in place to avoid large code churn. Sponsored by: Nginx, Inc.	2014-02-07 13:47:33 +00:00
John Baldwin	e432d5f6a7	Drop the 3rd clause from all 3 clause BSD licenses where I am the sole holder to convert them to 2 clause BSD licenses. MFC after: 1 week	2014-02-05 18:13:27 +00:00
Nathan Whitehorn	a8a9b1c250	ULE works on Book-E since r258002, so remove statements to the contrary.	2014-02-01 20:46:35 +00:00
Jamie Gritton	f15444cc97	Back out r261266 pending security buy-in. r261266: Add a jail parameter, allow.kmem, which lets jailed processes access /dev/kmem and related devices (i.e. grants PRIV_IO and PRIV_KMEM_WRITE). This in conjunction with changing the drm driver's permission check from PRIV_DRIVER to PRIV_KMEM_WRITE will allow a jailed Xorg server.	2014-01-31 17:39:51 +00:00
Konstantin Belousov	49d39308ba	The posix_madvise(3) and posix_fadvise(2) should return error on failure, same as posix_fallocate(2). Noted by: Bob Bishop <rb@gid.co.uk> Discussed with: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-01-30 18:04:39 +00:00
Jamie Gritton	109ca2d5f1	Add a jail parameter, allow.kmem, which lets jailed processes access /dev/kmem and related devices (i.e. grants PRIV_IO and PRIV_KMEM_WRITE). This in conjunction with changing the drm driver's permission check from PRIV_DRIVER to PRIV_KMEM_WRITE will allow a jailed Xorg server. Submitted by: netchild MFC after: 1 week	2014-01-29 13:41:13 +00:00
John-Mark Gurney	3a6cdc4e55	fix spelling of lock_initialized.. jhb approved.. MFC after: 1 week	2014-01-28 17:27:54 +00:00
Christian S.J. Peron	c297f0e497	Allow sigwait(2) in capabilities mode. It's common for multi-threaded processes to create a thread for the purpose of synchronously processing signals. Allow such processes to utilize a capabilities sandbox. Discussed with: rwatson, pjd MFC after: 2 weeks	2014-01-28 01:49:49 +00:00
Robert Millan	f55a7d3058	Accept O_CLOEXEC in shm_open(). Reviewed by: jilles, jhb MFC after: 1 week	2014-01-24 21:05:07 +00:00
Konstantin Belousov	2852de0489	The posix_fallocate(2) syscall should return error number on error, without modifying errno. Reported and tested by: Gennady Proskurin <gpr@mail.ru> Reviewed by: mdf PR: standards/186028 Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-01-23 17:24:26 +00:00
Warner Losh	26fbe13c56	Implement generic support for early printf. Though I can't find the paper trail now, this patch is similar to one posted for one of the preliminary versions of a new armv6 port. I took them and made them more generic. Option not enabled by default since each board/port has to provide its own eputc, and possibly do other things as well...	2014-01-22 21:20:08 +00:00
John Baldwin	44afcdabf3	Fix a typo.	2014-01-21 03:24:52 +00:00
Neel Natu	84cc772fe5	Bump up WITNESS_COUNT from 1024 to 1536 so there are sufficient entries for WITNESS to actually work. Reviewed by: jhb@	2014-01-20 01:59:35 +00:00
Gleb Smirnoff	761a9a1f8d	Fix comment.	2014-01-17 11:09:05 +00:00
Adrian Chadd	0cfea1c8fc	Implement a kqueue notification path for sendfile. This fires off a kqueue note (of type sendfile) to the configured kqfd when the sendfile transaction has completed and the relevant memory backing the transaction is no longer in use by this transaction. This is analogous to SF_SYNC waiting for the mbufs to complete - except now you don't have to wait. Both SF_SYNC and SF_KQUEUE should work together, even if it doesn't necessarily make any practical sense. This is designed for use by applications which use backing cache/store files (eg Varnish) or POSIX shared memory (not sure anything is using it yet!) to know when a region of memory is free for re-use. Note it doesn't mark the region as free overall - only free from this transaction. The application developer still needs to track which ranges are in the process of being recycled and wait until all pending transactions are completed. TODO: * documentation, as always Sponsored by: Netflix, Inc.	2014-01-17 05:26:55 +00:00
Adrian Chadd	fda21f4d2a	Add in a default initialiser for the EVOPS_SENDFILE kqueue filterops. Sponsored by: Netflix, Inc.	2014-01-17 05:15:44 +00:00
Gleb Smirnoff	d978bbea8a	Simplify wait/nowait code, eventually killing last remnant of historical mbuf(9) allocator flag. Sponsored by: Nginx, Inc.	2014-01-16 13:45:41 +00:00
Gleb Smirnoff	94985f742b	Remove historical macro. Sponsored by: Nginx, Inc.	2014-01-16 13:42:50 +00:00
Bryan Venteicher	fb6c25186b	Add sglist_append_bio(9) to append a struct bio's data to a sglist Reviewed by: jhb, kib (long ago)	2014-01-13 04:41:08 +00:00
Adrian Chadd	a43caef195	Refactor out the common sendfile code from the do_sendfile() and the compat32 sendfile syscall. Sponsored by: Netflix, Inc.	2014-01-09 00:11:14 +00:00
Adrian Chadd	faa9b054a0	Add a compile-time control over the size of KN_HASHSIZE. This is needed for applications that use a lot of non-filedescriptor knotes. MFC after: 1 week Sponsored by: Netflix, Inc.	2014-01-07 01:17:27 +00:00
Mateusz Guzik	231a0fe857	Plug a memory leak in dup2 when both old and new fd have ioctl caps. Reviewed by: pjd MFC after: 3 days	2014-01-03 16:36:55 +00:00
Mateusz Guzik	0918d4b21f	Don't check for fd limits in fdgrowtable_exp. Callers do that already and additional check races with process decreasing limits and can result in not growing the table at all, which is currently not handled. MFC after: 3 days	2014-01-03 16:34:16 +00:00
Warner Losh	9520f95242	An echoed DELETE doesn't rub out the previous character, so always use <backspace> <space> <backspace> instead. This fixes hitting DELETE instead of BACKSPACE at mountroot> prompt.	2013-12-31 04:40:25 +00:00
Mark Johnston	7dba784986	The arguments to sched:::off-cpu are the thread and associated process of the thread selected to run, not the currently running thread. This fix has already been made for ULE in r252070. PR: 177706 MFC after: 1 week	2013-12-29 17:08:30 +00:00
Konstantin Belousov	fe20047039	Fix accounting for the negative cache entries when reusing v_cache_dd. Having ncneg diverge from the actual length of the ncneg tailq causes NULL dereference. Add assertion that an entry taken from ncneg queue is indeed negative. Reported by and discussed with: avg Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-12-27 17:09:59 +00:00
Konstantin Belousov	e136eac224	Revert r259200. There are geoms/drivers which do not update bio_completed, only manage bio_resid, e.g. sa(4). Reported and tested by: Manfred Antar <null@pozo.com> Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-12-27 17:04:51 +00:00
Dimitry Andric	d3fdc73431	In sys/kern/vfs_mountroot.c, remove static function parse_isspace(), which is unused since r214006. MFC after: 3 days	2013-12-25 22:14:42 +00:00
Dimitry Andric	951d674203	In sys/kern/subr_witness.c, remove static function witness_lock_order_key_empty(), which is unused since r181695. MFC after: 3 days	2013-12-25 16:58:14 +00:00
Dimitry Andric	3371b88c7b	In sys/kern/sched_ule.c, remove static function sched_both(), which is unused since r232207. MFC after: 3 days	2013-12-25 16:25:54 +00:00
Ed Schouten	a3da01fc76	Fix copy-pasting of CJK fullwidth characters. They are stored as two separate characters in the vtbuf, so copy-pasting will cause them to be passed to terminal_input_char() twice. Extend terminal_input_char() to explicitly discard characters with TF_CJK_RIGHT set. This causes only the left part to generate input.	2013-12-24 18:42:26 +00:00
Aleksandr Rybalko	7a1a32c4ef	o Add virtual terminal mmap request handler. o Forward terminal framebuffer ioctl to fbd. o Forward terminal mmap request to fbd. o Move inclusion of sys/conf.h to vt.h. Sponsored by: The FreeBSD Foundation	2013-12-23 18:09:10 +00:00
Ed Schouten	a6c26592f1	Extend libteken to support CJK fullwidth characters. Introduce a new formatting bit (TF_CJK_RIGHT) that is set when putting a cell that is the right part of a CJK fullwidth character. This will allow drivers like vt(9) to support fullwidth characters properly. emaste@ has a patch to extend vt(9)'s font handling to increase the number of Unicode -> glyph maps from 2 ({normal,bold}) to 4 ({normal,bold} x {left,right}). This will need to use this formatting bit to determine whether to draw the left or right glyph. Reviewed by: emaste	2013-12-20 21:31:50 +00:00
Gleb Smirnoff	7276319825	Move list of ttys handling from the allocating procedures, to the device creation stage. A device creation can fail, and in that case an entry already on the list will be freed. Sponsored by: Nginx, Inc.	2013-12-20 19:45:51 +00:00
Stefan Eßer	774e8d906f	Fix compilation on 32 bit architectures and use INT64_MAX instead of LONG_MAX for the upper bound check.	2013-12-19 21:35:33 +00:00
Stefan Eßer	53d5cc255d	Fix overflow for timeout values of more than 68 years, which is the maximum covered by sbintime (LONG_MAX seconds). Some programs use timeout values in excess of 1000 years. The conversion to sbintime caused wrap-around on overflow, which resulted in short or negative timeout values. This caused long delays on sockets opened by affected programs (e.g. OpenSSH). Kernels compiled without -fno-strict-overflow were not affected, apparently because the compiler tested the sign of the timeout value before performing the multiplication that led to overflow. When the -fno-strict-overflow option was added to CFLAGS, this optimization was disabled and the test was performed on the result of the multiplication. Negative products were caught and resulted in EINVAL being returned, but wrap-around to positive values just shortened the timeout value to the residue of the result that could be represented by sbintime. The fix is to cap the timeout values at the maximum that can be represented by sbintime, which is 2^31 - 1 seconds or more than 68 years. After this change, the kernel can be compiled with -fno-strict-overflow with no ill effects. MFC after: 3 days	2013-12-19 09:01:46 +00:00
Mark Johnston	8f7254629f	Invoke the kld_* event handlers from linker_load_file() and linker_unload_file() rather than kern_kldload() and kern_kldunload(). This ensures that the handlers are invoked for files that are loaded/unloaded automatically as dependencies. Previously, they were only invoked for files loaded by a user. As a side effect, the kld_load and kld_unload handlers are now invoked with the kernel linker lock exclusively held. Reported by: avg Reviewed by: jhb MFC after: 2 weeks	2013-12-19 03:48:36 +00:00
Gleb Smirnoff	e1e585a87c	- Rename tty_makedev() into tty_makedevf() and make it able to fail and return an error. - Use make_dev_p() in tty_makedevf() instead of make_dev_cred(). - Always pass MAKEDEV_CHECKNAME flag. - Optionally pass MAKEDEV_REF flag. - Provide macro for compatibility with old API. This fixes races with simultaneous creation and destruction of ttys, and makes it possible to call tty_makedevf() from device cloners. A race in tty_watermarks() still exists, since the latter drops lock for M_WAITOK allocation. This will be addressed in separate commit. Reviewed by: kib Sponsored by: Nginx, Inc.	2013-12-18 12:50:43 +00:00
Mark Johnston	7159310fa6	The fasttrap fork handler is responsible for removing tracepoints in the child process that were inherited from its parent. However, this should not be done in the case of a vfork, since the fork handler ends up removing the tracepoints from the shared vm space, and userland DTrace probes in the parent will no longer fire as a result. Now the child of a vfork may trigger userland DTrace probes enabled in its parent, so modify the fasttrap probe handler to handle this case and handle the child process in the same way that it would handle the traced process. In particular, if one traces function foo() in a process that vforks, and the child calls foo(), fasttrap will treat this call as having come from the parent. This is the behaviour of the upstream code. While here, add #ifdef guards to some code that isn't present upstream. MFC after: 1 month	2013-12-18 01:41:52 +00:00
Konstantin Belousov	65f05eeb3d	If vn_open_vnode() succeeded in opening the vnode, but subsequent advisory lock cannot be obtained, prevent double-close of the vnode in vn_close() called from the fdrop(), by resetting file's f_ops methods. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-12-17 17:31:16 +00:00
Andrey V. Elsukov	da0770bd57	Fix copy/paste typo. MFC after: 1 week	2013-12-17 16:45:19 +00:00
Attilio Rao	e7a9eed7a8	- Assert that the rw lock readers counter is not leaked on return to userland. - Use a correct spin_cnt for the KDTRACE_HOOK case in rw read lock. Sponsored by: EMC / Isilon storage division	2013-12-17 13:37:02 +00:00
Adrian Chadd	dc3bdd4ad9	Remove the invariants stuff I copy/paste'd from the mbuf code when setting up the UMA zone. This should (a) be correct(er) and (b) it should build on non-amd64. Pointed out by: glebius	2013-12-17 03:06:21 +00:00
Adrian Chadd	73242a5ee1	Migrate the sendfile_sync struct to use a UMA zone rather than M_TEMP. This allows it to be better tracked as well as being able to leverage UMA for more interesting/useful behaviour at a later date. Sponsored by: Netflix, Inc.	2013-12-16 19:31:23 +00:00
Alexander Motin	e37e08c7bf	Fix periodic per-CPU timers startup on boot. Reported by: neel MFC after: 2 weeks	2013-12-16 13:52:18 +00:00
Marcel Moolenaar	15773775f7	Properly drain the TTY when both revoke(2) and close(2) end up closing the TTY. In such a case, ttydev_close() is called multiple times and each time, t_revokecnt is incremented and cv_broadcast() is called for both the t_outwait and t_inwait condition variables. Let's say revoke(2) comes in first and gets to call tty_drain() from ttydev_leave(). Let's say that the revoke comes from init(8) as the result of running "shutdown -r now". Since shutdown prints various messages to the console before announcing that the machine will reboot immediately, let's also say that the output queue is not empty and that tty_drain() has something to do. Let's assume this all happens on a 9600 baud serial console, so it takes a while to drain. The shutdown command will exit(2) and as such will end up closing stdout. Let's say this close will come in second, bump t_revokecnt and call tty_wakeup(). This has tty_wait() return prematurely and the next thing that will happen is that the thread doing revoke(2) will flush the TTY. Since the drain wasn't complete, the flush will effectively drop whatever is left in t_outq. This change takes into account that tty_drain() will return ERESTART due to the fact that t_revokecnt was bumped and in that case simply call tty_drain() again. The thread in question is already performing the close so it can safely finish draining the TTY before destroying the TTY structure. Now all messages from shutdown will be printed on the serial console. Obtained from: Juniper Networks, Inc.	2013-12-16 00:50:14 +00:00
Pawel Jakub Dawidek	007e4f41a7	Regenerate after r259438.	2013-12-15 23:20:26 +00:00
Pawel Jakub Dawidek	82845da3fa	Fix syscalls that can be loaded as kernel modules - they were not given the flag allowing to call them from capability mode sandbox. Noticed by: David Drysdale <drysdale@google.com>	2013-12-15 23:19:42 +00:00
Pawel Jakub Dawidek	61a9fc8fe2	Regenerate after r259436.	2013-12-15 23:15:12 +00:00
Pawel Jakub Dawidek	e1e16d2419	Allow for pselect(2) in capability mode. Noticed by: David Drysdale <drysdale@google.com>	2013-12-15 23:14:27 +00:00
Pawel Jakub Dawidek	73a4fbbb39	Forgot to regenerate after r257736.	2013-12-15 23:12:42 +00:00
Mateusz Guzik	374ce66b66	proc exit: don't take PROC_LOCK while freeing rlimits Code wishing to check rlimits of some process should check whether it is exiting first, which current consumers do. MFC after: 2 weeks	2013-12-15 04:11:43 +00:00
Mateusz Guzik	c2a48c0d1b	rlimit: avoid unnecessary copying of rlimits If refcount is 1 just modify rlimits in place. MFC after: 2 weeks	2013-12-13 20:54:45 +00:00
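The copy-on-write rule ("if refcount is 1 just modify rlimits in place") can be sketched in userland C. This is a minimal sketch under stated assumptions. The `struct limits` type, `NLIMITS`, `limits_shared()` and `limits_set()` are hypothetical stand-ins, not the kernel's `struct plimit` or its `lim_*()` functions, and locking is omitted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NLIMITS 4	/* hypothetical; the kernel has its own limit count */

/* Hypothetical reference-counted set of resource limits. */
struct limits {
	int refcnt;		/* number of holders sharing this copy */
	long cur[NLIMITS];	/* current limit values */
};

/* A set of limits is shared when more than one holder references it. */
static int
limits_shared(const struct limits *lp)
{
	return (lp->refcnt > 1);
}

/*
 * Change one limit and return the structure the caller should use from
 * now on. Only a shared structure is copied; an unshared one is modified
 * in place. Returns NULL (leaving lp untouched) if the copy fails.
 */
static struct limits *
limits_set(struct limits *lp, int which, long value)
{
	if (limits_shared(lp)) {
		struct limits *copy = malloc(sizeof(*copy));

		if (copy == NULL)
			return (NULL);
		memcpy(copy->cur, lp->cur, sizeof(copy->cur));
		copy->refcnt = 1;
		lp->refcnt--;	/* the caller no longer references lp */
		lp = copy;
	}
	lp->cur[which] = value;	/* refcount is 1: modify in place */
	return (lp);
}
```

The common case of a process that does not share its limits then avoids an allocation and a copy entirely.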
Mateusz Guzik	3318a9c895	rlimit: add and utilize lim_shared MFC after: 2 weeks	2013-12-13 20:53:31 +00:00
Alexander Motin	1cf78c85c5	Create a separate free list for each of the first 32 possible allocation sizes. With a 4K allocation quantum, that covers allocations up to 128K. As memory fragmentation grows, these lists may become quite large (tens or hundreds of thousands of items). Keeping items of different sizes in one list may, in the worst case, require a full linear traversal of the list, which can be very expensive. Having lists of single-size items means that, unless the user specifies some alignment or boundary requirements (which are very rare), the first item found on the list should satisfy the request. While running the SPEC NFS benchmark on top of ZFS on a 24-core machine with 84GB RAM, this change reduces the CPU time spent in vmem_xalloc() from 8%, and the lock congestion spinning around it from 20%, to invisible levels. All of that comes at the cost of just 26 more pointers per vmem instance. If at some point our kernel starts to actively use KVA allocations with odd sizes above 128K, something may need to be done for the bigger lists as well.	2013-12-11 21:48:04 +00:00
Konstantin Belousov	b4aa4fed2b	Fix detection of EOF in kern_physio(). If bio_length was clipped by the excess code in g_io_check(), bio_resid is also truncated by g_io_deliver(). As a result, bufdonebio() assigns the truncated value to the buffer's b_resid field. Use bio_completed to calculate the buffer's b_resid from b_bcount in bufdonebio(), instead of bio_resid, which is calculated from bio_length in g_io_deliver(). The issue is seemingly caused by the code rearrangement into g_io_check(), which is not present in stable/10. Nevertheless, the change still looks useful to have in 10. Reported by: Stefan Hegnauer <stefan.hegnauer@gmx.ch> Tested by: pho, Stefan Hegnauer <stefan.hegnauer@gmx.ch> Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-12-10 21:15:18 +00:00
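This is a minimal sketch of how an allocation size might map to a free-list index under this scheme. It assumes a 4K quantum (`QUANTUM_SHIFT` of 12) and a hypothetical power-of-two bucketing for sizes above 32 quanta. `freelist_index()` and `fls_sketch()` are illustrative names, not the vmem implementation.

```c
#include <assert.h>
#include <stddef.h>

#define QUANTUM_SHIFT	12	/* assumed 4K allocation quantum */
#define NEXACT		32	/* sizes of 1..32 quanta get exact lists */

/* 1-based index of the highest set bit, like fls(3); 0 for v == 0. */
static int
fls_sketch(size_t v)
{
	int bit = 0;

	while (v != 0) {
		bit++;
		v >>= 1;
	}
	return (bit);
}

/*
 * Map a size (a nonzero multiple of the quantum) to a free-list index.
 * Indexes 0..31 hold exactly one size each, so the first free segment
 * found there fits any request without alignment or boundary limits.
 * Larger sizes share power-of-two lists starting at index 32 and may
 * still need a search.
 */
static int
freelist_index(size_t size)
{
	size_t q = size >> QUANTUM_SHIFT;	/* size in quanta */

	if (q <= NEXACT)
		return ((int)q - 1);
	return (NEXACT + fls_sketch(q) - fls_sketch(NEXACT));
}
```

With 4K quanta the exact lists end at 32 × 4K = 128K, which matches the limit stated in the commit message.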
Aleksandr Rybalko	27cf7d04ef	Merge VT(9) project (a.k.a. newcons). Reviewed by: nwhitehorn MFC_to_10_after: re approval Sponsored by: The FreeBSD Foundation	2013-12-05 22:38:53 +00:00
Colin Percival	3b251028e2	Make panic_reboot_wait_time static. Submitted by: jhb	2013-12-05 03:01:41 +00:00
Nathan Whitehorn	fec27435ab	Rename sysctl kern.supported_abis to kern.supported_archs, since it gives the set of MACHINE_ARCH values that can be run.	2013-12-04 16:38:40 +00:00
Pawel Jakub Dawidek	53449c98b7	Break the loop once we know we have the SYF_CAPENABLED flag.	2013-12-04 00:10:37 +00:00
Colin Percival	1cdbb9ed2b	Add a new sysctl / loader tunable kern.panic_reboot_wait_time which defaults to PANIC_REBOOT_WAIT_TIME (a long-existing kernel config setting). Use this now-variable value in place of the defined constant to control how long the system waits after a panic before rebooting.	2013-12-03 21:35:25 +00:00
John Baldwin	5457fa234b	Fix an off-by-one error in r228960. The maximum priority delta provided by SCHED_PRI_TICKS should be SCHED_PRI_RANGE - 1 so that the resulting priority value (before nice adjustment) is between SCHED_PRI_MIN and SCHED_PRI_MAX, inclusive. Submitted by: kib Reported by: pho MFC after: 1 week	2013-12-03 14:50:12 +00:00
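The off-by-one can be illustrated with a simplified linear scaling, not the actual SCHED_PRI_TICKS macro. The constants below are illustrative values, not the real ULE ones. If the delta can reach PRI_RANGE, a fully busy thread lands on PRI_MAX + 1. Capping the delta at PRI_RANGE - 1 keeps the result within [PRI_MIN, PRI_MAX].

```c
#include <assert.h>

/* Illustrative values only; not the real ULE constants. */
#define PRI_MIN		100
#define PRI_MAX		139
#define PRI_RANGE	(PRI_MAX - PRI_MIN + 1)	/* 40 priorities */

/*
 * Scale run-time ticks (0..total) into a priority. The maximum delta is
 * PRI_RANGE - 1, so ticks == total yields exactly PRI_MAX.
 */
static int
pri_from_ticks(unsigned long ticks, unsigned long total)
{
	if (total == 0)
		return (PRI_MIN);
	if (ticks > total)
		ticks = total;
	return (PRI_MIN + (int)(ticks * (PRI_RANGE - 1) / total));
}
```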
Nathan Whitehorn	3cb6654d23	Add new sysctl, kern.supported_abis, containing the list of FreeBSD MACHINE_ARCH values whose binaries this kernel can run. This patch provides a feature requested for implementing pkgng ABI identifiers in a robust way. The list is designed to indicate whether, say, an i386 package can be run on the current system. If kern.supported_abis contains "i386", then the answer is yes. Otherwise, the answer is no. At the moment, this only supports MACHINE_ARCH and MACHINE_ARCH32. As we gain support for more interesting combinations, this needs to become more flexible, possibly through the sysent framework, along with the hw.machine_arch emulation immediately preceding this code in kern_mib.c. Reviewed by: imp MFC after: 3 days	2013-12-02 00:44:36 +00:00
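A consumer such as a package tool might answer the "can this i386 package run here?" question with a whole-word match. The sketch below assumes the sysctl value is a space-separated list such as "amd64 i386". On FreeBSD the string could be read with sysctlbyname(3), and the sysctl was later renamed kern.supported_archs. `arch_supported()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true if `arch` appears as a whole word in the space-separated
 * list `archs` (assumed format of the sysctl value).
 */
static bool
arch_supported(const char *archs, const char *arch)
{
	size_t len = strlen(arch);
	const char *p = archs;

	if (len == 0)
		return (false);
	while ((p = strstr(p, arch)) != NULL) {
		bool start_ok = (p == archs || p[-1] == ' ');
		bool end_ok = (p[len] == '\0' || p[len] == ' ');

		if (start_ok && end_ok)
			return (true);
		p++;	/* partial match such as "powerpc" in "powerpc64" */
	}
	return (false);
}
```

Matching whole words rather than substrings matters because one architecture name can be a prefix of another.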
Gleb Smirnoff	ad4804a001	Remove unused variable.	2013-12-01 20:03:00 +00:00
Adrian Chadd	79750e3b36	Migrate the sendfile_sync structure into a public(ish) API in preparation for extending and reusing it. The sendfile_sync wrapper is mostly just an "mbuf transaction" wrapper, used to indicate that the backing store for a group of mbufs has completed. It's only being used by sendfile for now and it's only implementing a sleep/wakeup rendezvous. However, there are other potential signaling paths (kqueue) and other potential uses (socket zero-copy write) where the same mechanism would also be useful. So, with that in mind: * extract the sendfile_sync code out into sf_sync_*() methods; teach the sf_sync_alloc method about the current config flag - it will eventually know about kqueue. * move the sendfile_sync code out of do_sendfile() - the only thing it now knows about is the sfs pointer. The guts of the sync rendezvous (setup, rendezvous/wait, free) are now done in the syscall wrapper. * .. and teach the 32-bit compat sendfile call the same. This should be a no-op. It's primarily preparation work for teaching the sendfile_sync about kqueue notification. Tested: * Peter Holm's sendfile stress / regression scripts Sponsored by: Netflix, Inc.	2013-12-01 03:53:21 +00:00
Pawel Jakub Dawidek	f2b525e6b9	Make process descriptors a standard part of the kernel. rwhod(8) already requires process descriptors to work, and having PROCDESC only in GENERIC does not seem to be enough, especially since we hope to have more and more consumers in the base. MFC after: 3 days	2013-11-30 15:08:35 +00:00
Peter Wemm	b5019bc45b	jail_v0.ip_number was always in host byte order. This was handled in one of the many layers of indirection and shims through stable/7 in jail_handle_ips(). When it was cleaned up and unified through kern_jail() for 8.x, the byte order swap was lost. This only matters for ancient binaries that call jail(2) themselves internally.	2013-11-28 19:40:33 +00:00
Andriy Gapon	73f82099ea	add taskqueue_drain_all This API has semantics similar to that of taskqueue_drain but acts on all tasks that might be queued or running on a taskqueue. A caller must ensure that no new tasks are being enqueued otherwise this call would be totally meaningless. For example, if the tasks are enqueued by an interrupt filter then its interrupt must be disabled. MFC after: 10 days	2013-11-28 18:56:34 +00:00
Konstantin Belousov	80c3af4e80	Add a kinfo sysctl to retrieve the signal trampoline location for the given process. Note that the correctness of the trampoline length returned for ABIs which do not use the shared page depends on the correctness of the struct sysvec sv_szsigcodebase member, which will be fixed on an as-needed basis. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-11-26 19:47:09 +00:00
Andriy Gapon	55050ab560	use saner calculations in should_yield This is based on feedback from bde. MFC after: 6 days	2013-11-26 14:00:50 +00:00
Andriy Gapon	a776a1c1c5	sdt: add support for solaris/illumos style DTRACE_PROBE macros The new macros are implemented in terms of SDT_PROBE_DEFINE and SDT_PROBE. Probes defined in this way will appear under the SDT provider named "sdt". Parameter types are exposed via SDT_PROBE_ARGTYPE. This is something that illumos does not have by default. SDT probes of this kind are already present in ZFS code, so those probes will now be available if the KDTRACE_HOOKS option is enabled. A potential future illumos compatibility enhancement is to encode a provider name as a prefix in a probe name. Reviewed by: markj MFC after: 3 weeks X-MFC after: r258622	2013-11-26 08:49:53 +00:00
Andriy Gapon	d9fae5ab88	dtrace sdt: remove the ugly sname parameter of SDT_PROBE_DEFINE In its stead use the Solaris / illumos approach of emulating '-' (dash) in probe names with '__' (two consecutive underscores). Reviewed by: markj MFC after: 3 weeks	2013-11-26 08:46:27 +00:00
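The naming convention can be shown with a tiny conversion routine. A probe name written with two consecutive underscores is presented with a dash. `probe_name_dashes()` and the example names are hypothetical illustrations of the rule, not the SDT code.

```c
#include <assert.h>
#include <string.h>

/*
 * Convert a probe name using the double-underscore convention into its
 * dashed form, e.g. "op__start" -> "op-start". `out` must have room for
 * strlen(in) + 1 bytes; the result is never longer than the input.
 */
static void
probe_name_dashes(const char *in, char *out)
{
	while (*in != '\0') {
		if (in[0] == '_' && in[1] == '_') {
			*out++ = '-';
			in += 2;
		} else {
			*out++ = *in++;
		}
	}
	*out = '\0';
}
```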
Adrian Chadd	3287361e38	Refactor out the sendfile copyout in order to make vn_sendfile() callable from the kernel. Right now vn_sendfile() can't be called from anything other than a syscall handler _and_ return the number of bytes queued. This simply moves the copyout() to do_sendfile() so that any kernel code can initiate vn_sendfile() outside of a syscall context. Tested: * tiny little sendfile program spitting things out a tcp socket Sponsored by: Netflix, Inc.	2013-11-26 02:02:05 +00:00
Attilio Rao	54366c0bd7	- For kernels compiled only with KDTRACE_HOOKS and without any lock debugging option, unbreak the lock tracing release semantic by embedding calls to LOCKSTAT_PROFILE_RELEASE_LOCK() directly in the inlined versions of the releasing functions for mutex, rwlock and sxlock. Failing to do so skips the lockstat_probe_func invocation for unlocking. - As part of the LOCKSTAT support is inlined in mutex operations, for kernels compiled without lock debugging options, potentially every consumer must be compiled including opt_kdtrace.h. Fix this by moving KDTRACE_HOOKS into opt_global.h and removing the dependency on opt_kdtrace.h for all files, as now only KDTRACE_FRAMES is linked there and it is only used as a compile-time stub [0]. [0] This immediately exposes a new bug: the DTRACE-derived debug support in sfxge is broken and was never really tested. Because sfxge did not correctly include opt_kdtrace.h before, this support was never enabled, so it stayed broken for a while. Fix this by using a protection stub, leaving sfxge driver authors the responsibility for fixing it appropriately [1]. Sponsored by: EMC / Isilon storage division Discussed with: rstone [0] Reported by: rstone [1] Discussed with: philip	2013-11-25 07:38:45 +00:00
Konstantin Belousov	7e14088d93	Revert to using int for the page counts. In vn_io_fault(), the i/o is chunked into pieces limited by the integer io_hold_cnt tunable, while vm_fault_quick_hold_pages() takes an integer max_count as the upper bound. Rearrange the checks to correctly handle overflowing address arithmetic. Submitted by: bde Tested by: pho Discussed with: alc MFC after: 1 week	2013-11-20 08:45:26 +00:00
Andriy Gapon	4c47024ce0	taskqueue_cancel: garbage collect a write-only variable MFC after: 3 days	2013-11-19 18:45:29 +00:00
Jilles Tjoelker	b20a9aa92a	Fix siginfo_t.si_status for wait6/waitid/SIGCHLD. Per POSIX, si_status should contain the value passed to exit() for si_code==CLD_EXITED and the signal number for other si_code. This was incorrect for CLD_EXITED and CLD_DUMPED. This is still not fully POSIX-compliant (Austin group issue #594 says that the full value passed to exit() shall be returned via si_status, not just the low 8 bits) but is sufficient for a si_status-related test in libnih (upstart, Debian/kFreeBSD). PR: kern/184002 Reported by: Dmitrijs Ledkovs Tested by: Dmitrijs Ledkovs	2013-11-17 22:31:23 +00:00
Pawel Jakub Dawidek	ed5848c835	Replace the CAP_POLL_EVENT and CAP_POST_EVENT capability rights (which I had a very hard time fully understanding) with much more intuitive rights: CAP_EVENT - when set on a descriptor, the descriptor can be monitored with syscalls like select(2), poll(2), kevent(2). CAP_KQUEUE_EVENT - when set on a kqueue descriptor, the kevent(2) syscall can be called on this kqueue with the eventlist argument set to a non-NULL value; in other words, the given kqueue descriptor can be used to monitor other descriptors. CAP_KQUEUE_CHANGE - when set on a kqueue descriptor, the kevent(2) syscall can be called on this kqueue with the changelist argument set to a non-NULL value; in other words, it allows modifying the events monitored with the given kqueue descriptor. Add the alias CAP_KQUEUE, which allows both CAP_KQUEUE_EVENT and CAP_KQUEUE_CHANGE. Add the backward compatibility define CAP_POLL_EVENT, which is equal to CAP_EVENT. Sponsored by: The FreeBSD Foundation MFC after: 3 days	2013-11-15 19:55:35 +00:00
John Baldwin	ba9593f3bd	Don't allow vfs.lorunningspace or vfs.hirunningspace to be set such that lorunningspace is greater than hirunningspace as the system performs terribly if it is mistuned in this fashion. MFC after: 1 week	2013-11-15 15:29:53 +00:00
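The validation rule can be sketched as a check performed before either value is stored. The variables, their default values and `set_runningspace()` are hypothetical stand-ins for the sysctl handlers. The EINVAL return is an assumption of this sketch. Equal values are accepted; only lorunningspace greater than hirunningspace is refused.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the two vfs tunables (values in bytes). */
static long lorunningspace = 1L << 20;
static long hirunningspace = 4L << 20;

/*
 * Store a new value into one of the two tunables, refusing any value
 * that would leave lorunningspace greater than hirunningspace.
 */
static int
set_runningspace(long *target, long value)
{
	long lo = (target == &lorunningspace) ? value : lorunningspace;
	long hi = (target == &hirunningspace) ? value : hirunningspace;

	if (lo > hi)
		return (EINVAL);	/* would mistune the system */
	*target = value;
	return (0);
}
```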
Pawel Jakub Dawidek	d673cd358c	Change cap_rights_merge(3) and cap_rights_remove(3) to return a pointer to the destination cap_rights_t structure. This already matches the manual page. MFC after: 3 days	2013-11-14 22:59:20 +00:00
Pawel Jakub Dawidek	98b74f0d1d	Add a note that this file is compiled as part of the kernel and libc. Requested by: kib MFC after: 3 days	2013-11-14 22:57:07 +00:00
Gleb Smirnoff	77badb18cd	Fix a very bad typo from r248887. Submitted by: art	2013-11-14 09:45:33 +00:00
Sergey Kandaurov	903093ecec	Add VM_LAST, a special last element in enum VM_GUEST and use it in CTASSERT to ensure that vm_guest range is covered by vm_guest_sysctl_names. Suggested by: mjg	2013-11-12 20:13:10 +00:00
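The sentinel-plus-static-assertion pattern can be sketched with C11 `_Static_assert` standing in for the kernel's CTASSERT. The enum and name table below are hypothetical stand-ins for `enum VM_GUEST` and `vm_guest_sysctl_names`. Adding a guest type without a matching name then fails at compile time instead of reading past the table.

```c
#include <assert.h>

/* Hypothetical guest types; the sentinel must always stay last. */
enum vm_guest_sketch { VMG_NO, VMG_GENERIC, VMG_XEN, VMG_HV, VMG_LAST };

/* One description string per guest type, in enum order. */
static const char *const vmg_names[] = {
	"none",
	"generic",
	"xen",
	"hv",
};

/* Compile-time check that every guest type has a name. */
_Static_assert(sizeof(vmg_names) / sizeof(vmg_names[0]) == VMG_LAST,
    "vmg_names must cover every guest type");
```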
Alan Cox	63b9ae948d	Eliminate the gratuitous use of mmap(2) flags from the implementation of kern_shmat(). Use a simpler approach to determine whether to pass VMFS_NO_SPACE or VMFS_OPTIMAL_SPACE to vm_map_find().	2013-11-12 17:46:11 +00:00
Konstantin Belousov	d005ed537c	Avoid overflow for the page counts. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-11-12 08:47:58 +00:00
Sergey Kandaurov	40b4cb0f54	Set description string for VM_GUEST_HV (HyperV guest). This fixes fallout from r256425. Reported by: Pavel Timofeev <timp87@gmail com> Tested by: Pavel Timofeev <timp87@gmail com> Reviewed by: Roger Pau Monné MFC after: 3 days	2013-11-11 16:14:33 +00:00
Konstantin Belousov	1bd7d0b7db	If filesystem declares that it supports shared locking for writes, use shared vnode lock for VOP_PUTPAGES() as well. The only such filesystem in the tree is ZFS, and it uses vnode_pager_generic_putpages(), which performs the pageout with VOP_WRITE(). Reviewed by: alc Discussed with: avg Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2013-11-09 20:36:29 +00:00
Konstantin Belousov	6272798a3f	Both vn_close() and VFS_PROLOGUE() evaluate vp->v_mount twice, without holding the vnode lock; vp->v_mount is first checked for NULL equality, and then dereferenced if not NULL. If the vnode is reclaimed in the meantime, the second evaluation could yield NULL. Change VFS_PROLOGUE() to evaluate the mp once, and convert the MNTK_SHARED_WRITES and MNTK_EXTENDED_SHARED tests into inline functions. Reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2013-11-09 20:30:13 +00:00
Hiren Panchasara	16ef0fa833	Fix typo in a comment.	2013-11-08 20:11:15 +00:00
Alan Cox	c70af4875e	As of r257209, all architectures have defined VM_KMEM_SIZE_SCALE. In other words, every architecture is now auto-sizing the kmem arena. This revision changes kmeminit() so that the definition of VM_KMEM_SIZE_SCALE becomes mandatory and the definition of VM_KMEM_SIZE becomes optional. Replace or eliminate all existing definitions of VM_KMEM_SIZE. With auto-sizing enabled, VM_KMEM_SIZE effectively became an alternate spelling for VM_KMEM_SIZE_MIN on most architectures. Use VM_KMEM_SIZE_MIN for clarity. Change kmeminit() so that the effect of defining VM_KMEM_SIZE is similar to that of setting the tunable vm.kmem_size. Whereas the macros VM_KMEM_SIZE_{MAX,MIN,SCALE} have had the same effect as the tunables vm.kmem_size_{max,min,scale}, the effects of VM_KMEM_SIZE and vm.kmem_size have been distinct. In particular, whereas VM_KMEM_SIZE was overridden by VM_KMEM_SIZE_{MAX,MIN,SCALE} and vm.kmem_size_{max,min,scale}, vm.kmem_size was not. Remedy this inconsistency. Now, VM_KMEM_SIZE can be used to set the size of the kmem arena at compile-time without that value being overridden by auto-sizing. Update the nearby comments to reflect the kmem submap being replaced by the kmem arena. Stop duplicating the auto-sizing formula in every machine-dependent vmparam.h and place it in kmeminit() where auto-sizing takes place. Reviewed by: kib (an earlier version) Sponsored by: EMC / Isilon Storage Division	2013-11-08 16:25:00 +00:00
Pawel Jakub Dawidek	e7b1ce0769	- Remove mac_get_fd/mac_set_fd - those are not syscalls. The __mac_get_fd() and __mac_set_fd() syscalls are listed earlier. - Correct typo in syscall name. It should be sched_rr_get_interval, not sched_rr_getinterval. Submitted by: David Drysdale <drysdale@google.com> MFC after: 3 days	2013-11-06 07:46:10 +00:00
Jilles Tjoelker	1947c8a6d1	kqueue: Change error for kqueues rlimit from EMFILE to ENOMEM and document this error condition in the kqueue(2) manual page. Discussed with: kib	2013-11-03 23:06:24 +00:00
Alexander Motin	915f2b7c89	Make the getenv_*() functions, and respectively the TUNABLE_*_FETCH() macros, not allocate memory and so not require a sleepable environment. getenv() already used on-stack temporary storage, so just use it more rationally. getenv_string() receives a buffer as an argument, so it doesn't need another one.	2013-11-01 10:32:33 +00:00
Gleb Smirnoff	0d168b8d36	prison_check_ip4() can take const arguments.	2013-11-01 10:01:57 +00:00
Maksim Yevmenkin	f7a3a2a57c	Rate limit (to once per minute) "Listen queue overflow" message in sonewconn(). Reviewed by: scottl, lstewart Obtained from: Netflix, Inc MFC after: 2 weeks	2013-10-31 20:33:21 +00:00
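The once-per-minute limit can be sketched with a small, self-contained check in the spirit of the kernel's ratecheck(9). `should_log()` and `LOG_INTERVAL` are hypothetical. The explicit `now` argument exists only to make the sketch testable, and a `last` of 0 is assumed to mean "never logged".

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

#define LOG_INTERVAL	60	/* seconds: at most one message per minute */

/*
 * Return true if a message may be printed at time `now`, and record that
 * time. Calls within LOG_INTERVAL of the last printed message return
 * false, so a flood of overflows produces at most one line per minute.
 */
static bool
should_log(time_t *last, time_t now)
{
	if (*last != 0 && now - *last < LOG_INTERVAL)
		return (false);
	*last = now;
	return (true);
}
```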
Konstantin Belousov	80938e75f0	Add bus_dmamap_load_ma() function to load map with the array of vm_pages. Provide trivial implementation which forwards the load to _bus_dmamap_load_phys() page by page. Right now all architectures use bus_dmamap_load_ma_triv(). Tested by: pho (as part of the functional patch) Sponsored by: The FreeBSD Foundation MFC after: 1 month	2013-10-27 21:39:16 +00:00
Konstantin Belousov	46038d7fa1	Fix typo. MFC after: 3 days	2013-10-27 18:52:09 +00:00
Konstantin Belousov	c2a445910d	When reentering kdb, typically due to a bug causing a trap or assert in code executed in the debugger context, do not be ashamed to announce the re-entry loudly. Also, print the backtrace before obliterating the current stack with longjmp, allowing the operator to see the place which caused the bug. The change should make debugging ddb itself less mysterious. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-10-27 16:20:52 +00:00
Gleb Smirnoff	76039bc84f	r48589 promised to remove the implicit inclusion of if_var.h soon. Prepare for this event by adding if_var.h to the files that do need it. Also, add the includes that were previously pulled in only through implicit pollution via if_var.h. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-26 17:58:36 +00:00
Mark Johnston	2e1ae0b3e9	Redefine the io provider using the SDT(9) macros instead of doing everything manually. This change has no functional impact. Discussed with: gnn	2013-10-24 02:39:07 +00:00
Brooks Davis	30f8de5ad0	MFP4: Change 221669 by bz@bz_zenith on 2013/02/01 12:26:04 Run the initialization for polling earlier along with INTRs so that we can put network interface into polling mode by default if DEVICE_POLLING is compiled in and no interrupts are available. MFC after: 3 days Sponsored by: DARPA/AFRL	2013-10-22 22:03:01 +00:00
Alexander Motin	9e8bd2acf2	Remove global device lock acquisition from dev_relthread(), replacing it with atomics on per-device data.	2013-10-22 10:40:26 +00:00
Alexander Motin	40ea77a036	Merge GEOM direct dispatch changes from the projects/camlock branch. When safety requirements are met, this allows I/O requests to be executed directly in the caller context instead of being passed to the GEOM g_up/g_down threads. That avoids CPU bottlenecks in the g_up/g_down threads and saves several context switches per I/O. The safety requirements, as now defined, are: - the caller should not hold any locks and should be reentrant; - the callee should not depend on GEOM dual-threaded concurrency semantics; - on the way down, if the request is unmapped while the callee doesn't support it, the context should be sleepable; - kernel thread stack usage should be below 50%. To keep compatibility with GEOM classes not meeting the above requirements, new provider and consumer flags were added: - G_CF_DIRECT_SEND -- consumer code meets caller requirements (request); - G_CF_DIRECT_RECEIVE -- consumer code meets callee requirements (done); - G_PF_DIRECT_SEND -- provider code meets caller requirements (done); - G_PF_DIRECT_RECEIVE -- provider code meets callee requirements (request). A capable GEOM class can set them, allowing direct dispatch in cases where it is safe. If any of the requirements are not met, the request is queued to the g_up or g_down thread as before. These GEOM classes were reviewed and updated to support direct dispatch: CONCAT, DEV, DISK, GATE, MD, MIRROR, MULTIPATH, NOP, PART, RAID, STRIPE, VFS, ZERO, ZFS::VDEV, ZFS::ZVOL, and all classes based on the g_slice KPI (LABEL, MAP, FLASHMAP, etc). To declare direct completion capability, the disk(9) KPI got a new flag equivalent to G_PF_DIRECT_SEND -- DISKFLAG_DIRECT_COMPLETION. The da(4) and ada(4) disk drivers now set it, thanks to earlier CAM locking work. This change more than doubles peak block storage performance on systems with many CPUs, together with earlier CAM locking changes reaching more than 1 million IOPS (512 byte raw reads from 16 SATA SSDs on 4 HBAs to 256 user-level threads). Sponsored by: iXsystems, Inc. 
MFC after: 2 months	2013-10-22 08:22:19 +00:00
Alexander Motin	18093155a3	Add comments that taskqueue_enqueue_locked() returns without the lock.	2013-10-21 21:16:50 +00:00
Konstantin Belousov	9110db818a	Add a resource limit for the total number of kqueues available to the user. Kqueue now saves the ucred of the allocating thread, to correctly decrement the counter on close. Under some specific, not real-world usage scenarios for kqueue, it is possible for kqueues to consume memory proportional to the square of the number of file descriptors available to the process. The limit allows the administrator to prevent this abuse. This is the kernel-mode side of the change, with the user-mode enabling commit following. Reported and tested by: pho Discussed with: jmg Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2013-10-21 16:44:53 +00:00
Konstantin Belousov	44f3b9c787	Print more useful information about the transfer that triggered the assertion. Other data is available with the ddb command 'show pginfo'. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-10-21 16:17:46 +00:00
Alexander Motin	8160afdab1	MFprojects/camlock r256619: Restore BIO_UNMAPPED and BIO_TRANSIENT_MAPPING in biodone() when unmapping a temporarily mapped buffer. That fixes a double unmap if biodone() is called twice for the same BIO (but with different done methods). Move the mapping removal before the call to the bio_done() method. I believe that it is very wrong to do anything to a BIO after reporting completion. kib@ thinks it was done for some now-forgotten case when the bio_done() method needed a mapped buffer. But 1) if the BIO was sent as unmapped, then IMO done() should be called in the same way; 2) IMO there is no guarantee that the buffer will be mapped at this point at all, for example, if the whole underlying stack supports unmapped I/O, so a bio_done() handler cannot expect that.	2013-10-21 06:44:55 +00:00
Gleb Smirnoff	a9355dbe82	Revert r256587. Requested by: zec	2013-10-18 11:26:40 +00:00
Ed Maste	ec7935bfef	Error out on failure to open specified config file	2013-10-16 17:03:46 +00:00
Alexander Motin	77a30af6f8	MFprojects/camlock r256370: - Take BIO lock in biodone() only when there is no completion callback set and so we should wake up thread waiting in biowait(). - Remove msleep() timeout from biowait(). It was added 11 years ago, when there was no locks used, and it should not be needed any more.	2013-10-16 09:56:40 +00:00
Alexander Motin	6d545f4c8d	MFprojects/camlock r254763: Move tq_enqueue() call out of the queue lock for known handlers (actually I have found no others in the base system). This reduces queue lock hold time and congestion spinning under active multithreaded enqueuing.	2013-10-16 09:52:59 +00:00
Alexander Motin	1d1e92f102	MFprojects/camlock r254685: Remove TQ_FLAGS_PENDING flag, softly duplicating queue emptiness status.	2013-10-16 09:48:23 +00:00
Alexander Motin	e431d66c04	MFprojects/camlock r254905: Introduce new function devstat_end_transaction_bio_bt(), adding new argument to specify present time. Use this function to move binuptime() out of lock, substantially reducing lock congestion when slow timecounter is used.	2013-10-16 09:12:40 +00:00
Gleb Smirnoff	348298b1a3	For VIMAGE kernels store vnet in the struct task, and set vnet context during task processing. Reported & tested by: mm	2013-10-16 05:02:01 +00:00
Konstantin Belousov	eda6009c04	Add a sysctl kern.disallow_high_osrel which disables executing images compiled on a world with a higher major version number than that of the booted kernel. Defaults to disabled. Sponsored by: The FreeBSD Foundation Discussed with: bapt MFC after: 1 week	2013-10-15 06:38:40 +00:00
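The comparison can be sketched with __FreeBSD_version-style numbers, where the major release is the value divided by 100000 (e.g. 1000510 is 10.x and 902001 is 9.x). `osrel_major()` and `image_allowed()` are hypothetical helpers illustrating the rule. They are not the kernel's image activator code.

```c
#include <assert.h>
#include <stdbool.h>

/* Major release encoded in a __FreeBSD_version-style value. */
static int
osrel_major(int osrel)
{
	return (osrel / 100000);
}

/*
 * With the knob enabled, refuse images built for a newer major release
 * than the running kernel; otherwise allow everything.
 */
static bool
image_allowed(int image_osrel, int kernel_osrel, bool disallow_high)
{
	if (!disallow_high)
		return (true);
	return (osrel_major(image_osrel) <= osrel_major(kernel_osrel));
}
```

Only the major number is compared, so a 10.x kernel would still run images built on a newer 10.x world.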
Konstantin Belousov	cd4dd444dd	By default, allow up to SSIZE_MAX i/o for non-devfs files. Sponsored by: The FreeBSD Foundation Reminded by: Dmitry Sivachenko <trtrmitya@gmail.com> MFC after: 1 month X-MFC-note: stable/10 only	2013-10-15 06:35:22 +00:00
Konstantin Belousov	bf3e483b44	Similar to debug.iosize_max_clamp sysctl, introduce devfs_iosize_max_clamp sysctl, which allows/disables SSIZE_MAX-sized i/o requests on the devfs files. Sponsored by: The FreeBSD Foundation Reminded by: Dmitry Sivachenko <trtrmitya@gmail.com> MFC after: 1 week	2013-10-15 06:33:10 +00:00
Mark Murray	cc4d059c03	Merge from project branch. Uninteresting commits are trimmed. Refactor of /dev/random device. Main points include: * Userland seeding is no longer used. This auto-seeds at boot time on PC/Desktop setups; this may need some tweaking and intelligence from those folks setting up embedded boxes, but the work is believed to be minimal. * An entropy cache is written to /entropy (even during installation) and the kernel uses this at next boot. * An entropy file written to /boot/entropy can be loaded by loader(8) * Hardware sources such as rdrand are fed into Yarrow, and are no longer available raw. ------------------------------------------------------------------------ r256240 | des | 2013-10-09 21:14:16 +0100 (Wed, 09 Oct 2013) | 4 lines Add a RANDOM_RWFILE option and hide the entropy cache code behind it. Rename YARROW_RNG and FORTUNA_RNG to RANDOM_YARROW and RANDOM_FORTUNA. Add the RANDOM_* options to LINT. ------------------------------------------------------------------------ r256239 | des | 2013-10-09 21:12:59 +0100 (Wed, 09 Oct 2013) | 2 lines Define RANDOM_PURE_RNDTEST for rndtest(4). ------------------------------------------------------------------------ r256204 | des | 2013-10-09 18:51:38 +0100 (Wed, 09 Oct 2013) | 2 lines staticize struct random_hardware_source ------------------------------------------------------------------------ r256203 | markm | 2013-10-09 18:50:36 +0100 (Wed, 09 Oct 2013) | 2 lines Wrap some policy-rich code in 'if NOTYET' until we can thresh out what it really needs to do. ------------------------------------------------------------------------ r256184 | des | 2013-10-09 10:13:12 +0100 (Wed, 09 Oct 2013) | 2 lines Re-add /dev/urandom for compatibility purposes. ------------------------------------------------------------------------ r256182 | des | 2013-10-09 10:11:14 +0100 (Wed, 09 Oct 2013) | 3 lines Add missing include guards and move the existing ones out of the implementation namespace. 
------------------------------------------------------------------------ r256168 | markm | 2013-10-08 23:14:07 +0100 (Tue, 08 Oct 2013) | 10 lines Fix some just-noticed problems: o Allow this to work with "nodevice random" by fixing where the MALLOC pool is defined. o Fix the explicit reseed code. This was correct as submitted, but in the project branch doesn't need to set the "seeded" bit as this is done correctly in the "unblock" function. o Remove some debug ifdeffing. o Adjust comments. ------------------------------------------------------------------------ r256159 | markm | 2013-10-08 19:48:11 +0100 (Tue, 08 Oct 2013) | 6 lines Time to eat crow for me. I replaced the sx_* locks that Arthur used with regular mutexes; this turned out to be the wrong thing to do as the locks need to be sleepable. Revert this folly. # Submitted by: Arthur Mesh <arthurmesh@gmail.com> (In original diff) ------------------------------------------------------------------------ r256138 | des | 2013-10-08 12:05:26 +0100 (Tue, 08 Oct 2013) | 10 lines Add YARROW_RNG and FORTUNA_RNG to sys/conf/options. Add a SYSINIT that forces a reseed during proc0 setup, which happens fairly late in the boot process. Add a RANDOM_DEBUG option which enables some debugging printf()s. Add a new RANDOM_ATTACH entropy source which harvests entropy from the get_cyclecount() delta across each call to a device attach method. ------------------------------------------------------------------------ r256135 | markm | 2013-10-08 07:54:52 +0100 (Tue, 08 Oct 2013) | 8 lines Debugging. My attempt at EVENTHANDLER(multiuser) was a failure; use EVENTHANDLER(mountroot) instead. This means we can't count on /var being present, so something will need to be done about harvesting /var/db/entropy/... . Some policy now needs to be sorted out, and a pre-sync cache needs to be written, but apart from that we are now ready to go. Over to review. 
------------------------------------------------------------------------ r256094 | markm | 2013-10-06 23:45:02 +0100 (Sun, 06 Oct 2013) | 8 lines Snapshot. Looking pretty good; this mostly works now. New code includes: * Read cached entropy at startup, both from files and from loader(8) preloaded entropy. Failures are soft, but announced. Untested. * Use EVENTHANDLER to do the above just before we go multiuser. Untested. ------------------------------------------------------------------------ r256088 | markm | 2013-10-06 14:01:42 +0100 (Sun, 06 Oct 2013) | 2 lines Fix up the man page for random(4). This mainly removes no-longer-relevant details about HW RNGs, reseeding explicitly and user-supplied entropy. ------------------------------------------------------------------------ r256087 | markm | 2013-10-06 13:43:42 +0100 (Sun, 06 Oct 2013) | 6 lines As userland writing to /dev/random is no more, remove the "better than nothing" bootstrap mode. Add SWI harvesting to the mix. My box seeds Yarrow by itself in a few seconds! YMMV; more to follow. ------------------------------------------------------------------------ r256086 | markm | 2013-10-06 13:40:32 +0100 (Sun, 06 Oct 2013) | 11 lines Debug run. This now works, except that the "live" sources haven't been tested. With all sources turned on, this unlocks itself in a couple of seconds! That is on my box, and there is no guarantee that this will be the case everywhere. * Cut debug prints. * Use the same locks/mutexes all the way through. * Be a tad more conservative about entropy estimates. ------------------------------------------------------------------------ r256084 | markm | 2013-10-06 13:35:29 +0100 (Sun, 06 Oct 2013) | 5 lines Don't use the "real" assembler mnemonics; older compilers may not understand them (like when building CURRENT on 9.x). 
# Submitted by: Konstantin Belousov <kostikbel@gmail.com> ------------------------------------------------------------------------ r256081 | markm | 2013-10-06 10:55:28 +0100 (Sun, 06 Oct 2013) | 12 lines SNAPSHOT. Simplify the malloc pools; we only need one for this device. Simplify the harvest queue. Marginally improve the entropy pool hashing, making it a bit faster in the process. Connect up the hardware "live" source harvesting. This is simplistic for now, and will need to be made rate-adaptive. All of the above passes a compile test but needs to be debugged. ------------------------------------------------------------------------ r256042 | markm | 2013-10-04 07:55:06 +0100 (Fri, 04 Oct 2013) | 25 lines Snapshot. This passes the build test, but has not yet been finished or debugged. Contains: * Refactor the hardware RNG CPU instruction sources to feed into the software mixer. This is unfinished. The actual harvesting needs to be sorted out. Modified by me (see below). * Remove 'frac' parameter from random_harvest(). This was never used and adds extra code for no good reason. * Remove device write entropy harvesting. This provided a weak attack vector and was not very good at bootstrapping the device. To follow will be a replacement explicit reseed knob. * Separate out all the RANDOM_PURE sources into separate harvest entities. This adds some security in the case where more than one is present. * Review all the code and fix anything obviously messy or inconsistent. Address some review concerns while I'm here, like rename the pseudo-rng to 'dummy'. # Submitted by: Arthur Mesh <arthurmesh@gmail.com> (the first item) ------------------------------------------------------------------------ r255319 | markm | 2013-09-06 18:51:52 +0100 (Fri, 06 Sep 2013) | 4 lines Yarrow wants entropy estimations to be conservative; the usual idea is that if you are certain you have N bits of entropy, you declare N/2. 
------------------------------------------------------------------------ r255075 | markm | 2013-08-30 18:47:53 +0100 (Fri, 30 Aug 2013) | 4 lines Remove short-lived idea; thread to harvest (eg) RDRAND entropy into the usual harvest queues. It was a nifty idea, but too heavyweight. # Submitted by: Arthur Mesh <arthurmesh@gmail.com> ------------------------------------------------------------------------ r255071 | markm | 2013-08-30 12:42:57 +0100 (Fri, 30 Aug 2013) | 4 lines Separate out the Software RNG entropy harvesting queue and thread into its own files. # Submitted by: Arthur Mesh <arthurmesh@gmail.com> ------------------------------------------------------------------------ r254934 | markm | 2013-08-26 20:07:03 +0100 (Mon, 26 Aug 2013) | 2 lines Remove the short-lived namei experiment. ------------------------------------------------------------------------ r254928 | markm | 2013-08-26 19:35:21 +0100 (Mon, 26 Aug 2013) | 2 lines Snapshot; do some running repairs on entropy harvesting. More needs to follow. ------------------------------------------------------------------------ r254927 | markm | 2013-08-26 19:29:51 +0100 (Mon, 26 Aug 2013) | 15 lines Snapshot of current work; 1) Clean up namespace; only use "Yarrow" where it is Yarrow-specific or close enough to the Yarrow algorithm. For the rest use a neutral name. 2) Tidy up headers; put private stuff in private places. More could be done here. 3) Streamline the hashing/encryption; no need for a 256-bit counter; 128 bits will last for long enough. There are bits of debug code lying around; these will be removed at a later stage. ------------------------------------------------------------------------ r254784 | markm | 2013-08-24 14:54:56 +0100 (Sat, 24 Aug 2013) | 39 lines 1) example (partially humorous random_adaptor, that I call "EXAMPLE") * It's not meant to be used in a real system, it's there to show the basics of how to create interfaces for random_adaptors. 
Perhaps it should belong in a manual page 2) Move probe.c's functionality into random_adaptors.c * rename random_ident_hardware() to random_adaptor_choose() 3) Introduce a new way to choose (or select) random_adaptors via tunable "rngs_want" It's a list of comma separated names of adaptors, ordered by preference. I.e.: rngs_want="yarrow,rdrand" Such setting would cause yarrow to be preferred to rdrand. If neither of them is available (or registered), then system will default to something reasonable (currently yarrow). If yarrow is not present, then we fall back to the adaptor that's first on the list of registered adaptors. 4) Introduce a way where RNGs can play a role of entropy source. This is mostly useful for HW rngs. The way I envision this is that every HW RNG will use this functionality by default. Functionality to disable this is also present. I have an example of how to use this in random_adaptor_example.c (see modload event, and init function) 5) fix kern.random.adaptors from kern.random.adaptors: yarrowpanicblock to kern.random.adaptors: yarrow,panic,block 6) add kern.random.active_adaptor to indicate currently selected adaptor: root@freebsd04:~ # sysctl kern.random.active_adaptor kern.random.active_adaptor: yarrow # Submitted by: Arthur Mesh <arthurmesh@gmail.com> Submitted by: Dag-Erling Smørgrav <des@FreeBSD.org>, Arthur Mesh <arthurmesh@gmail.com> Reviewed by: des@FreeBSD.org Approved by: re (delphij) Approved by: secteam (des,delphij)	2013-10-12 12:57:57 +00:00
John Baldwin	d251e7006b	Ignore attempts to set the nmbcluster sysctls to their current value rather than failing with an error. Reviewed by: andre Approved by: re (delphij) MFC after: 2 weeks	2013-10-10 16:11:34 +00:00
Mark Murray	72acff0f07	MFC - tracking commit.	2013-10-09 21:03:34 +00:00
Konstantin Belousov	acb9d2c7f0	The device vnodes are often unlocked when bread() or bwrite() is called. This probably should be fixed eventually, but for now there is no need to try to flush such vnodes from the buffer allocation context. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (gjb)	2013-10-09 18:45:01 +00:00
Konstantin Belousov	2c1531e746	Do not flush buffers when the v_object of the passed vnode does not really belong to it. Such vnodes, with pointers to other vnodes' v_objects, are typically instantiated by the bypass filesystems. Invalidating the mappings of the other vnode's pages, and the pages themselves, is wrong, since reclamation of the upper vnode does not imply that the lower vnode is reclaimed too. One of the consequences of the improper reclamation was destruction of the wired mappings of the lower vnode pages, triggering miscellaneous assertions in the VM system. Reported by: John Marshall <john.marshall@riverwillow.com.au> Tested by: John Marshall <john.marshall@riverwillow.com.au>, pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (gjb)	2013-10-09 18:43:29 +00:00
Konstantin Belousov	1744fe5048	When growing the file descriptor table, a new, larger memory chunk is allocated, but the old table is kept around to handle the case of threads still performing unlocked accesses to it. Grow the table exponentially instead of increasing its size by sizeof(long) * 8 chunks when overflowing. This mode significantly reduces the total memory use for processes that open large numbers of file descriptors one by one. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (marius)	2013-10-09 18:41:35 +00:00
Konstantin Belousov	3625bde45d	Reduce code duplication: introduce the getmaxfd() helper to calculate the maximum file descriptor index. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (marius)	2013-10-09 18:39:44 +00:00
Mark Murray	371cbaafa8	MFC - tracking commit	2013-10-09 17:41:47 +00:00
Gleb Smirnoff	1d2df300e9	- Substitute sbdrop_internal() with sbcut_internal(). The latter doesn't free mbufs, but returns a chain of free mbufs to the caller. The caller can either reuse them or return them to the allocator in a batch. - Implement sbdrop()/sbdrop_locked() as a wrapper around sbcut_internal(). - Expose sbcut_locked() for outside usage. Sponsored by: Netflix Sponsored by: Nginx, Inc. Approved by: re (marius)	2013-10-09 11:57:53 +00:00
Dag-Erling Smørgrav	db3fcaf970	Add YARROW_RNG and FORTUNA_RNG to sys/conf/options. Add a SYSINIT that forces a reseed during proc0 setup, which happens fairly late in the boot process. Add a RANDOM_DEBUG option which enables some debugging printf()s. Add a new RANDOM_ATTACH entropy source which harvests entropy from the get_cyclecount() delta across each call to a device attach method.	2013-10-08 11:05:26 +00:00
Mark Murray	6e818c871f	Debugging. My attempt at EVENTHANDLER(multiuser) was a failure; use EVENTHANDLER(mountroot) instead. This means we can't count on /var being present, so something will need to be done about harvesting /var/db/entropy/... . Some policy now needs to be sorted out, and a pre-sync cache needs to be written, but apart from that we are now ready to go. Over to review.	2013-10-08 06:54:52 +00:00
Mark Murray	1a3c1f06dd	Snapshot. Looking pretty good; this mostly works now. New code includes: * Read cached entropy at startup, both from files and from loader(8) preloaded entropy. Failures are soft, but announced. Untested. * Use EVENTHANDLER to do above just before we go multiuser. Untested.	2013-10-06 22:45:02 +00:00
Mark Murray	12babbf219	MFC - tracking commit	2013-10-06 09:37:57 +00:00
Konstantin Belousov	505cdd82bf	Remove the uipc_cow.c file, which is not used since the zero copy sockets removal. Noted by: alc Sponsored by: The FreeBSD Foundation Approved by: re (delphij)	2013-10-06 06:57:28 +00:00
Alan Cox	61083fcc61	Tidy up kmeminit(): Since r245575, 'nmbclusters' is calculated after kmeminit() runs, so it contributes nothing to 'vm_kmem_size'; update a comment to reflect that r254025 replaced the kmem submap with the kmem arena. Reviewed by: kib Approved by: re (gjb) Sponsored by: EMC / Isilon Storage Division	2013-10-05 18:53:03 +00:00
Mark Murray	3c59587daa	MFC - tracking commit.	2013-10-04 07:00:59 +00:00
Mark Murray	f02e47dc1e	Snapshot. This passes the build test, but has not yet been finished or debugged. Contains: * Refactor the hardware RNG CPU instruction sources to feed into the software mixer. This is unfinished. The actual harvesting needs to be sorted out. Modified by me (see below). * Remove 'frac' parameter from random_harvest(). This was never used and adds extra code for no good reason. * Remove device write entropy harvesting. This provided a weak attack vector and was not very good at bootstrapping the device. To follow will be a replacement explicit reseed knob. * Separate out all the RANDOM_PURE sources into separate harvest entities. This adds some security in the case where more than one is present. * Review all the code and fix anything obviously messy or inconsistent. Address some review concerns while I'm here, like rename the pseudo-rng to 'dummy'. Submitted by: Arthur Mesh <arthurmesh@gmail.com> (the first item)	2013-10-04 06:55:06 +00:00
Sean Bruno	d3baefa809	Change len checks for fstypelen and fspathlen to be against absolute len not strlen as they are not strings. Discovered by GSOC student, Mike Ma <mikemandarine@gmail.com> during his fuse.glusterfs port to FreeBSD. Final patch from mckusick@ Submitted by: mckusick@ Approved by: re (hrs) MFC after: 2 weeks	2013-10-03 22:52:03 +00:00
Konstantin Belousov	432e79fc33	When helping the bufdaemon from the buffer allocation context, there is no sense in walking the whole dirty buffer queue. We are only interested in, and can operate on, the buffers owned by the current vnode [1]. Instead of calling the generic queue flush routine, do VOP_FSYNC() if possible. Holding the dirty buffer queue lock in the bufdaemon, without dropping it, can cause starvation of buffer writes from other threads. This is esp. easy to reproduce on big memory machines, where large files are written, causing almost all dirty buffers to accumulate in several big files whose vnodes are locked by writers. Bufdaemon cannot flush any buffer, but iterates over the whole dirty queue continuously. Since the dirty queue mutex is not dropped, bufdone() in the g_up thread is starved, usually deadlocking the machine [2]. Mitigate this by dropping the queue lock after the vnode is locked, allowing other queue lock contenders to make progress. Discussed with: Jeff [1] Reported by: pho [2] Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Approved by: re (hrs)	2013-10-02 06:00:34 +00:00
Konstantin Belousov	d6498b153e	When printing the vnode information from ddb, print the lengths of the dirty and clean buffer queues. Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (gjb)	2013-10-01 20:18:33 +00:00
Konstantin Belousov	fe39412e99	For vunref(), try to upgrade the vnode lock if the function was called with the vnode shared-locked. If upgrade succeeded, the inactivation can be done immediately, instead of being postponed. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (glebius)	2013-09-29 18:07:14 +00:00
Konstantin Belousov	ac34145005	Reimplement r255797 using LK_TRYUPGRADE. The r255797 was: Increase the chance of the buffer write from the bufdaemon helper context to succeed. If the locked vnode which owns the buffer to be written is shared locked, try the non-blocking upgrade of the lock to exclusive. PR: kern/178997 Reported and tested by: Klaus Weber <fbsd-bugs-2013-1@unix-admin.de> Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (glebius)	2013-09-29 18:04:57 +00:00
Konstantin Belousov	7c6fe80353	Add LK_TRYUPGRADE operation for lockmgr(9), which attempts to atomically upgrade a shared lock to exclusive. On failure, an error is returned and the lock is not dropped in the process. Tested by: pho (previous version) No objections from: attilio Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (glebius)	2013-09-29 18:02:23 +00:00
John-Mark Gurney	da9442ef43	it must be the last member, not might... Reviewed by: attilio Approved by: re (delphij, gjb)	2013-09-26 17:55:04 +00:00
Konstantin Belousov	9d2abcd01a	Do not allow negative timeouts for kqueue timers, check for the negative timeout both before and after the conversion to sbintime_t. For periodic kqueue timer, convert zero timeout into 1ms, to avoid interrupt storm on fast event timers. Reported and tested by: pho Discussed with: mav Reviewed by: davide Sponsored by: The FreeBSD Foundation Approved by: re (marius)	2013-09-26 13:17:31 +00:00
Konstantin Belousov	27884e3bd1	Acquire a hold reference on the vnode when a knote is instantiated. Otherwise, the knote keeps a pointer to a vnode which could become invalid at any time. Reported by: many Tested by: Patrick Lamaiziere <patfbsd@davenulle.org> Discussed with: jmg Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (marius)	2013-09-26 13:14:51 +00:00
Davide Italiano	1b0c144fc2	Make the callout arithmetic more robust by adding checks for overflow. Without these, if the timeout value passed is "large enough", the sum of it and other factors (e.g. current time as returned by sbinuptime() or the 'precision' argument) might result in a negative number. This negative number is then passed to eventtimers(4), which causes the et_start() routine to load et_min_period into the eventtimer, leaving the CPU stuck forever in the timer interrupt handler routine. This is now avoided by rounding the timeout period to INT64_MAX in case of overflow. Reported by: kib, pho Discussed with: kib, mav Tested by: pho (stress2 suite, kevent7.sh scenario) Approved by: re (kib)	2013-09-26 10:06:50 +00:00
Attilio Rao	57a9eeb4ed	Avoid memory access reordering which can result in fget_unlocked() seeing a stale fd_ofiles table once fd_nfiles is already updated, resulting in out-of-bounds (OOB) accesses. Approved by: re (kib) Sponsored by: EMC / Isilon storage division Reported and tested by: pho Reviewed by: benno	2013-09-25 13:37:52 +00:00


13798 Commits