freebsd-dev

Author	SHA1	Message	Date
Alexander Motin	cdd09fea28	Add vmem locking to r281026. While races there are not fatal, they cause result underestimation, that cause unneeded ARC reclaims. MFC after: 1 month	2015-04-05 14:17:26 +00:00
Konstantin Belousov	4cfc037c30	Restore proper error from oshmctl(2), used by COMPAT_43, when the segment cannot be found. Broken by r280323. Sponsored by: The FreeBSD Foundation MFC after: 3 days	2015-04-04 23:56:38 +00:00
Jilles Tjoelker	78d75aba77	utimensat: Correct Capsicum required capability rights.	2015-04-04 21:47:54 +00:00
Konstantin Belousov	0122d251bf	Remove useless initialization. Sponsored by: The FreeBSD Foundation MFC after: 3 days	2015-04-04 08:44:20 +00:00
Alexander Motin	2e9ccb32a1	Make ZFS ARC track both KVA usage and fragmentation. Even on Illumos, with its much larger KVA, ZFS ARC steps back if KVA usage reaches certain threshold (3/4 on i386 or 16/17 otherwise). FreeBSD has even less KVA, but had no such limit on archs with direct map as amd64. As result, on machines with a lot of RAM, during load with very small user-space memory pressure, such as `zfs send`, it was possible to reach state, when there is enough both physical RAM and KVA (I've seen up to 25-30%), but no continuous KVA range to allocate even single 128KB I/O request. Address this situation from two sides: - restore KVA usage limitations in a way the most close to Illumos; - introduce new requirement for KVA fragmentation, specifying that we should have at least one sequential KVA range of zfs_max_recordsize bytes. Experiments show that first limitation done alone is not sufficient. On machine with 64GB of RAM it is sometimes needed to drop up to half of ARC size to get at least one 1MB KVA chunk. Statically limiting ARC to half of KVA/RAM is too strict, so second limitation makes it to work in cycles: accumulate trash up to certain critical mass, do massive spring-cleaning, and then start littering again. :) MFC after: 1 month	2015-04-03 14:45:48 +00:00
Konstantin Belousov	2832cd544f	Speed up symbol lookup for the amd64 kernel modules. Amd64 uses relocatable object files as the modules format. It is good WRT not having unneeded overhead for PIC code, in particular, due to absence of useless GOT and PLT. But the cost is that the module linking process cannot use hash to speed up the symbol lookup, and that each reference to the symbol requiring a relocation, instead of single-place relocation in GOT. Cache the successful symbol lookup results in the module symbol table, using the newly allocated SHN_FBSD_CACHED value from SHN_LOOS-HIOS range as an indicator. The SHN_FBSD_CACHED together with the non-existent definition of the found symbol are reverted after successful relocations, which is done under kld_sx lock, so it should not be visible to other consumers of the symbol table. Submitted by: Conrad Meyer Differential Revision: https://reviews.freebsd.org/D1718 MFC after: 3 weeks	2015-04-02 20:14:51 +00:00
Ryan Stone	f2c2231e0c	Fix integer truncation bug in malloc(9) A couple of internal functions used by malloc(9) and uma truncated a size_t down to an int. This could cause any number of issues (e.g. indefinite sleeps, memory corruption) if any kernel subsystem tried to allocate 2GB or more through malloc. zfs would attempt such an allocation when run on a system with 2TB or more of RAM. Note to self: When this is MFCed, sparc64 needs the same fix. Differential revision: https://reviews.freebsd.org/D2106 Reviewed by: kib Reported by: Michael Fuckner <michael@fuckner.net> Tested by: Michael Fuckner <michael@fuckner.net> MFC after: 2 weeks	2015-04-01 12:42:26 +00:00
Randall Stewart	403df7a672	Adopt jhb's suggested changes, updated comments and callout_migration() moving to kern/kern_timeout.c This does not address his -1 -> NOCPU comment. Sponsored by: Netflix Inc.	2015-03-31 00:18:00 +00:00
Gleb Smirnoff	f6d6b5e262	Catch up on r271387 and remove unused parameter from VOP_GETPAGES_ASYNC().	2015-03-30 22:49:26 +00:00
Alexander Motin	43329ffcc8	Periodically wake up threads waiting for vmem(9) resources, so they could ask for resource reclamation again. This is kind of dirty hack, but as last resort this is better than being stuck indefinitely because of KVA fragmentation, waiting until some random event free something sufficient. OpenSolaris also has this hack in its vmem(9). MFC after: 2 weeks	2015-03-30 13:30:53 +00:00
Alexander Motin	b308aaed27	Add four new DDB commands to display vmem(9) statistics. In particular, such DDB commands were added: show vmem <addr> show all vmem show vmemdump <addr> show all vmemdump As possible usage, that allows to see KVA usage and fragmentation.	2015-03-29 10:02:29 +00:00
Konstantin Belousov	eeb697c8e9	Make debug.vmem_check a tunable. It is useful to set it early. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-03-28 23:30:51 +00:00
Eric van Gyzen	e858b027db	Clean up some cosmetic nits in kern_umtx.c, found during recent work in this area and by the Clang static analyzer. Remove some dead assignments. Fix a typo in a panic string. Use umtx_pi_disown() instead of duplicate code. Use an existing variable instead of curthread. Approved by: kib (mentor) MFC after: 3 days Sponsored by: Dell Inc	2015-03-28 21:21:40 +00:00
Bjoern A. Zeeb	a04d412295	Try to unbreak !SMP kernels broken in r280785 by using the proper macros to access cc_cpu.	2015-03-28 15:07:19 +00:00
Randall Stewart	15b1eb142c	Change the callout to supply -1 to indicate we are not changing CPU, also add protection against invalid CPU's as well as split c_flags and c_iflags so that if a user plays with the active flag (the one expected to be played with by callers in MPSAFE) without a lock, it won't adversely affect the callout system by causing a corrupt list. This also means that all callers need to use the macros and not play with the flags directly (like netgraph used to). Differential Revision: https://reviews.freebsd.org/D1894 Reviewed by: .. timed out but looked at by jhb, imp, adrian hselasky tested by hiren and netflix. Sponsored by: Netflix Inc.	2015-03-28 12:50:24 +00:00
Hans Petter Selasky	38668c6044	Implement a simple OID number garbage collector. Given the increasing number of dynamically created and destroyed SYSCTLs during runtime it is very likely that the current new OID number limit of 0x7fffffff can be reached. Especially if dynamic OID creation and destruction results from automatic tests. Additional changes: - Optimize the typical use case by decrementing the next automatic OID sequence number instead of incrementing it. This saves searching time when inserting new OIDs into a fresh parent OID node. - Add simple check for duplicate non-automatic OID numbers. MFC after: 1 week	2015-03-25 08:55:34 +00:00
Hans Petter Selasky	502702c644	Make sure tunable sysctls are only fetched once. The existing code can re-register sysctls when destroying sysctl contexts or when moving sysctls from one tree to another.	2015-03-24 17:42:53 +00:00
Gleb Smirnoff	a2d4a7e456	Do not include if_var.h and in6_var.h into kern_jail.c. It is now possible after r280444. Sponsored by: Nginx, Inc.	2015-03-24 16:46:40 +00:00
Hans Petter Selasky	ab91c9a743	Correct string pointer offset for error printout.	2015-03-24 16:37:19 +00:00
Rui Paulo	0da9e11b7e	Disable coredump_devctl because it could lead to leaking paths to jails.	2015-03-24 02:17:17 +00:00
Mateusz Guzik	ea926658ff	filedesc: microoptimize fget_unlocked by getting rid of fd < 0 branch Casting fd to an unsigned type simplifies fd range comparison to mere checking if the result is bigger than the table.	2015-03-24 00:10:11 +00:00
Ian Lepore	296f235de0	The sysctls that return process argv and envv return binary data, so clear the SBUF_INCLUDENUL flag. Pointed out by: tijl@	2015-03-22 21:18:44 +00:00
Hans Petter Selasky	2793ea13aa	Fix for out of order device destruction notifications when using the delist_dev() function. In addition to this change: - add a proper description of this function - add a proper witness assert inside this function - switch a nearby line to use the "cdp" pointer instead of cdev2priv() MFC after: 3 days	2015-03-22 13:11:56 +00:00
Mateusz Guzik	f97af9706b	proc: use MTX_NEW flag in proc_init This allows us to get rid of bzero which was added specifically to make mtx_init on p_mtx reliable. This also fixes a potential problem where mtx_init on other mutexes could trip over on uninitialized memory and fire an assertion. Reviewed by: kib	2015-03-21 20:25:34 +00:00
Mateusz Guzik	ffb34484ee	cred: add proc_set_cred_init helper proc_set_cred_init can be used to set first credentials of a new process. Update proc_set_cred assertions so that it only expects already used processes. This fixes panics where p_ucred of a new process happens to be non-NULL. Reviewed by: kib	2015-03-21 20:24:54 +00:00
Mateusz Guzik	12cec311e6	fork: assign refed credentials earlier Prior to this change the kernel would take p1's credentials and assign them temporarily to p2. But p1 could change credentials at that time and in effect give us a use-after-free. No objections from: kib	2015-03-21 20:24:03 +00:00
Alan Cox	3d653db063	Introduce vm_object_color() and use it in mmap(2) to set the color of named objects to zero before the virtual address is selected. Previously, the color setting was delayed until after the virtual address was selected. In rtld, this delay effectively prevented the mapping of a shared library's code section using superpages. Now, for example, we see the first 1 MB of libc's code on armv6 mapped by a superpage after we've gotten through the initial cold misses that bring the first 1 MB of code into memory. (With the page clustering that we perform on read faults, this happens quickly.) Differential Revision: https://reviews.freebsd.org/D2013 Reviewed by: jhb, kib Tested by: Svatopluk Kraus (armv6) MFC after: 6 weeks	2015-03-21 17:56:55 +00:00
Olivier Houchard	d8d2f47629	error is only used if MAC is defined, so make its declaration conditional as well.	2015-03-21 16:16:17 +00:00
Konstantin Belousov	0555fb3523	Somewhat modernize the SysV shm code: - Use real locking, replace Giant with global sx protecting the subsystem. Since the subsystem's lock is no longer dropped during the sleeps, remove not needed SHMSEG_WANTED segment flag, and revert r278963. - To do proper code simplification possible after the change of the lock, restructure several functions into _locked body and originally-named wrapper which calls into _locked variant. This allows to eliminate the 'goto done2' spread over the code. - Merge shm_find_segment_by_shmid() and shm_find_segment_by_shmidx(). - Consistently change all function prototypes to ANSI C. Reviewed by: mjg (who has earlier version of the similar patch to introduce real locking) Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-03-21 15:01:19 +00:00
Mateusz Guzik	5bc0ff888a	coredump: protect corefilename access with a lock Previously format string traversal could happen while the string itself was being modified. Use allproc_lock as coredumping is a rare operation and as such we don't have to create a dedicated lock. Submitted by: Tiwei Bie <btw mail.ustc.edu.cn> Reviewed by: kib X-Additional: JuniorJobs project	2015-03-21 04:39:33 +00:00
Ian Lepore	612d9391a4	The minimum sbuf buffer size is 2 bytes (a byte plus a nulterm), assert that. Values smaller than two lead to strange asserts that have nothing to do with the actual problem (in the case of size=0), or to writing beyond the end of the allocated buffer in sbuf_finish() (in the case of size=1).	2015-03-17 21:00:31 +00:00
Ian Lepore	2834924513	In sbuf_new_for_sysctl(), default the buffer size to 64 bytes if the passed-in pointer is NULL and the length is zero.	2015-03-17 20:56:24 +00:00
Gleb Smirnoff	8c4df6296b	Reduce header pollution.	2015-03-17 14:16:50 +00:00
Benno Rice	43348dc2ad	Reset bp->bio_done to unmapped_buf when removing a transient map in biodone. Submitted by: Scott Ferris <scott.ferris@isilon.com> Sponsored by: EMC / Isilon Storage Division Reviewed by: kib	2015-03-16 20:00:09 +00:00
Ian Lepore	f62fbd30cb	Trivial change / forced-commit to document prior change that slipped in without a commit message... Use sbuf_new() + SYSCTL_OUT() instead of wiring the userland buffer and using sbuf_new_for_sysctl(). The preallocated 256 byte buffer is always going to be big enough to hold these results, and this should be more efficient than wiring the old buffer.	2015-03-16 19:29:19 +00:00
Ian Lepore	ff352d8978		2015-03-16 19:25:03 +00:00
Ian Lepore	ba00885515	Use a regular sbuf + SYSCTL_OUT() rather than sbuf_new_for_sysctl() with auto-draining, to avoid a potential copyout fault while holding a lock. Pointed out by: jhb Pointy hat to: ian	2015-03-16 19:18:45 +00:00
Ian Lepore	8d5628fdb8	Update an sbuf assertion to allow for the new SBUF_INCLUDENUL flag. If INCLUDENUL is set and sbuf_finish() has been called, the length has been incremented to count the nulterm byte, and in that case current length is allowed to be equal to buffer size, otherwise it must be less than. Add a predicate macro to test for SBUF_INCLUDENUL, and use it in tests, to be consistent with the style in the rest of this file.	2015-03-16 17:45:41 +00:00
Mateusz Guzik	fbe503d462	proc: get rid of proc lock + unlock pair in proc_reap A comment in the code stated we PROC_LOCK and as a side effect guarantee all writers released process lock. But at that point such lock was already taken while we were removing the process from all lists, so it should be already unreachable.	2015-03-16 01:09:49 +00:00
Mateusz Guzik	daf63fd2f9	cred: add proc_set_cred helper The goal here is to provide one place altering process credentials. This eases debugging and opens up possibilities to do additional work when such an action is performed.	2015-03-16 00:10:03 +00:00
Ian Lepore	e5197e3a08	Add a nulterm byte to the returned sysctl string. PR: 195668	2015-03-15 00:39:18 +00:00
Ian Lepore	657282e062	Include the nulterm byte in the sysctl string. PR: 195668	2015-03-15 00:36:08 +00:00
Ian Lepore	91d9eda200	Use sbuf_printf() for sysctl strings instead of stack buffers and snprintf().	2015-03-14 23:16:12 +00:00
Ian Lepore	acfc962f82	Use SYSCTL_OUT_STR() to return strings. PR: 195668	2015-03-14 21:40:01 +00:00
Ian Lepore	b773372938	Use sbuf_new_for_sysctl() instead of plain sbuf_new() to ensure sysctl string returned to userland is nulterminated. PR: 195668	2015-03-14 18:46:33 +00:00
Ian Lepore	b97fa22cd6	Use sbuf_new_for_sysctl() instead of plain sbuf_new() to ensure sysctl string returned to userland is nulterminated. PR: 195668	2015-03-14 18:42:30 +00:00
Ian Lepore	1eafc07856	Set the SBUF_INCLUDENUL flag in sbuf_new_for_sysctl() so that sysctl strings returned to userland include the nulterm byte. Some uses of sbuf_new_for_sysctl() write binary data rather than strings; clear the SBUF_INCLUDENUL flag after calling sbuf_new_for_sysctl() in those cases. (Note that the sbuf code still automatically adds a nulterm byte in sbuf_finish(), but since it's not included in the length it won't get copied to userland along with the binary data.) Remove explicit adding of a nulterm byte in a couple places now that it gets done automatically by the sbuf drain code. PR: 195668	2015-03-14 17:08:28 +00:00
Ian Lepore	f4d281428f	Add a new flag, SBUF_INCLUDENUL, and new get/set/clear functions for flags. The SBUF_INCLUDENUL flag causes the nulterm byte at the end of the string to be counted in the length of the data. If copying the data using the sbuf_data() and sbuf_len() functions, or if writing it automatically with a drain function, the net effect is that the nulterm byte is copied along with the rest of the data.	2015-03-14 16:02:11 +00:00
Hans Petter Selasky	b7ba031ff7	Factor out mbuf hashing code from LAGG driver so that other network drivers can use it. This avoids some code duplication. Add missing default case to all switch statements while at it. Also move the hashing of the IPv6 flow field to layer 4 because the IPv6 flow field is constant on a per L4 connection basis and not on a per L3 network. Differential Revision: https://reviews.freebsd.org/D1987 Sponsored by: Mellanox Technologies MFC after: 1 month	2015-03-11 16:02:24 +00:00
Ryan Stone	1c229658b9	Fix SR-IOV passthrough devices to allow ppt to attach A late change to the SR-IOV infrastructure broke passthrough of VFs. device_set_devclass() was being used to try to force the ppt driver to attach to the device, but this didn't work because the DF_FIXEDCLASS flag wasn't being set on the device, so the ppt driver probe routine would not match when it returned BUS_NOWILDCARD. Fix this by adding a new device function that both sets the devclass and sets the DF_FIXEDCLASS flag, and use that to force the ppt driver to attach to VFs. Differential Revision: https://reviews.freebsd.org/D2041 Reviewed by: jhb MFC after: 3 weeks	2015-03-10 23:27:13 +00:00
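
The fget_unlocked entry above (ea926658ff) describes dropping the fd < 0 branch by casting fd to an unsigned type. A minimal userland sketch of that idea follows. The function names and the nfiles parameter are hypothetical and chosen only for illustration; this is not the kernel's code, and it assumes the table size never exceeds INT_MAX.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helpers for illustration only. These are not taken from
 * the FreeBSD kernel; the names and signatures are made up.
 */

/* Two comparisons: reject negative values, then check the upper bound. */
bool
fd_in_range_two_checks(int fd, unsigned int nfiles)
{
	return (fd >= 0 && (unsigned int)fd < nfiles);
}

/*
 * One comparison: converting a negative int to unsigned int yields a
 * value above INT_MAX, so it fails the bound check as long as
 * nfiles <= INT_MAX (an assumption of this sketch).
 */
bool
fd_in_range_one_check(int fd, unsigned int nfiles)
{
	return ((unsigned int)fd < nfiles);
}

/* Check that both forms agree on a few boundary values. */
void
fd_range_selfcheck(void)
{
	const unsigned int nfiles = 20;
	const int samples[] = { INT_MIN, -1, 0, 19, 20, INT_MAX };
	size_t i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		assert(fd_in_range_two_checks(samples[i], nfiles) ==
		    fd_in_range_one_check(samples[i], nfiles));
}
```

The single-comparison form relies on C's well-defined conversion of signed values to unsigned types, which is why no separate sign test is needed.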


14181 Commits