freebsd-nq

Author	SHA1	Message	Date
Xin LI	4e8671dd78	GZIO: Update to use zlib 1.2.11. PR: 229763 Submitted by: Yoshihiro Ota <ota j email ne jp> Differential Revision: https://reviews.freebsd.org/D21408	2019-08-25 07:50:44 +00:00
Mateusz Guzik	0256405e98	vfs: add vholdnz (for already held vnodes) Reviewed by: kib (previous version) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21358	2019-08-25 05:11:43 +00:00
Mateusz Guzik	5b596b9fa5	Remove the obsolete pcpu_zone_ptr zone. It was only used by flowtable (removed in r321618). Sponsored by: The FreeBSD Foundation	2019-08-24 00:01:19 +00:00
Konstantin Belousov	e671edac06	De-commission the MNTK_NOINSMNTQ kernel mount flag. After all the changes, its dynamic scope is the same as for MNTK_UNMOUNT, but to allow the syncer vnode to be re-installed on unmount failure. But the case of syncer was already handled by using the VV_FORCEINSMQ flag for quite some time. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-08-23 19:40:10 +00:00
Xin LI	a11bf9a49b	INVARIANTS: treat LA_LOCKED as the same of LA_XLOCKED in mtx_assert. The Linux lockdep API assumes LA_LOCKED semantic in lockdep_assert_held(), meaning that either a shared lock or write lock is Ok. On the other hand, the timeout code uses lc_assert() with LA_XLOCKED, and we need both to work. For mutexes, because they can not be shared (this is unique among all lock classes, and it is unlikely that we would add new lock class anytime soon), it is easier to simply extend mtx_assert to handle LA_LOCKED there, despite the change itself can be viewed as a slight abstraction violation. Reviewed by: mjg, cem, jhb MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D21362	2019-08-23 06:39:40 +00:00
Brooks Davis	075ac3b446	Reorganise conditionals to reduce duplication. No functional change. Obtained from: CheriBSD MFC after: 3 days Sponsored by: DARPA, AFRL	2019-08-22 10:21:07 +00:00
Rick Macklem	df9bc7df42	Map ENOTTY to EINVAL for lseek(SEEK_DATA/SEEK_HOLE). Without this patch, when an application performed lseek(SEEK_DATA/SEEK_HOLE) on a file in a file system that does not have its own VOP_IOCTL(), the lseek(2) fails with errno ENOTTY. This didn't seem appropriate, since ENOTTY is not listed as an error return by either the lseek(2) man page nor the POSIX draft for lseek(2). This was discussed on freebsd-current@ here: http://docs.FreeBSD.org/cgi/mid.cgi?CAOtMX2iiQdv1+15e1N_r7V6aCx_VqAJCTP1AW+qs3Yg7sPg9wA This trivial patch maps ENOTTY to EINVAL for lseek(SEEK_DATA/SEEK_HOLE). Reviewed by: markj Relnotes: yes Differential Revision: https://reviews.freebsd.org/D21300	2019-08-22 01:15:06 +00:00
Mark Johnston	5b699f1614	Add lockmgr(9) probes to the lockstat DTrace provider. They follow the conventions set by rw and sx lock probes. There is an additional lockstat:::lockmgr-disown probe. Update lockstat(1) to report on contention and hold events for lockmgr locks. Document the new probes in dtrace_lockstat.4, and deduplicate some of the existing probe descriptions. Reviewed by: mjg MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21355	2019-08-21 23:43:58 +00:00
Mark Johnston	9fb7c918ef	Remove manual wire_count adjustments from the unmapped mbuf code. The original code came from a desire to minimize the number of updates to v_wire_count, which prior to r329187 was updated using atomics. However, there is no significant benefit to batching today, so simply allocate pages using VM_ALLOC_WIRED and rely on system accounting. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D21323	2019-08-21 20:01:52 +00:00
Mark Johnston	6bc13e042f	Modify pipe_poll() to properly check for pending direct writes. With r349546, it is a responsibility of the writer to clear PIPE_DIRECTW after pinned data has been read. In particular, once a reader has drained this data, there is a small window where the pipe is empty but PIPE_DIRECTW is set. pipe_poll() was using the presence of PIPE_DIRECTW to determine whether to return POLLIN, so in this window it would claim that data was available to read when this was not the case. Fix this by modifying several checks for PIPE_DIRECTW to instead look at the number of residual bytes in data pinned by a direct writer. In some cases we really do want to check for PIPE_DIRECTW, since the presence of this flag indicates that any attempt to write to the pipe will block on the existing direct writer. Bisected and test case provided by: mav Tested by: pho Reviewed by: kib MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21333	2019-08-21 19:35:04 +00:00
Ed Maste	f37192064a	mqueuefs: fix compat32 struct file leak In a compat32 error case we previously leaked a struct file. Submitted by: Karsten König, Secfault Security Security: CVE-2019-5603	2019-08-20 17:44:03 +00:00
Jeff Roberson	cf27e0d125	Use an atomic reference count for paging in progress so that callers do not require the object lock. Reviewed by: markj Tested by: pho (as part of a larger branch) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21311	2019-08-19 23:09:38 +00:00
Mateusz Guzik	4b3f767340	vfs: fix up r351193 ("stop always overwriting ->mnt_stat in VFS_STATFS") fs-specific part of vfs_statfs routines only fill in small portion of the structure. Previous code was always copying everything at a higher layer to accommodate it and this patch does the same. 'df' (no arguments) worked fine because the caller uses mnt_stat itself as the target buffer, making all the copying a no-op for its own case. 'df /' and similar use a different consumer which passes its own buffer and this is where you can run into trouble. Reported by: cy Fixes: r351193 Sponsored by: The FreeBSD Foundation	2019-08-19 14:11:54 +00:00
Andrey V. Elsukov	75697b16b6	Use TAILQ_FOREACH_SAFE() macro to avoid use after free in soclose(). PR: 239893 MFC after: 1 week	2019-08-19 12:42:03 +00:00
Andriy Gapon	0db7afd0ae	assert that td_lk_slocks is not leaked upon return from kernel This is similar to checks for td_sx_slocks and td_rw_rlocks. Although td_lk_slocks is an implementation detail, it still makes sense to validate it. MFC after: 1 week Sponsored by: Panzura	2019-08-19 11:18:36 +00:00
Rick Macklem	2e1b32c0e3	Add a vop_stdioctl() that performs a trivial FIOSEEKDATA/FIOSEEKHOLE. Without this patch, when an application performed lseek(SEEK_DATA/SEEK_HOLE) on a file in a file system that does not have its own VOP_IOCTL(), the lseek(2) fails with errno ENOTTY. This didn't seem appropriate, since ENOTTY is not listed as an error return by either the lseek(2) man page nor the POSIX draft for lseek(2). A discussion on freebsd-current@ seemed to indicate that implementing a trivial algorithm that returns the offset argument for FIOSEEKDATA and returns the file's size for FIOSEEKHOLE was the preferred fix. http://docs.FreeBSD.org/cgi/mid.cgi?CAOtMX2iiQdv1+15e1N_r7V6aCx_VqAJCTP1AW+qs3Yg7sPg9wA The Linux kernel appears to implement this trivial algorithm as well. This patch adds a vop_stdioctl() that implements this trivial algorithm. It returns errors consistent with vn_bmap_seekhole() and, as such, will still return ENOTTY for non-regular files. I have proposed a separate patch that maps errors not described by the lseek(2) man page nor POSIX draft to EINVAL. This patch is under separate review. Reviewed by: kib Relnotes: yes Differential Revision: https://reviews.freebsd.org/D21299	2019-08-19 00:29:05 +00:00
Konstantin Belousov	de4e1aeb21	Fix an issue with executing tmpfs binary. Suppose that a binary was executed from tmpfs mount, and the text vnode was reclaimed while the binary was still running. It is possible during even the normal operations since tmpfs vnode's vm_object has swap type, and no reference on the vnode is held. Also assume that the text vnode was revived for some reason. Then, on the process exit or exec, unmapping of the text mapping tries to remove the text reference from the vnode, but since it went from recycle/instantiation cycle, there is no reference kept, and assertion in VOP_UNSET_TEXT_CHECKED() triggers. Fix this by keeping a use reference on the tmpfs vnode for each exec reference. This prevents the vnode reclamation while executable map entry is active. Do it by adding per-mount flag MNTK_TEXT_REFS that directs vop_stdset_text() to add use ref on first vnode text use, and per-vnode VI_TEXT_REF flag, to record the need on unref in vop_stdunset_text() on last vnode text use going away. Set MNTK_TEXT_REFS for tmpfs mounts. Reported by: bdrewery Tested by: sbruno, pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-08-18 20:36:11 +00:00
Konstantin Belousov	bb9e2184f0	Change locking requirements for VOP_UNSET_TEXT(). Require the vnode to be locked for the VOP_UNSET_TEXT() call. This will be used by the following bug fix for a tmpfs issue. Tested by: sbruno, pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-08-18 20:24:52 +00:00
Mateusz Guzik	e7c1709aaf	vfs: stop always overwriting ->mnt_stat in VFS_STATFS The struct is already populated on each mount (and remount). Fields are either constant or not used by filesystem in the first place. Some infrequently used functions use it to avoid having to allocate a new buffer and are left alone. The current code results in an avoidable copying single-threaded and significant cache line bouncing multithreaded. While here, deduplicate initial filling of the struct. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21317	2019-08-18 18:40:12 +00:00
Jeff Roberson	33205c60e7	Add a blocking wait bit to refcount. This allows refs to be used as a simple barrier. Reviewed by: markj, kib Discussed with: jhb Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21254	2019-08-18 11:43:58 +00:00
Mateusz Guzik	50c7615fb0	fork: rework locking around do_fork - move allproc lock into the func, it is of no use prior to it - the code would lock p1 and p2 while holding allproc to partially construct it after it gets added to the list. instead we can do the work prior to adding anything. - protect lastpid with procid_lock As a side effect we do less work with allproc held. Sponsored by: The FreeBSD Foundation	2019-08-17 18:19:49 +00:00
Mateusz Guzik	60cdcb644d	fork: bump process count before checking for permission to cross the limit The limit is almost never reached. Do the check only on failure to see if we can override it. No change in user-visible behavior. Sponsored by: The FreeBSD Foundation	2019-08-17 17:56:43 +00:00
Mateusz Guzik	b05641b6bd	fork: stop skipping < 100 ids on wrap around Code doing this is commented with a claim that these IDs are occupied by daemons, but that's demonstrably false. To an extent the range is used by init and kernel processes (and on sufficiently big machines it indeed is fully populated). On a sample 40-way box the highest id in the range is 63. On a different one it is 23. Just use the range. Sponsored by: The FreeBSD Foundation	2019-08-17 17:42:01 +00:00
Alexander Motin	3a60f3dad0	Add support for 'j', 't' and 'z' flags to kernel sscanf(). MFC after: 2 weeks	2019-08-16 19:46:22 +00:00
Jeff Roberson	2194393787	Move phys_avail definition into MI code. It is consumed in the MI layer and doing so adds more flexibility with less redundant code. Reviewed by: jhb, markj, kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21250	2019-08-16 00:45:14 +00:00
Rick Macklem	c61b14315f	Fix copy_file_range(2) so that unneeded blocks are not allocated to the output file. When the byte range for copy_file_range(2) doesn't go to EOF on the output file and there is a hole in the input file, a hole must be "punched" in the output file. This is done by writing a block of bytes all set to 0. Without this patch, the write is done unconditionally which means that, if the output file already has a hole in that byte range, a unneeded data block of all 0 bytes would be allocated. This patch adds code to check for a hole in the output file, so that it can skip doing the write if there is already a hole in that byte range of the output file. This avoids unnecessary allocation of blocks to the output file. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D21155	2019-08-15 23:21:41 +00:00
Jeff Roberson	018ff6860f	Move scheduler state into the per-cpu area where it can be allocated on the correct NUMA domain. Reviewed by: markj, gallatin Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D19315	2019-08-13 04:54:02 +00:00
Konstantin Belousov	7e097daa93	Only enable COMPAT_43 changes for syscalls ABI for a.out processes. Reviewed by: imp, jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D21200	2019-08-11 19:16:07 +00:00
Jonathan T. Looney	afd959f332	In m_pulldown(), before trying to prepend bytes to the subsequent mbuf, ensure that the subsequent mbuf contains the remainder of the bytes the caller sought. If this is not the case, fall through to the code which gathers the bytes in a new mbuf. This fixes a bug where m_pulldown() could fail to gather all the desired bytes into consecutive memory. PR: 238787 Reported by: A reddit user Discussed with: emaste Obtained from: NetBSD MFC after: 3 days	2019-08-09 05:18:59 +00:00
Rick Macklem	6b1bc6f7dd	Remove some harmless cruft from vn_generic_copy_file_range(). An earlier version of the patch had code that set "error" between line#s 2797-2799. When that code was moved, the second check for "error != 0" could never be true and the check became harmless cruft. This patch removes the cruft, mainly to make Coverity happy. Reported by: asomers, cem	2019-08-08 20:07:38 +00:00
Rick Macklem	614633146f	Fix copy_file_range(2) for an unlikely race during hole finding. Since the VOP_IOCTL(FIOSEEKDATA/FIOSEEKHOLE) calls are done with the vnode unlocked, it is possible for another thread to do: - truncate(), lseek(), write() between the two calls and create a hole where FIOSEEKDATA returned the start of data. For this case, VOP_IOCTL(FIOSEEKHOLE) will return the same offset for the hole location. This could result in an infinite loop in the copy code, since copylen is set to 0 and the copy doesn't advance. Usually, this race is avoided because of the use of rangelocks, but the NFS server does not do range locking and could do a sequence like the above to create the hole. This patch checks for this case and makes the hole search fail, to avoid the infinite loop. At this time, it is an open question as to whether or not the NFS server should do range locking to avoid this race.	2019-08-08 19:53:07 +00:00
Konstantin Belousov	b706be23b4	Update comment explaining create_init(). Sponsored by: The FreeBSD Foundation MFC after: 3 days	2019-08-08 16:42:53 +00:00
Xin LI	22bbc4b242	Convert DDB_CTF to use newer version of ZLIB. PR: 229763 Submitted by: Yoshihiro Ota <ota j email ne jp> Differential Revision: https://reviews.freebsd.org/D21176	2019-08-08 07:27:49 +00:00
Conrad Meyer	7d0658ad55	Fix !DDB kernel configurations after r350713 KDB is standard and the kdb_active variable is always available. So, de-conditionalize inclusion of sys/kdb.h in kern_sysctl.c. Reported by: Michael Butler <imb AT protected-networks.net> X-MFC-With: r350713 Sponsored by: Dell EMC Isilon	2019-08-08 01:37:41 +00:00
Conrad Meyer	088c17b46b	ddb(4): Add 'sysctl' command Implement `sysctl` in `ddb` by overriding `SYSCTL_OUT`. When handling the req, we install custom ddb in/out handlers. The out handler prints straight to the debugger, while the in handler ignores all input. This is intended to allow us to print just about any sysctl. There is a known issue when used from ddb(4) entered via 'sysctl debug.kdb.enter=1'. The DDB mode does not quite prevent all lock interactions, and it is possible for the recursive Giant lock to be unlocked when the ddb(4) 'sysctl' command is used. This may result in a panic on return from ddb(4) via 'c' (continue). Obviously, this is not a problem when debugging already-panicked systems. Submitted by: Travis Lane (formerly: <travis.lane AT isilon.com>) Reviewed by: vangyzen (earlier version), Don Morris <dgmorris AT earthlink.net> Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D20219	2019-08-08 00:42:29 +00:00
Conrad Meyer	76cb1112da	sbuf(9): Add sbuf_nl_terminate() API The API is used to gracefully terminate text line(s) with a single \n. If the formatted buffer was empty or already ended in \n, it is unmodified. Otherwise, a newline character is appended to it. The API, like other sbuf-modifying routines, is only valid while the sbuf is not FINISHED. Reviewed by: rlibby Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D21030	2019-08-07 19:27:14 +00:00
Conrad Meyer	d23813cdb9	sbuf(9): Refactor sbuf_newbuf into sbuf_new Code flow was somewhat difficult to read due to the combination of multiple return sites and the 4x possible dynamic constructions of an sbuf. (Future consideration: do we need all 4?) Refactored slightly to improve legibility. No functional change. Sponsored by: Dell EMC Isilon	2019-08-07 19:25:56 +00:00
Conrad Meyer	71db411eb6	sbuf(9): Add NOWAIT dynamic buffer extension mode The goal is to avoid some kinds of low-memory deadlock when formatting heap-allocated buffers. Reviewed by: vangyzen Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D21015	2019-08-07 19:23:07 +00:00
Gleb Smirnoff	814f33aafb	Since r350426 this KASSERT doesn't serve any useful purpose.	2019-08-06 16:11:00 +00:00
Mariusz Zaborski	c878d1eb45	procdesc: fix the function name I changed name of the function r350429 and forgot to update the r350612 patch. Reported by: jenkins MFC after: 1 month	2019-08-05 20:31:17 +00:00
Mariusz Zaborski	9f5103abab	process: style We don't need to check if the parent is already set. This is done already in the proc_reparent. No functional behaviour changes intended. MFC after: 1 month	2019-08-05 20:26:01 +00:00
Mariusz Zaborski	a05cfdf479	exit1: fix style nits MFC after: 1 month	2019-08-05 20:20:14 +00:00
Mariusz Zaborski	fd631bcd95	procdesc: fix reparenting when the debugger is attached The process is reparented to the debugger while it is attached. B B / ----> \| A A D Every time when the process is reparented, it is added to the orphan list of the previous parent: A->orphan = B D->orphan = NULL When the A process will close the process descriptor to the B process, the B process will be reparented to the init process. B B - init \| ----> A D A D A->orphan = B D->orphan = B In this scenario, the B process is in the orphan list of A and D. When the last process descriptor is closed instead of reparenting it to the reaper let it stay with the debugger process and set our previous parent to the reaper. Add test case for this situation. Notice that without this patch the kernel will crash with this test case: panic: orphan 0xfffff8000e990530 of 0xfffff8000e990000 has unexpected oppid 1 Reviewed by: markj, kib MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D20361	2019-08-05 20:15:46 +00:00
Mariusz Zaborski	799d92ab78	proc: introduce the proc_add_orphan function This API allows adding the process to its parent orphan list. Reviewed by: kib, markj MFC after: 1 month	2019-08-05 20:11:57 +00:00
Mariusz Zaborski	41fadb3fca	exit1: postpone clearing P_TRACED flag until the proctree lock is acquired In case of the process being debugged. The P_TRACED is cleared very early, which would make procdesc_close() not calling proc_clear_orphan(). That would result in the debugged process can not be able to collect status of the process with process descriptor. Reviewed by: markj, kib Tested by: pho MFC after: 1 month	2019-08-05 19:59:23 +00:00
Konstantin Belousov	a1549acbaf	Fix mis-merge. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-08-05 19:19:25 +00:00
Konstantin Belousov	01c3ba9752	Fix mis-merge Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-08-05 19:16:33 +00:00
Justin Hibbits	937a05ba81	Add necessary bits for Linux KPI to work correctly on powerpc PowerPC, and possibly other architectures, use different address ranges for PCI space vs physical address space, which is only mapped at resource activation time, when the BAR gets written. The DRM kernel modules do not activate the rman resources, so as not to waste KVA, instead only mapping parts of the PCI memory at a time. This introduces a BUS_TRANSLATE_RESOURCE() method, implemented in the Open Firmware/FDT PCI driver, to perform this necessary translation without activating the resource. In addition to system KPI changes, LinuxKPI is updated to handle a big-endian host, by adding proper endian swaps to the I/O functions. Submitted by: mmacy Reported by: hselasky Differential Revision: https://reviews.freebsd.org/D21096	2019-08-04 19:28:10 +00:00
John Baldwin	f422bc3092	Set ISOPEN in namei flags when opening executable interpreters. These vnodes are explicitly opened via VOP_OPEN via exec_check_permissions identical to the main executable image. Setting ISOPEN allows filesystems to perform suitable checks in VOP_LOOKUP (e.g. close-to-open consistency in the NFS client). Reviewed by: kib MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D21129	2019-08-03 01:02:52 +00:00
Mark Johnston	8675f5f776	Only check the blessings table for known LORs. Previously we would check for blessings before marking a given lock pair as reversed, so each "reversed" lock acquisition would require a linear scan of the table. Instead, check the table after marking the pair as reversed but before generating a report. Reviewed by: jhb MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21135	2019-08-02 18:01:47 +00:00
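
The trivial FIOSEEKDATA/FIOSEEKHOLE algorithm described in the message of commit 2e1b32c0e3 (return the offset argument for FIOSEEKDATA, return the file's size for FIOSEEKHOLE) can be sketched in userspace C roughly as follows. This is an illustration only, not the kernel's vop_stdioctl(): the function trivial_seek() and the enum names are hypothetical, and the ENXIO check for offsets at or past end of file is an assumption taken from the lseek(2) manual page, not from the commit message.

```c
#include <errno.h>
#include <sys/types.h>

/* Hypothetical stand-ins for the FIOSEEKDATA / FIOSEEKHOLE requests. */
enum sketch_seek_kind { SKETCH_SEEK_DATA, SKETCH_SEEK_HOLE };

/*
 * Treat the whole file as one data region followed by a single
 * "virtual" hole at end of file.
 *
 * On entry *offp is the offset to search from; on success it holds
 * the result. Returns 0 on success or an errno value on failure.
 */
static int
trivial_seek(enum sketch_seek_kind kind, off_t file_size, off_t *offp)
{
	/* Assumption: nothing to find at or beyond EOF, as in lseek(2). */
	if (*offp < 0 || *offp >= file_size)
		return (ENXIO);

	/* SEEK_HOLE: the only hole is the implicit one at EOF. */
	if (kind == SKETCH_SEEK_HOLE)
		*offp = file_size;

	/* SEEK_DATA: the offset is already inside data; leave it as is. */
	return (0);
}
```

With a 100-byte file and a starting offset of 10, a data seek leaves the offset at 10 and a hole seek moves it to 100.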


16765 Commits