freebsd-dev

Author	SHA1	Message	Date
Robert Watson	4f2cbaf3cd	Track pipe(2) reads and writes as rusage message receives and sends, a feature misplaced during the transition from BSD 4.4's socket implementation to the optimised FreeBSD pipe implementation. MFC after: 1 week Reviewed by: arichardson, imp Differential Revision: https://reviews.freebsd.org/D27878	2021-01-10 12:16:39 +00:00
Mateusz Guzik	2e51c2bfd1	pipe: follow up cleanup to previous The committed patch was incomplete. - add back missing goto retry, noted by jhb - 'if (error)' -> 'if (error != 0)' - consistently do: if (error != 0) break; continue; instead of: if (error != 0) break; else continue; This adds some 'continue' uses which are not needed, but line up with the rest of pipe_write.	2020-11-25 22:53:21 +00:00
Mateusz Guzik	c8df8543fd	pipe: drop spurious pipeunlock/pipelock cycle on write	2020-11-25 21:41:23 +00:00
Mateusz Guzik	f9fe7b28bc	pipe: thundering herd problem in pipelock All reads and writes are serialized with a hand-rolled lock, but unlocking it always wakes up all waiters. Existing flag fields get resized to make room for introduction of waiter counter without growing the struct. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D27273	2020-11-19 19:25:47 +00:00
Mateusz Guzik	b8cb628534	pipe: tidy up pipelock	2020-11-19 08:16:45 +00:00
Mateusz Guzik	89744405e6	pipe: allow for lockless pipe_stat pipes get stated all the time and this avoidably contributed to contention. The pipe lock is only held to accommodate MAC and to check the type. Since normally there is no probe for pipe stat depessimize this by having the flag. The pipe_state field gets modified with locks held all the time and it's not feasible to convert them to use atomic store. Move the type flag away to a separate variable as a simple cleanup and to provide stable field to read. Use short for both fields to avoid growing the struct. While here short-circuit MAC for pipe_poll as well.	2020-11-19 06:30:25 +00:00
Mateusz Guzik	331c21dd5e	pipe: whitespace nit in previous	2020-11-04 23:17:41 +00:00
Mateusz Guzik	c22ba7bb06	pipe: fix POLLHUP handling if no events were specified Linux allows polling without any events specified and it happens to be the case in FreeBSD as well. POLLHUP has to be delivered regardless of the event mask and this works fine if the condition is already present. However, if it is missing, selrecord is only called if the eventmask has relevant bits set. This in particular leads to a condition where pipe_poll can return 0 events and neglect to selrecord, while kern_poll takes it as an indication it has to go to sleep, but then there is nobody to wake it up. While the problem seems systemic to *_poll handlers the least we can do is fix it up for pipes. Reported by: Jeremie Galarneau <jeremie.galarneau at efficios.com> Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D27094	2020-11-04 23:11:54 +00:00
Mateusz Guzik	6fed89b179	kern: clean up empty lines in .c and .h files	2020-09-01 22:12:32 +00:00
Mark Johnston	85232c2ff1	Rename the pipe_map field of struct pipe. This is to avoid conflicts with an upcoming macro. pipe_pages is a more accurate name since the field tracks pages wired into the kernel as part of a process-to-process copy operation. Reviewed by: alc, kib MFC after: 1 week Sponsored by: The FreeBSD Foundation	2020-08-14 14:50:41 +00:00
Mateusz Guzik	4f00177887	pipe: reduce atime precision The routine is called on successful write and read, which on pipes happens a lot and for small sizes. Precision provided by default seems way bigger than necessary and it causes problems in vms on amd64 (it rdtscp's which vmexits). getnanotime seems to provide the level roughly in lines of Linux so we should be good here. Sample result from will-it-scale pipe1_processes -t 1 (ops/s): before: 426464 after: 3247421 Note that atime handling for named pipes is broken with and without the patch. The filesystem code is never used for updating atime and never looks at the updated field. Consequently, while there are no provisions added to handle named pipes separately, the change is a nop for that case. Differential Revision: https://reviews.freebsd.org/D23964	2020-08-05 19:15:59 +00:00
Mark Johnston	569eb766c5	Fix handling of EV_EOF for named pipes. Contrary to the kevent man page, EV_EOF on a fifo is not cleared by EV_CLEAR. Modify the read and write filters to clear EV_EOF when the fifo's PIPE_EOF flag is clear, and update the man page to document the new behaviour. Modify the write filter to return the amount of buffer space available even if no readers are present. This matches the behaviour for sockets. When reading from a pipe, only call pipeselwakeup() if some data was actually read. This prevents the continuous re-triggering of an EVFILT_READ event on EOF when in edge-triggered mode. PR: 203366, 224615 Submitted by: Jan Kokemüller <jan.kokemueller@gmail.com> MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D24528	2020-04-27 15:59:19 +00:00
Mark Johnston	9b22722423	Call pipeselwakeup() after toggling PIPE_EOF. This ensures that pipe_poll() and the pipe kqueue filters observe PIPE_EOF and set EV_EOF accordingly. As a result an extra call to knote() after setting PIPE_EOF is unnecessary. Submitted by: Jan Kokemüller <jan.kokemueller@gmail.com> MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D24528	2020-04-27 15:59:07 +00:00
Mark Johnston	9ab4355732	Avoid returning POLLIN if the pipe descriptor is not open for reading. Submitted by: Jan Kokemüller <jan.kokemueller@gmail.com> MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D24528	2020-04-27 15:58:55 +00:00
Konstantin Belousov	2d3c083fd7	pipe: explain why not deallocating inode number is fine. Suggested and reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D24009	2020-03-09 23:40:25 +00:00
Konstantin Belousov	c6d3d601c9	Preallocate pipe buffers on pipe creation. Return ENOMEM if one of the buffers cannot be created even with the minimal size. This should avoid subsequent spurious ENOMEM errors from write(2) when buffer cannot be allocated on the fly, after we reported that the pipe was created successfully. Reported by: Keno Fischer <keno@juliacomputing.com> Reviewed by: markj (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D23993	2020-03-09 21:55:26 +00:00
Konstantin Belousov	1213de28f8	Style. Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D23993	2020-03-09 19:46:28 +00:00
Mateusz Guzik	3ff65f71cb	Remove duplicated empty lines from kern/*.c No functional changes.	2020-01-30 20:05:05 +00:00
Mark Johnston	1cbfe73da5	Fix handling of PIPE_EOF in the direct write path. Suppose a writing thread has pinned its pages and gone to sleep with pipe_map.cnt > 0. Suppose that the thread is woken up by a signal (so error != 0) and the other end of the pipe has simultaneously been closed. In this case, to satisfy the assertion about pipe_map.cnt in pipe_destroy_write_buffer(), we must mark the buffer as empty. Reported by: syzbot+5cce271bf2cb1b1e1876@syzkaller.appspotmail.com Reviewed by: kib Tested by: pho MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22261	2019-11-11 20:44:30 +00:00
Mark Johnston	6bc13e042f	Modify pipe_poll() to properly check for pending direct writes. With r349546, it is a responsibility of the writer to clear PIPE_DIRECTW after pinned data has been read. In particular, once a reader has drained this data, there is a small window where the pipe is empty but PIPE_DIRECTW is set. pipe_poll() was using the presence of PIPE_DIRECTW to determine whether to return POLLIN, so in this window it would claim that data was available to read when this was not the case. Fix this by modifying several checks for PIPE_DIRECTW to instead look at the number of residual bytes in data pinned by a direct writer. In some cases we really do want to check for PIPE_DIRECTW, since the presence of this flag indicates that any attempt to write to the pipe will block on the existing direct writer. Bisected and test case provided by: mav Tested by: pho Reviewed by: kib MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21333	2019-08-21 19:35:04 +00:00
Mark Johnston	02476c44c5	Fix mutual exclusion in pipe_direct_write(). We use PIPE_DIRECTW as a semaphore for direct writes to a pipe, where the reader copies data directly from pages mapped into the writer. However, when a reader finishes such a copy, it previously cleared PIPE_DIRECTW, allowing multiple writers to race and corrupt the state used to track wired pages belonging to the writer. Fix this by having the writer clear PIPE_DIRECTW and instead use the count of unread bytes to determine whether a write is finished. Reported by: syzbot+21811cc0a89b2a87a9e7@syzkaller.appspotmail.com Reviewed by: kib, mjg Tested by: pho MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20784	2019-06-29 16:05:52 +00:00
Alan Somers	38b06f8ac4	fcntl: fix overflow when setting F_READAHEAD VOP_READ and VOP_WRITE take the seqcount in blocks in a 16-bit field. However, fcntl allows you to set the seqcount in bytes to any nonnegative 31-bit value. The result can be a 16-bit overflow, which will be sign-extended in functions like ffs_read. Fix this by sanitizing the argument in kern_fcntl. As a matter of policy, limit to IO_SEQMAX rather than INT16_MAX. Also, fifos have overloaded the f_seqcount field for a completely different purpose ever since r238936. Formalize that by using a union type. Reviewed by: cem MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20710	2019-06-20 23:07:20 +00:00
Mark Johnston	18a7de663b	Move a racy assertion in filt_pipewrite(). EVFILT_WRITE knotes for pipes live on the knlist for the other end of the pipe. Since they do not hold a reference on the corresponding file structure, they may be removed from the knlist by pipeclose() while still remaining active. In this case, there is no knlist lock acquired before filt_pipewrite() is called, so the assertion fails. Fix the problem by first checking whether that end of the pipe has been closed. These checks are memory safe since the knote holds a reference on one end of the pipe, and the pipe structure is not freed until both ends are closed. The checks are not racy since PIPE_EOF is never cleared after being set, and pipe_present is never set back to PIPE_ACTIVE after pipeclose() has been called. PR: 235640 Reported and tested by: pho Reviewed by: kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D19224	2019-02-19 15:46:43 +00:00
Mark Johnston	648890835c	Remove a write-only variable orphaned by r340677.	2019-02-17 16:56:41 +00:00
Mateusz Guzik	737037f6c0	pipe: use unr64 Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D18054	2018-11-20 14:59:27 +00:00
Mark Johnston	07702f72e5	Avoid specifying VM_PROT_EXECUTE in mappings from pipe_map and exec_map. These submaps are used for mapping pipe buffers and execv() argument strings respectively, so there's no need for such mappings to have execute permissions. Reported by: jhb Reviewed by: alc, jhb, kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17827	2018-11-06 21:57:03 +00:00
Ed Maste	b8d908b71e	ANSIfy sys/kern	2018-06-01 13:26:45 +00:00
Brooks Davis	6469bdcdb6	Move most of the contents of opt_compat.h to opt_global.h. opt_compat.h is mentioned in nearly 180 files. In-progress network driver compatibility improvements may add over 100 more so this is closer to "just about everywhere" than "only some files" per the guidance in sys/conf/options. Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensures opt_compat.h is created on all architectures. Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the set of compiled files. Reviewed by: kib, cem, jhb, jtl Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14941	2018-04-06 17:35:35 +00:00
Pedro F. Giffuni	8a36da99de	sys/kern: adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts.	2017-11-27 15:20:12 +00:00
Brooks Davis	a72c64b0b6	Generate syscall tables and update pipe() implementation after r302094. Mark the pipe() system call as COMPAT10. As of r302092 libc uses pipe2() with a zero flags value instead of pipe(). Approved by: re (gjb) Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D6816	2016-06-22 21:18:19 +00:00
Pedro F. Giffuni	55e0987aea	sys: extend use of the howmany() macro when available. We have a howmany() macro in the <sys/param.h> header that is convenient to re-use as it makes things easier to read.	2016-04-26 15:38:17 +00:00
Ed Schouten	8328babdd0	Make pipes in CloudABI work. Summary: Pipes in CloudABI are unidirectional. The reason for this is that CloudABI attempts to provide a uniform runtime environment across different flavours of UNIX. Instead of implementing a custom pipe that is unidirectional, we can simply reuse Capsicum permission bits to support this. This is nice, because CloudABI already attempts to restrict permission bits to correspond with the operations that apply to a certain file descriptor. Replace kern_pipe() and kern_pipe2() by a single kern_pipe() that takes a pair of filecaps. These filecaps are passed to the newly introduced falloc_caps() function that creates the descriptors with rights in place. Test Plan: CloudABI pipes seem to be created with proper rights in place: https://github.com/NuxiNL/cloudlibc/blob/master/src/libc/unistd/pipe_test.c#L44 Reviewers: jilles, mjg Reviewed By: mjg Subscribers: imp Differential Revision: https://reviews.freebsd.org/D3236	2015-07-29 17:18:27 +00:00
Conrad Meyer	c578e0fb48	pipe_direct_write: Fix mismatched pipelock/unlock If a signal is caught in pipelock, causing it to fail, pipe_direct_write should not try to pipeunlock. Reported by: pho Differential Revision: https://reviews.freebsd.org/D3069 Reviewed by: kib Approved by: markj (mentor) MFC after: 1 week Sponsored by: EMC / Isilon Storage Division	2015-07-13 17:45:22 +00:00
Mateusz Guzik	90f54cbfeb	fd: remove filedesc argument from fdclose Just accept a thread instead. This makes it consistent with fdalloc. No functional changes.	2015-04-11 15:40:28 +00:00
Konstantin Belousov	ff5ba73987	Fix use after free in pipe_dtor(). PIPE_NAMED flag must be tested before pipeclose() is called, since for !PIPE_NAMED case, when peer is already closed, the pipe pair memory is freed. Submitted by: luke.tw@gmail.com PR: 197246 Tested by: pho MFC after: 3 days	2015-02-03 10:29:40 +00:00
Konstantin Belousov	fe63170115	Do not assert that the new pipepair mutex is not initialized. The backing memory contains garbage and might trigger the assertion. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-01-21 16:32:54 +00:00
Konstantin Belousov	6762091ea4	Remove lock recursion for the pipe pair mutex, and disable the recursion on mutex initialization. The only places where the recursive acquire is performed are read and write filters, since knlist, which uses the pipe pair mutex as lock, is locked when filter is called. The recursion was added in r93296, and consistent locking for kn_fop->f_event() introduced in r133741. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 month	2014-11-29 17:18:20 +00:00
Konstantin Belousov	ab57474c83	When other end of the pipe closed during the write, but some bytes were written, return short write instead of EPIPE. Update comment. Discussed with: bde (long time ago) MFC after: 2 weeks	2014-11-03 10:01:56 +00:00
John Baldwin	9696feebe2	Add a new fo_fill_kinfo fileops method to add type-specific information to struct kinfo_file. - Move the various fill_*_info() methods out of kern_descrip.c and into the various file type implementations. - Rework the support for kinfo_ofile to generate a suitable kinfo_file object for each file and then convert that to a kinfo_ofile structure rather than keeping a second, different set of code that directly manipulates type-specific file information. - Remove the shm_path() and ksem_info() layering violations. Differential Revision: https://reviews.freebsd.org/D775 Reviewed by: kib, glebius (earlier version)	2014-09-22 16:20:47 +00:00
John Baldwin	cd550b9b52	Tweak pipe_truncate() to more closely match pipe_chown() and pipe_chmod() by checking PIPE_NAMED and using invfo_truncate() for unnamed pipes.	2014-09-12 21:20:36 +00:00
Hans Petter Selasky	af3b2549c4	Pull in r267961 and r267973 again. Fix for issues reported will follow.	2014-06-28 03:56:17 +00:00
Glen Barber	37a107a407	Revert r267961, r267973: These changes prevent sysctl(8) from returning proper output, such as: 1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory	2014-06-27 22:05:21 +00:00
Hans Petter Selasky	3da1cf1e88	Extend the meaning of the CTLFLAG_TUN flag to automatically check if there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating children of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, hence some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading kludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel. Other changes: - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask". - Removed redundant TUNABLE statements throughout the kernel. - Some minor code rewrites in connection to removing not needed TUNABLE statements. - Added a missing SYSCTL_DECL(). - Wrapped two very long lines. - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, hence malloc()/free() is not ready when sysctls from the sysctl dataset are registered. - Bumped FreeBSD version to indicate SYSCTL API change. MFC after: 2 weeks Sponsored by: Mellanox Technologies	2014-06-27 16:33:43 +00:00
Mateusz Guzik	183870cf75	Ignore the error from pipespace_new when creating a pipe. It can fail if pipe map is exhausted (as a result of too many pipes created), but it is not fatal and could be provoked by unprivileged users. The only consequence is worse performance with given pipe. Reported by: ivoras Suggested by: kib MFC after: 1 week	2014-05-02 00:52:13 +00:00
John Baldwin	edb572a38c	Add a mmap flag (MAP_32BIT) on 64-bit platforms to request that a mapping use an address in the first 2GB of the process's address space. This flag should have the same semantics as the same flag on Linux. To facilitate this, add a new parameter to vm_map_find() that specifies an optional maximum virtual address. While here, fix several callers of vm_map_find() to use a VMFS_* constant for the findspace argument instead of TRUE and FALSE. Reviewed by: alc Approved by: re (kib)	2013-09-09 18:11:59 +00:00
Gleb Smirnoff	ca04d21d5f	Make sendfile() a method in the struct fileops. Currently only vnode backed file descriptors have this method implemented. Reviewed by: kib Sponsored by: Nginx, Inc. Sponsored by: Netflix	2013-08-15 07:54:31 +00:00
Jilles Tjoelker	dc570d5e56	Add pipe2() system call. The pipe2() function is similar to pipe() but allows setting FD_CLOEXEC and O_NONBLOCK (on both sides) as part of the function. If p points to two writable ints, pipe2(p, 0) is equivalent to pipe(p). If the pointer is not valid, behaviour differs: pipe2() writes into the array from the kernel like socketpair() does, while pipe() writes into the array from an architecture-specific assembler wrapper. Reviewed by: kan, kib	2013-05-01 22:42:42 +00:00
Jilles Tjoelker	d289dc7b73	Rename do_pipe() to kern_pipe2() and declare it properly.	2013-03-31 17:42:54 +00:00
Pawel Jakub Dawidek	49549b1894	Remove redundant space.	2013-02-17 11:48:16 +00:00
David Xu	5ff2bb52cc	I am comparing current pipe code with the one in 8.3-STABLE r236165, I found 8.3 is a history BSD version using socket to implement FIFO pipe, it uses per-file seqcount to compare with writer generation stored in per-pipe object. The concept is after all writers are gone, the pipe enters next generation, all old readers have not closed the pipe should get the indication that the pipe is disconnected, result is they should get EPIPE, SIGPIPE or get POLLHUP in poll(). But newcomer should not know that previous writers were gone, it should treat it as a fresh session. I am trying to bring back FIFO pipe to history behavior. It is still unclear that if single EOF flag can represent SBS_CANTSENDMORE and SBS_CANTRCVMORE which socket-based version is using, but I have run the poll regression test in tool directory, output is same as the one on 8.3-STABLE now. I think the output "not ok 18 FIFO state 6b: poll result 0 expected 1. expected POLLHUP; got 0" might be bogus, because newcomer should not know that old writers were gone. I got the same behavior on Linux. Our implementation always return POLLIN for disconnected pipe even it should return POLLHUP, but I think it is not wise to remove POLLIN for compatible reason, this is our history behavior. Regression test: /usr/src/tools/regression/poll	2012-07-31 05:48:35 +00:00
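
The pipe2() call added in dc570d5e56 can be exercised from userland with a minimal sketch like the one below. It creates a pipe with O_CLOEXEC and O_NONBLOCK and checks that both ends carry the flags. The helper name pipe2_flags_applied is hypothetical, and the _GNU_SOURCE define is an assumption that only matters when building against glibc.

```c
#define _GNU_SOURCE /* assumption: exposes pipe2() on glibc; not needed on FreeBSD */
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a pipe with pipe2(O_CLOEXEC | O_NONBLOCK) and
 * report whether both descriptors carry FD_CLOEXEC and O_NONBLOCK.
 * Returns 1 if both flags are set on both ends, 0 if not, -1 on error.
 */
int
pipe2_flags_applied(void)
{
	int fds[2];
	int result = 1;

	if (pipe2(fds, O_CLOEXEC | O_NONBLOCK) == -1)
		return (-1);

	for (int i = 0; i < 2; i++) {
		int fdflags = fcntl(fds[i], F_GETFD); /* descriptor flags */
		int flflags = fcntl(fds[i], F_GETFL); /* file status flags */

		if (fdflags == -1 || flflags == -1) {
			result = -1;
			break;
		}
		if ((fdflags & FD_CLOEXEC) == 0 || (flflags & O_NONBLOCK) == 0)
			result = 0;
	}

	close(fds[0]);
	close(fds[1]);
	return (result);
}
```

With plain pipe(), the same result would need four extra fcntl() calls after creation, leaving a window in which another thread could fork and exec with the descriptors still inheritable.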


279 Commits
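
The scenario described in c22ba7bb06 (polling a pipe with an empty event mask while the hangup has not happened yet) can be reproduced from userland roughly as follows. This is a sketch under stated assumptions, not the reporter's test case: the helper name poll_hup_without_events, the 100 ms delay, and the use of fork() to close the write end later are all illustrative choices.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper: poll the read end of a pipe with events == 0 while a
 * child process closes the last write end about 100 ms later.
 * Returns the revents reported by poll(), 0 on timeout, -1 on error.
 */
int
poll_hup_without_events(int timeout_ms)
{
	int fds[2];
	pid_t pid;

	if (pipe(fds) == -1)
		return (-1);

	pid = fork();
	if (pid == -1) {
		close(fds[0]);
		close(fds[1]);
		return (-1);
	}
	if (pid == 0) {
		/* Child: hold the write end briefly, then exit to close it. */
		close(fds[0]);
		poll(NULL, 0, 100); /* sleep ~100 ms without extra headers */
		_exit(0);
	}

	/* Parent: drop its copy of the write end, then wait with no events. */
	close(fds[1]);
	struct pollfd pfd = { .fd = fds[0], .events = 0, .revents = 0 };
	int n = poll(&pfd, 1, timeout_ms);
	int revents = (n > 0) ? pfd.revents : n; /* 0 on timeout, -1 on error */

	close(fds[0]);
	waitpid(pid, NULL, 0);
	return (revents);
}
```

Going by the commit message, a kernel without the fix would leave the poller asleep until the timeout expires, so the function would return 0. With the fix, the result should have POLLHUP set shortly after the child exits.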