freebsd-dev

Author	SHA1	Message	Date
Mateusz Guzik	58a3dcb229	filedesc assert that table size is at least 3 in fdsetugidsafety Requested by: kib	2014-10-22 08:56:57 +00:00
Mateusz Guzik	11888da8d9	filedesc: cleanup setugidsafety a little Rename it to fdsetugidsafety for consistency with other functions. There is no need to take filedesc lock if not closing any files. The loop has to verify each file and we are guaranteed fdtable has space for at least 20 fds. As such there is no need to check fd_lastfile. While here tidy up is_unsafe.	2014-10-22 00:23:43 +00:00
Hans Petter Selasky	f0188618f2	Fix multiple incorrect SYSCTL arguments in the kernel: - Wrong integer type was specified. - Wrong or missing "access" specifier. The "access" specifier sometimes included the SYSCTL type, which it should not, except for procedural SYSCTL nodes. - Logical OR where binary OR was expected. - Properly assert the "access" argument passed to all SYSCTL macros, using the CTASSERT macro. This applies to both static- and dynamically created SYSCTLs. - Properly assert the the data type for both static and dynamic SYSCTLs. In the case of static SYSCTLs we only assert that the data pointed to by the SYSCTL data pointer has the correct size, hence there is no easy way to assert types in the C language outside a C-function. - Rewrote some code which doesn't pass a constant "access" specifier when creating dynamic SYSCTL nodes, which is now a requirement. - Updated "EXAMPLES" section in SYSCTL manual page. MFC after: 3 days Sponsored by: Mellanox Technologies	2014-10-21 07:31:21 +00:00
Mateusz Guzik	966ee9f25f	filedesc: plug 2 write-only variables Reported by: Coverity CID: 1245745, 1245746	2014-10-20 21:57:24 +00:00
Mateusz Guzik	55056be254	filedesc: plug 2 assignments to M_ZERO-ed pointers in falloc_noinstall No functional changes.	2014-10-15 01:16:11 +00:00
Mateusz Guzik	2b4a2528d7	filedesc: fix up breakage introduced in 272505 Include sequence counter supports incoditionally [1]. This fixes reprted build problems with e.g. nvidia driver due to missing opt_capsicum.h. Replace fishy looking sizeof with offsetof. Make fde_seq the last member in order to simplify calculations. Suggested by: kib [1] X-MFC: with 272505	2014-10-05 19:40:29 +00:00
Konstantin Belousov	57c2505e65	On error, sbuf_bcat() returns -1. Some callers returned this -1 to the upper layers, which interpret it as errno value, which happens to be ERESTART. The result was spurious restarts of the sysctls in loop, e.g. kern.proc.proc, instead of returning ENOMEM to caller. Convert -1 from sbuf_bcat() to ENOMEM, when returning to the callers expecting errno. In collaboration with: pho Sponsored by: The FreeBSD Foundation (kib) MFC after: 1 week	2014-10-05 17:35:59 +00:00
Mateusz Guzik	ee3fd7bbb1	Plug capability races. fp and appropriate capability lookups were not atomic, which could result in improper capabilities being checked. This could result either in protection bypass or in a spurious ENOTCAPABLE. Make fp + capability check atomic with the help of sequence counters. Reviewed by: kib MFC after: 3 weeks	2014-10-04 08:08:56 +00:00
Mateusz Guzik	0c4a09a378	Make do_dup() static and move relevant macros to kern_descrip.c No functional changes.	2014-09-26 19:48:47 +00:00
Konstantin Belousov	f69261f2f9	Fix fcntl(2) compat32 after r270691. The copyin and copyout of the struct flock are done in the sys_fcntl(), which mean that compat32 used direct access to userland pointers. Move code from sys_fcntl() to new wrapper, kern_fcntl_freebsd(), which performs neccessary userland memory accesses, and use it from both native and compat32 fcntl syscalls. Reported by: jhibbits Sponsored by: The FreeBSD Foundation MFC after: 3 days	2014-09-25 21:07:19 +00:00
John Baldwin	9696feebe2	Add a new fo_fill_kinfo fileops method to add type-specific information to struct kinfo_file. - Move the various fill_*_info() methods out of kern_descrip.c and into the various file type implementations. - Rework the support for kinfo_ofile to generate a suitable kinfo_file object for each file and then convert that to a kinfo_ofile structure rather than keeping a second, different set of code that directly manipulates type-specific file information. - Remove the shm_path() and ksem_info() layering violations. Differential Revision: https://reviews.freebsd.org/D775 Reviewed by: kib, glebius (earlier version)	2014-09-22 16:20:47 +00:00
John Baldwin	2d69d0dcc2	Fix various issues with invalid file operations: - Add invfo_rdwr() (for read and write), invfo_ioctl(), invfo_poll(), and invfo_kqfilter() for use by file types that do not support the respective operations. Home-grown versions of invfo_poll() were universally broken (they returned an errno value, invfo_poll() uses poll_no_poll() to return an appropriate event mask). Home-grown ioctl routines also tended to return an incorrect errno (invfo_ioctl returns ENOTTY). - Use the invfo_() functions instead of local versions for unsupported file operations. - Reorder fileops members to match the order in the structure definition to make it easier to spot missing members. - Add several missing methods to linuxfileops used by the OFED shim layer: fo_write(), fo_truncate(), fo_kqfilter(), and fo_stat(). Most of these used invfo_(), but a dummy fo_stat() implementation was added.	2014-09-12 21:29:10 +00:00
John Baldwin	0ed667f6e5	Simplify vntype_to_kinfo() by returning when the desired value is found instead of breaking out of the loop and then immediately checking the loop index so that if it was broken out of the proper value can be returned. While here, use nitems().	2014-09-12 20:56:09 +00:00
Mateusz Guzik	64196a9996	Plug unnecessary fp assignments in kern_fcntl. No functional changes.	2014-09-05 23:56:25 +00:00
Gleb Smirnoff	e86447ca44	- Remove socket file operations declaration from sys/file.h. - Make them static in sys_socket.c. - Provide generic invfo_truncate() instead of soo_truncate(). Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-08-26 14:44:08 +00:00
Mateusz Guzik	037755fd15	Fix up races with f_seqcount handling. It was possible that the kernel would overwrite user-supplied hint. Abuse vnode lock for this purpose. In collaboration with: kib MFC after: 1 week	2014-08-26 08:17:22 +00:00
Mateusz Guzik	a1bf811596	Prepare fget_unlocked for reading fd table only once. Some capsicum functions accept fdp + fd and lookup fde based on that. Add variants which accept fde. Reviewed by: pjd MFC after: 1 week	2014-07-23 19:33:49 +00:00
Mateusz Guzik	b23c40d7b1	Don't zero fd_nfiles during fdp destruction. Code trying to take a look has to check fd_refcnt and it is 0 by that time. This is a follow up to r268505, without this the code would leak memory for tables bigger than the default. MFC after: 1 week	2014-07-10 21:05:45 +00:00
Mateusz Guzik	e518baf8f9	Avoid relocking filedesc lock when closing fds during fdp destruction. Don't call bzero nor fdunused from fdfree for such cases. It would do unnecessary work and complain that the lock is not taken. MFC after: 1 week	2014-07-10 20:59:54 +00:00
Mateusz Guzik	b9d32c36fa	Make fdunshare accept only td parameter. Proc had to match the thread anyway and 2 parameters were inconsistent with the rest. MFC after: 1 week	2014-06-28 05:41:53 +00:00
Mateusz Guzik	35778d7aa9	Make sure to always clear p_fd for process getting rid of its filetable. Filetable can be shared with other processes. Previous code failed to clear the pointer for all but the last process getting rid of the table. This is mostly cosmetics. Get rid of 'This should happen earlier' comment. Clearing the pointer in this place is fine as consumers can reliably check for files availability by inspecting fd_refcnt and vnodes availabity by NULL-checking them. MFC after: 1 week	2014-06-28 05:18:03 +00:00
Mateusz Guzik	450570a55e	Tidy up fd-related functions called by do_execve o assert in each one that fdp is not shared o remove unnecessary NULL checks - all userspace processes have fdtables and kernel processes cannot execve o remove comments about the danger of fd_ofiles getting reallocated - fdtable is not shared and fd_ofiles could be only reallocated if new fd was about to be added, but if that was possible the code would already be buggy as setugidsafety work could be undone MFC after: 1 week	2014-06-23 01:28:18 +00:00
Mateusz Guzik	158627616c	Don't take filedesc lock in fdunshare(). We can read refcnt safely and only care if it is equal to 1. If it could suddenly change from 1 to something bigger the code would be buggy even in the previous form and transitions from > 1 to 1 are equally racy and harmless (we copy even though there is no need). MFC after: 1 week	2014-06-22 21:37:27 +00:00
Mateusz Guzik	adf87ab01c	fd: replace fd_nfiles with fd_lastfile where appropriate fd_lastfile is guaranteed to be the biggest open fd, so when the intent is to iterate over active fds or lookup one, there is no point in looking beyond that limit. Few places are left unpatched for now. MFC after: 1 week	2014-06-22 01:31:55 +00:00
Mateusz Guzik	0f0b852c73	do_dup: plug redundant adjustment of fd_lastfile By that time it was already set by fdalloc, or was there in the first place if fd is replaced. MFC after: 1 week	2014-06-22 00:53:33 +00:00
Mateusz Guzik	f2b1eaec33	Request a non-exiting process in sysctl_kern_proc_{o,}filedesc This fixes a race with exit1 freeing p_textvp. Suggested by: kib MFC after: 1 week	2014-05-02 21:55:09 +00:00
Mateusz Guzik	210a5d1689	Garbage collect fdavail. It rarely returns an error and fdallocn handles the failure of fdalloc just fine.	2014-04-04 05:07:36 +00:00
Mateusz Guzik	f804336026	Mark the following sysctls as MPSAFE: kern.file kern.proc.filedesc kern.proc.ofiledesc MFC after: 7 days	2014-03-21 19:12:05 +00:00
Mateusz Guzik	4c73e705a5	Take filedesc lock only for reading when allocating new fdtable. Code populating the table does this already. MFC after: 1 week	2014-03-21 01:34:19 +00:00
Robert Watson	4a14441044	Update kernel inclusions of capability.h to use capsicum.h instead; some further refinement is required as some device drivers intended to be portable over FreeBSD versions rely on __FreeBSD_version to decide whether to include capability.h. MFC after: 3 weeks	2014-03-16 10:55:57 +00:00
Bryan Drewery	63d8fe5531	Fix style of comment blocks. Reported by: peter Approved by: bapt (mentor, implicit) X-MFC with: r262006	2014-02-22 04:28:49 +00:00
Mateusz Guzik	1f9e8f8ad9	Fix a race between kern_proc_{o,}filedesc_out and fdescfree leading to use-after-free. fdescfree proceeds to free file pointers once fd_refcnt reaches 0, but kern_proc_{o,}filedesc_out only checked for hold count. MFC after: 3 days	2014-02-21 22:29:09 +00:00
Bryan Drewery	70f82cfbaf	Fix M_FILEDESC leak in fdgrowtable() introduced in r244510. fdgrowtable() now only reallocates fd_map when necessary. This fixes fdgrowtable() to use the same logic as fdescfree() for when to free the fd_map. The logic in fdescfree() is intended to not free the initial static allocation, however the fd_map grows at a slower rate than the table does. The table is intended to hold 20 fd, but its initial map has many more slots than 20. The slot sizing causes NDSLOTS(20) through NDSLOTS(63) to be 1 which matches NDSLOTS(20), so fdescfree() was assuming that the fd_map was still the initial allocation and not freeing it. This partially reverts r244510 by reintroducing some of the logic it removed in fdgrowtable(). Reviewed by: mjg Approved by: bapt (mentor) MFC after: 2 weeks	2014-02-17 00:00:39 +00:00
Bryan Drewery	88812f91aa	Remove redundant memcpy of fd_ofiles in fdgrowtable() added in r247602 Discussed with: mjg Approved by: bapt (mentor) MFC after: 2 weeks	2014-02-16 23:10:46 +00:00
Mateusz Guzik	231a0fe857	Plug a memory leak in dup2 when both old and new fd have ioctl caps. Reviewed by: pjd MFC after: 3 days	2014-01-03 16:36:55 +00:00
Mateusz Guzik	0918d4b21f	Don't check for fd limits in fdgrowtable_exp. Callers do that already and additional check races with process decreasing limits and can result in not growing the table at all, which is currently not handled. MFC after: 3 days	2014-01-03 16:34:16 +00:00
Adrian Chadd	79750e3b36	Migrate the sendfile_sync structure into a public(ish) API in preparation for extending and reusing it. The sendfile_sync wrapper is mostly just a "mbuf transaction" wrapper, used to indicate that the backing store for a group of mbufs has completed. It's only being used by sendfile for now and it's only implementing a sleep/wakeup rendezvous. However, there are other potential signaling paths (kqueue) and other potential uses (socket zero-copy write) where the same mechanism would also be useful. So, with that in mind: * extract the sendfile_sync code out into sf_sync_() methods teach the sf_sync_alloc method about the current config flag - it will eventually know about kqueue. * move the sendfile_sync code out of do_sendfile() - the only thing it now knows about is the sfs pointer. The guts of the sync rendezvous (setup, rendezvous/wait, free) is now done in the syscall wrapper. * .. and teach the 32-bit compat sendfile call the same. This should be a no-op. It's primarily preparation work for teaching the sendfile_sync about kqueue notification. Tested: * Peter Holm's sendfile stress / regression scripts Sponsored by: Netflix, Inc.	2013-12-01 03:53:21 +00:00
Pawel Jakub Dawidek	f2b525e6b9	Make process descriptors standard part of the kernel. rwhod(8) already requires process descriptors to work and having PROCDESC in GENERIC seems not enough, especially that we hope to have more and more consumers in the base. MFC after: 3 days	2013-11-30 15:08:35 +00:00
Konstantin Belousov	1744fe5048	When growing the file descriptor table, new larger memory chunk is allocated, but the old table is kept around to handle the case of threads still performing unlocked accesses to it. Grow the table exponentially instead of increasing its size by sizeof(long) * 8 chunks when overflowing. This mode significantly reduces the total memory use for the processes consuming large numbers of the file descriptors which open them one by one. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (marius)	2013-10-09 18:41:35 +00:00
Konstantin Belousov	3625bde45d	Reduce code duplication, introduce the getmaxfd() helper to calculate the max filedescriptor index. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (marius)	2013-10-09 18:39:44 +00:00
John-Mark Gurney	da9442ef43	it must be the last member, not might... Reviewed by: attilio Approved by: re (delphij, gjb)	2013-09-26 17:55:04 +00:00
Attilio Rao	57a9eeb4ed	Avoid memory accesses reordering which can result in fget_unlocked() seeing a stale fd_ofiles table once fd_nfiles is already updated, resulting in OOB accesses. Approved by: re (kib) Sponsored by: EMC / Isilon storage division Reported and tested by: pho Reviewed by: benno	2013-09-25 13:37:52 +00:00
Pawel Jakub Dawidek	ab568de789	Handle cases where capability rights are not provided. Reported by: kib	2013-09-05 11:58:12 +00:00
Pawel Jakub Dawidek	7008be5bd7	Change the cap_rights_t type from uint64_t to a structure that we can extend in the future in a backward compatible (API and ABI) way. The cap_rights_t represents capability rights. We used to use one bit to represent one right, but we are running out of spare bits. Currently the new structure provides place for 114 rights (so 50 more than the previous cap_rights_t), but it is possible to grow the structure to hold at least 285 rights, although we can make it even larger if 285 rights won't be enough. The structure definition looks like this: struct cap_rights { uint64_t cr_rights[CAP_RIGHTS_VERSION + 2]; }; The initial CAP_RIGHTS_VERSION is 0. The top two bits in the first element of the cr_rights[] array contain total number of elements in the array - 2. This means if those two bits are equal to 0, we have 2 array elements. The top two bits in all remaining array elements should be 0. The next five bits in all array elements contain array index. Only one bit is used and bit position in this five-bits range defines array index. This means there can be at most five array elements in the future. To define new right the CAPRIGHT() macro must be used. The macro takes two arguments - an array index and a bit to set, eg. #define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL) We still support aliases that combine few rights, but the rights have to belong to the same array element, eg: #define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL) #define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL) #define CAP_FCHMODAT (CAP_FCHMOD \| CAP_LOOKUP) There is new API to manage the new cap_rights_t structure: cap_rights_t cap_rights_init(cap_rights_t rights, ...); void cap_rights_set(cap_rights_t rights, ...); void cap_rights_clear(cap_rights_t rights, ...); bool cap_rights_is_set(const cap_rights_t rights, ...); bool cap_rights_is_valid(const cap_rights_t rights); void cap_rights_merge(cap_rights_t dst, const cap_rights_t src); void cap_rights_remove(cap_rights_t dst, const cap_rights_t src); bool cap_rights_contains(const cap_rights_t big, const cap_rights_t little); Capability rights to the cap_rights_init(), cap_rights_set(), cap_rights_clear() and cap_rights_is_set() functions are provided by separating them with commas, eg: cap_rights_t rights; cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT); There is no need to terminate the list of rights, as those functions are actually macros that take care of the termination, eg: #define cap_rights_set(rights, ...) \ __cap_rights_set((rights), __VA_ARGS__, 0ULL) void __cap_rights_set(cap_rights_t *rights, ...); Thanks to using one bit as an array index we can assert in those functions that there are no two rights belonging to different array elements provided together. For example this is illegal and will be detected, because CAP_LOOKUP belongs to element 0 and CAP_PDKILL to element 1: cap_rights_init(&rights, CAP_LOOKUP \| CAP_PDKILL); Providing several rights that belongs to the same array's element this way is correct, but is not advised. It should only be used for aliases definition. This commit also breaks compatibility with some existing Capsicum system calls, but I see no other way to do that. This should be fine as Capsicum is still experimental and this change is not going to 9.x. Sponsored by: The FreeBSD Foundation	2013-09-05 00:09:56 +00:00
Gleb Smirnoff	ca04d21d5f	Make sendfile() a method in the struct fileops. Currently only vnode backed file descriptors have this method implemented. Reviewed by: kib Sponsored by: Nginx, Inc. Sponsored by: Netflix	2013-08-15 07:54:31 +00:00
Mikolaj Golub	9e89077c65	Plug up the lock lock leakage when exporting to a short buffer. Reported by: Alexander Leidinger Submitted by: mjg MFC after: 1 week	2013-07-01 03:27:14 +00:00
Mateusz Guzik	07bd8bf929	Remove duplicate NULL check in kern_proc_filedesc_out. No functional changes. MFC after: 1 week	2013-06-28 18:32:46 +00:00
Mikolaj Golub	6359d169ef	Rework r252313: The filedesc lock may not be dropped unconditionally before exporting fd to sbuf: fd might go away during execution. While it is ok for DTYPE_VNODE and DTYPE_FIFO because the export is from a vrefed vnode here, for other types it is unsafe. Instead, drop the lock in export_fd_to_sb(), after preparing data in memory and before writing to sbuf. Spotted by: mjg Suggested by: kib Review by: kib MFC after: 1 week	2013-06-28 18:07:41 +00:00
Mikolaj Golub	bd973910c8	To avoid LOR, always drop the filedesc lock before exporting fd to sbuf. Reviewed by: kib MFC after: 3 days	2013-06-27 19:14:03 +00:00
John Baldwin	958aa57537	Similar to 233760 and 236717, export some more useful info about the kernel-based POSIX semaphore descriptors to userland via procstat(1) and fstat(1): - Change sem file descriptors to track the pathname they are associated with and add a ksem_info() method to copy the path out to a caller-supplied buffer. - Use the fo_stat() method of shared memory objects and ksem_info() to export the path, mode, and value of a semaphore via struct kinfo_file. - Add a struct semstat to the libprocstat(3) interface along with a procstat_get_sem_info() to export the mode and value of a semaphore. - Teach fstat about semaphores and to display their path, mode, and value. MFC after: 2 weeks	2013-05-03 21:11:57 +00:00

1 2 3 4 5 ...

492 Commits