freebsd-dev

Author	SHA1	Message	Date
Pawel Jakub Dawidek	6d4e99aaef	If the target file already exists, check for the CAP_UNLINKAT capability right on the target directory descriptor, but only if this is renameat(2) and a real target directory descriptor is given (not AT_FDCWD). Without this fix regular rename(2) fails if the target file already exists. Reported by: Michael Butler <imb@protected-networks.net> Reported by: Larry Rosenman <ler@lerctr.org> Sponsored by: The FreeBSD Foundation	2013-03-02 09:58:47 +00:00
Pawel Jakub Dawidek	1dc31587bf	Regen after r247602.	2013-03-02 00:55:09 +00:00
Pawel Jakub Dawidek	2609222ab4	Merge Capsicum overhaul: - Capability is no longer separate descriptor type. Now every descriptor has set of its own capability rights. - The cap_new(2) system call is left, but it is no longer documented and should not be used in new code. - The new syscall cap_rights_limit(2) should be used instead of cap_new(2), which limits capability rights of the given descriptor without creating a new one. - The cap_getrights(2) syscall is renamed to cap_rights_get(2). - If CAP_IOCTL capability right is present we can further reduce allowed ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed ioctls can be retrieved with cap_ioctls_get(2) syscall. - If CAP_FCNTL capability right is present we can further reduce fcntls that can be used with the new cap_fcntls_limit(2) syscall and retrieve them with cap_fcntls_get(2). - To support ioctl and fcntl white-listing the filedesc structure was heavily modified. - The audit subsystem, kdump and procstat tools were updated to recognize new syscalls. - Capability rights were revised and even though I tried hard to provide backward API and ABI compatibility there are some incompatible changes that are described in detail below: CAP_CREATE old behaviour: - Allow for openat(2)+O_CREAT. - Allow for linkat(2). - Allow for symlinkat(2). CAP_CREATE new behaviour: - Allow for openat(2)+O_CREAT. Added CAP_LINKAT: - Allow for linkat(2). ABI: Reuses CAP_RMDIR bit. - Allow to be target for renameat(2). Added CAP_SYMLINKAT: - Allow for symlinkat(2). Removed CAP_DELETE. Old behaviour: - Allow for unlinkat(2) when removing non-directory object. - Allow to be source for renameat(2). Removed CAP_RMDIR. Old behaviour: - Allow for unlinkat(2) when removing directory. Added CAP_RENAMEAT: - Required for source directory for the renameat(2) syscall. Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR): - Allow for unlinkat(2) on any object. 
- Required if target of renameat(2) exists and will be removed by this call. Removed CAP_MAPEXEC. CAP_MMAP old behaviour: - Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and PROT_WRITE. CAP_MMAP new behaviour: - Allow for mmap(2)+PROT_NONE. Added CAP_MMAP_R: - Allow for mmap(PROT_READ). Added CAP_MMAP_W: - Allow for mmap(PROT_WRITE). Added CAP_MMAP_X: - Allow for mmap(PROT_EXEC). Added CAP_MMAP_RW: - Allow for mmap(PROT_READ | PROT_WRITE). Added CAP_MMAP_RX: - Allow for mmap(PROT_READ | PROT_EXEC). Added CAP_MMAP_WX: - Allow for mmap(PROT_WRITE | PROT_EXEC). Added CAP_MMAP_RWX: - Allow for mmap(PROT_READ | PROT_WRITE | PROT_EXEC). Renamed CAP_MKDIR to CAP_MKDIRAT. Renamed CAP_MKFIFO to CAP_MKFIFOAT. Renamed CAP_MKNODE to CAP_MKNODEAT. CAP_READ old behaviour: - Allow pread(2). - Disallow read(2), readv(2) (if there is no CAP_SEEK). CAP_READ new behaviour: - Allow read(2), readv(2). - Disallow pread(2) (CAP_SEEK was also required). CAP_WRITE old behaviour: - Allow pwrite(2). - Disallow write(2), writev(2) (if there is no CAP_SEEK). CAP_WRITE new behaviour: - Allow write(2), writev(2). - Disallow pwrite(2) (CAP_SEEK was also required). 
Added convenient defines: #define CAP_PREAD (CAP_SEEK | CAP_READ) #define CAP_PWRITE (CAP_SEEK | CAP_WRITE) #define CAP_MMAP_R (CAP_MMAP | CAP_SEEK | CAP_READ) #define CAP_MMAP_W (CAP_MMAP | CAP_SEEK | CAP_WRITE) #define CAP_MMAP_X (CAP_MMAP | CAP_SEEK | 0x0000000000000008ULL) #define CAP_MMAP_RW (CAP_MMAP_R | CAP_MMAP_W) #define CAP_MMAP_RX (CAP_MMAP_R | CAP_MMAP_X) #define CAP_MMAP_WX (CAP_MMAP_W | CAP_MMAP_X) #define CAP_MMAP_RWX (CAP_MMAP_R | CAP_MMAP_W | CAP_MMAP_X) #define CAP_RECV CAP_READ #define CAP_SEND CAP_WRITE #define CAP_SOCK_CLIENT \ (CAP_CONNECT | CAP_GETPEERNAME | CAP_GETSOCKNAME | CAP_GETSOCKOPT | \ CAP_PEELOFF | CAP_RECV | CAP_SEND | CAP_SETSOCKOPT | CAP_SHUTDOWN) #define CAP_SOCK_SERVER \ (CAP_ACCEPT | CAP_BIND | CAP_GETPEERNAME | CAP_GETSOCKNAME | \ CAP_GETSOCKOPT | CAP_LISTEN | CAP_PEELOFF | CAP_RECV | CAP_SEND | \ CAP_SETSOCKOPT | CAP_SHUTDOWN) Added defines for backward API compatibility: #define CAP_MAPEXEC CAP_MMAP_X #define CAP_DELETE CAP_UNLINKAT #define CAP_MKDIR CAP_MKDIRAT #define CAP_RMDIR CAP_UNLINKAT #define CAP_MKFIFO CAP_MKFIFOAT #define CAP_MKNOD CAP_MKNODAT #define CAP_SOCK_ALL (CAP_SOCK_CLIENT | CAP_SOCK_SERVER) Sponsored by: The FreeBSD Foundation Reviewed by: Christoph Mallon <christoph.mallon@gmx.de> Many aspects discussed with: rwatson, benl, jonathan ABI compatibility discussed with: kib	2013-03-02 00:53:12 +00:00
John Baldwin	f9379dc411	Replace the TDP_NOSLEEPING flag with a counter so that the THREAD_NO_SLEEPING() and THREAD_SLEEPING_OK() macros can nest. Reviewed by: attilio	2013-03-01 22:03:31 +00:00
Pawel Jakub Dawidek	71ac38e896	Remove unnecessary variables.	2013-03-01 21:58:56 +00:00
Pawel Jakub Dawidek	f4d0191b22	Reduce lock scope a little.	2013-03-01 21:57:02 +00:00
Marius Strobl	db9066f798	- Use strdup(9) instead of reimplementing it. - Use __DECONST instead of strange casts. - Reduce code duplication and simplify name2oid(). PR: 176373 Submitted by: Christoph Mallon MFC after: 1 week	2013-03-01 18:49:14 +00:00
Konstantin Belousov	58248e57ab	Make the default implementation of the VOP_VPTOCNP() fail if the directory entry, matched by the inode number, is ".". NFSv4 client might instantiate the distinct vnodes which have the same inode number, since single v4 export can be combined from several filesystems on the server. For instance, a case when the nested server mount point is exactly one directory below the top of the export, causes directory and its parent to have the same inode number 2. The vop_stdvptocnp() algorithm then returns "." as the name of the lower directory. Filtering out the "." entry with ENOENT works around this behaviour, the error forces getcwd(3) to fall back to usermode implementation, which compares both st_dev and st_ino. Based on the submission by: rmacklem Tested by: rmacklem MFC after: 1 week	2013-03-01 18:40:14 +00:00
Davide Italiano	e234a588cb	MFcalloutng: Style fixes.	2013-02-28 16:22:49 +00:00
Alexander Motin	fdc5dd2d2f	MFcalloutng: Switch eventtimers(9) from using struct bintime to sbintime_t. Even before this not a single driver really supported full dynamic range of struct bintime even in theory, not speaking about practical inexpediency. This change legitimates the status quo and cleans up the code.	2013-02-28 13:46:03 +00:00
Davide Italiano	acccf7d8b4	MFcalloutng: When CPU becomes idle, cpu_idleclock() calculates time to the next timer event in order to reprogram hw timer. Return that time in sbintime_t to the caller and pass it to acpi_cpu_idle(), where it can be used as one more factor (quite precise) to estimate further sleep time and choose optimal sleep state. This is a preparatory change for further callout improvements that will be committed in the next days. The commit is not targeted for MFC.	2013-02-28 10:46:54 +00:00
Konstantin Belousov	20f4e3e158	Make recursive getblk() slightly more useful. Keep the buffer state intact if getblk() is done on the already owned buffer. Exit from brelse() early when the lock recursion is detected, otherwise brelse() might prematurely destroy the buffer under some circumstances. Sponsored by: The FreeBSD Foundation Noted by: mckusick Tested by: pho MFC after: 2 weeks	2013-02-27 07:34:09 +00:00
Alexander Motin	1af19ee4a2	Add support for good old 8192Hz profiling clock to software PMC. Reviewed by: fabient	2013-02-26 18:13:42 +00:00
Attilio Rao	590f9303e5	Merge from vmobj-rwlock branch: Remove unused inclusion of vm/vm_pager.h and vm/vnode_pager.h. Sponsored by: EMC / Isilon storage division Tested by: pho Reviewed by: alc	2013-02-26 01:00:11 +00:00
Pawel Jakub Dawidek	1d59211b2e	Style. Suggested by: kib	2013-02-25 20:51:29 +00:00
Pawel Jakub Dawidek	893365e42d	After r237012, the fdgrowtable() doesn't drop the filedesc lock anymore, so update a stale comment. Reviewed by: kib, keramida	2013-02-25 20:50:08 +00:00
John Baldwin	593efaf9f7	Further refine the handling of stop signals in the NFS client. The changes in r246417 were incomplete as they did not add explicit calls to sigdeferstop() around all the places that previously passed SBDRY to _sleep(). In addition, nfs_getcacheblk() could trigger a write RPC from getblk() resulting in sigdeferstop() recursing. Rather than manually deferring stop signals in specific places, change the VFS_() and VOP_() methods to defer stop signals for filesystems which request this behavior via a new VFCF_SBDRY flag. Note that this has to be a VFC flag rather than a MNTK flag so that it works properly with VFS_MOUNT() when the mount is not yet fully constructed. For now, only the NFS clients set this new flag in VFS_SET(). A few other related changes: - Add an assertion to ensure that TDF_SBDRY doesn't leak to userland. - When a lookup request uses VOP_READLINK() to follow a symlink, mark the request as being on behalf of the thread performing the lookup (cnp_thread) rather than using a NULL thread pointer. This causes NFS to properly handle signals during this VOP on an interruptible mount. PR: kern/176179 Reported by: Russell Cattelan (sigdeferstop() recursion) Reviewed by: kib MFC after: 1 month	2013-02-21 19:02:50 +00:00
Jamie Gritton	ffc72591b1	Don't worry if a module is already loaded when looking for a fstype to mount (possible in a race condition). Reviewed by: kib MFC after: 1 week	2013-02-21 02:41:37 +00:00
John Baldwin	353374b525	Fix a few typos.	2013-02-19 16:35:27 +00:00
Pawel Jakub Dawidek	b2e054b0d4	Update the comment: we do show the backtrace of misbehaving thread.	2013-02-17 21:37:32 +00:00
Pawel Jakub Dawidek	f0ad2ecb9c	Style.	2013-02-17 11:56:36 +00:00
Pawel Jakub Dawidek	8e1d51ab40	- Require CAP_FSYNC capability right when opening a file with O_SYNC or O_FSYNC flags. - While here simplify check for locking flags. Sponsored by: The FreeBSD Foundation	2013-02-17 11:53:51 +00:00
Pawel Jakub Dawidek	11b0cfe3cd	Remove redundant parenthesis.	2013-02-17 11:49:21 +00:00
Pawel Jakub Dawidek	49549b1894	Remove redundant space.	2013-02-17 11:48:16 +00:00
Pawel Jakub Dawidek	6c08be2b88	Add break to the default case.	2013-02-17 11:47:58 +00:00
Pawel Jakub Dawidek	4881a5950e	Don't treat pointers as booleans.	2013-02-17 11:47:30 +00:00
Pawel Jakub Dawidek	de26549841	Remove redundant parenthesis.	2013-02-17 11:47:01 +00:00
Kirk McKusick	2bc1a1fe5c	Add barrier write capability to the VFS buffer interface. A barrier write is a disk write request that tells the disk that the buffer being written must be committed to the media along with any writes that preceded it before any future blocks may be written to the drive. Barrier writes are provided by adding the functions bbarrierwrite (bwrite with barrier) and babarrierwrite (bawrite with barrier). Following a bbarrierwrite the client knows that the requested buffer is on the media. It does not ensure that buffers written before that buffer are on the media. It only ensures that buffers written before that buffer will get to the media before any buffers written after that buffer. A flush command must be sent to the disk to ensure that all earlier written buffers are on the media. Reviewed by: kib Tested by: Peter Holm	2013-02-16 14:51:30 +00:00
Ian Lepore	a1137de941	Add PPS_CANWAIT support for time_pps_fetch(). This adds support for all three blocking modes described in section 3.4.3 of RFC 2783, allowing the caller to retrieve the most recent values without blocking, to block for a specified time, or to block forever. Reviewed by: discussion on hackers@	2013-02-15 18:30:32 +00:00
Sergey Kandaurov	d7ffa24831	vn_io_faults_cnt: - use u_long consistently - use SYSCTL_ULONG to match the type of variable Reviewed by: kib MFC after: 1 week	2013-02-15 14:22:05 +00:00
Sergey Kandaurov	ab15d8039e	Add support of passing SCM_BINTIME ancillary data object for PF_LOCAL sockets. PR: kern/175883 Submitted by: Andrey Simonenko <simon@comsys.ntu-kpi.kiev.ua> Discussed with: glebius, phk MFC after: 2 weeks	2013-02-15 13:00:20 +00:00
Ian Lepore	74938cbb7f	Make the F_READAHEAD option to fcntl(2) work as documented: a value of zero now disables read-ahead. It used to effectively restore the system default readahead heuristic if it had been changed; a negative value now restores the default. Reviewed by: kib	2013-02-13 15:09:16 +00:00
Konstantin Belousov	dd0b4fb6d5	Reform the busdma API so that new types may be added without modifying every architecture's busdma_machdep.c. It is done by unifying the bus_dmamap_load_buffer() routines so that they may be called from MI code. The MD busdma is then given a chance to do any final processing in the complete() callback. The cam changes unify the bus_dmamap_load* handling in cam drivers. The arm and mips implementations are updated to track virtual addresses for sync(). Previously this was done in a type specific way. Now it is done in a generic way by recording the list of virtuals in the map. Submitted by: jeff (sponsored by EMC/Isilon) Reviewed by: kan (previous version), scottl, mjacob (isp(4), no objections for target mode changes) Discussed with: ian (arm changes) Tested by: marius (sparc64), mips (jmallet), isci(4) on x86 (jharris), amd64 (Fabian Keil <freebsd-listen@fabiankeil.de>)	2013-02-12 16:57:20 +00:00
Marius Strobl	18716f9f4b	Update comments to reflect r246689.	2013-02-11 23:05:10 +00:00
Marius Strobl	bdc5f0172e	Make SYSCTL_{LONG,QUAD,ULONG,UQUAD}(9) work as advertised and also handle constant values. Reviewed by: kib MFC after: 3 days	2013-02-11 21:50:00 +00:00
Konstantin Belousov	2871baa49a	Remove the ia64-specific code fragment, which effect is more cleanly done by the call to trans_prot() function a line before. Discussed with: Oliver Pinter <oliver.pntr@gmail.com> MFC after: 1 week	2013-02-10 20:08:33 +00:00
Andriy Gapon	c43b08dc6c	ktr: correctly handle possible wrap-around in the boot buffer Older entries should be 'before' newer entries in the new buffer too and there should be no zero-filled gap between them. Pointed out by: jhb MFC after: 3 days X-MFC with: r246282	2013-02-08 07:29:07 +00:00
Konstantin Belousov	888d4d4f86	When vforked child is traced, the debugging events are not generated until child performs exec(). The behaviour is reasonable when a debugger is the real parent, because the parent is stopped until exec(), and sending a debugging event to the debugger would deadlock both parent and child. On the other hand, when debugger is not the parent of the vforked child, not sending debugging signals makes it impossible to debug across vfork. Fix the issue by declining generating debug signals only when vfork() was done and child called ptrace(PT_TRACEME). Set a new process flag P_PPTRACE from the attach code for PT_TRACEME, if P_PPWAIT flag is set, which indicates that the process was created with vfork() and still did not execed. Check P_PPTRACE from issignal(), instead of refusing the trace outright for the P_PPWAIT case. The scope of P_PPTRACE is exactly contained in the scope of P_PPWAIT. Found and tested by: zont Reviewed by: pluknet MFC after: 2 weeks	2013-02-07 15:34:22 +00:00
Konstantin Belousov	2ca4998342	Stop translating the ERESTART error from the open(2) into EINTR. Posix requires that open(2) is restartable for SA_RESTART. For non-posix objects, in particular, devfs nodes, still disable automatic restart of the opens. The open call to a driver could have significant side effects for the hardware. Noted and reviewed by: jilles Discussed with: bde MFC after: 2 weeks	2013-02-07 14:53:33 +00:00
Neel Natu	dae3dc73f6	If an interrupt event's assign_cpu method fails, then restore the original cpuset mask for the associated interrupt thread. The text used above is verbatim from r195249 and the code should now be in line with the intent of that commit.	2013-02-07 06:48:47 +00:00
Pawel Jakub Dawidek	fbda3d5dae	Audit sockaddr argument for bind(2), connect(2), accept(2), sendto(2) and recvfrom(2) syscalls. Sponsored by: The FreeBSD Foundation	2013-02-07 00:36:00 +00:00
Pawel Jakub Dawidek	82b316b377	Minor style tweaks.	2013-02-07 00:27:11 +00:00
John Baldwin	a120a7a3cd	Rework the handling of stop signals in the NFS client. The changes in 195702, 195703, and 195821 prevented a thread from suspending while holding locks inside of NFS by forcing the thread to fail sleeps with EINTR or ERESTART but defer the thread suspension to the user boundary. However, this had the effect that stopping a process during an NFS request could abort the request and trigger EINTR errors that were visible to userland processes (previously the thread would have suspended and completed the request once it was resumed). This change instead effectively masks stop signals while in the NFS client. It uses the existing TDF_SBDRY flag to effect this since SIGSTOP cannot be masked directly. Also, instead of setting PBDRY on individual sleeps, the NFS client now sets the TDF_SBDRY flag around each NFS request and stop signals are masked for all sleeps during that region (the previous change missed sleeps in lockmgr locks). The end result is that stop signals sent to threads performing an NFS request are completely ignored until after the NFS request has finished processing and the thread prepares to return to userland. This restores the behavior of stop signals being transparent to userland processes while still preventing threads from suspending while holding NFS locks. Reviewed by: kib MFC after: 1 month	2013-02-06 17:06:51 +00:00
Sergey Kandaurov	23c053d6a2	Prezero the acl structure which is to be copied to usermode, to avoid leakage of the previous content of padding and uninitialized fields. Reported by: Ilia Noskov <noskov@nic.ru> Reviewed by: kib MFC after: 1 week	2013-02-06 15:18:46 +00:00
Sergey Kandaurov	51dc4fea4c	Remove reference to the rlist code from comments, and fix a typo visible in the resulted change. Reviewed by: kib MFC after: 1 week	2013-02-05 20:08:33 +00:00
Andriy Gapon	c8199bc955	ktr: prevent possible footshooting with KTR_ENTRIES and KTR_BOOT_ENTRIES Suggested by: adrian MFC after: 14 days X-MFC with: r246282	2013-02-04 21:58:57 +00:00
Andriy Gapon	f85ed12497	ktr: copy content from the early static buffer if KTR_ENTRIES != KTR_BOOT_ENTRIES Reported by: glebius, jhb Pointyhat to: avg MFC after: 14 days X-MFC with: r246282	2013-02-04 21:50:55 +00:00
Marius Strobl	94bfd5b1a0	Try to improve r242655 take III: move these SYSCTLs describing the kernel map, which is defined and initialized in vm/vm_kern.c, to the latter. Submitted by: alc	2013-02-04 09:35:48 +00:00
Marius Strobl	e8cbe54bc4	Further improve r242655 and supply VM_{MIN,MAX}_KERNEL_ADDRESS as constant values to SYSCTL_ULONG(9) where possible. Submitted by: bde	2013-02-03 21:43:55 +00:00
Andriy Gapon	36b7dde416	allow for large KTR_ENTRIES values by allocating ktr_buf using malloc(9) Only during very early boot, before malloc(9) is functional (SI_SUB_KMEM), the static ktr_buf_init is used. Size of the static buffer is determined by a new kernel option KTR_BOOT_ENTRIES. Its default value is 1024. This commit builds on top of r243046. Reviewed by: alc MFC after: 17 days	2013-02-03 09:57:39 +00:00
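
Commit f9379dc411 above replaces a single "no sleeping" flag with a counter so that the enter and leave macros can nest. The following is only a minimal userland sketch of that idea, not the kernel code: `struct demo_thread` and the `demo_*` helpers are hypothetical names invented for illustration and do not appear in the FreeBSD sources.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-thread state; not the kernel's struct thread. */
struct demo_thread {
	int no_sleeping;	/* nesting depth of "no sleeping" sections */
};

/* Enter a section in which sleeping is forbidden; calls may nest. */
static void
demo_no_sleeping(struct demo_thread *td)
{
	td->no_sleeping++;
}

/* Leave one level of nesting; must be paired with demo_no_sleeping(). */
static void
demo_sleeping_ok(struct demo_thread *td)
{
	assert(td->no_sleeping > 0);
	td->no_sleeping--;
}

/* Sleeping is allowed only once every nested section has been left. */
static bool
demo_can_sleep(const struct demo_thread *td)
{
	return (td->no_sleeping == 0);
}
```

With a plain boolean flag, the inner "sleeping ok" call of a nested pair would clear the flag while the outer section is still active; counting the depth avoids that.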


13068 Commits