freebsd-dev

Author	SHA1	Message	Date
Alexander Motin	980c545d76	Fix time math overflows and improve zero intervals handling in poll(), select(), nanosleep() and kevent() functions after calloutng changes. Reported by: bde	2013-03-06 19:37:38 +00:00
Fabien Thomas	d49302aead	Add a generic way to call per event allocate / release function. Reviewed by: mav MFC after: 1 month	2013-03-05 10:18:48 +00:00
Davide Italiano	ac42a1726a	Complete r247813: Use true/false instead of TRUE/FALSE. Reported by: attilio Requested by: jhb	2013-03-04 21:52:12 +00:00
Davide Italiano	a4a3ce9919	Use C99 'bool' rather than Machish 'boolean_t'. Requested by: jhb	2013-03-04 21:09:22 +00:00
Davide Italiano	40e794ab19	MFcalloutng: - Rewrite kevent() timeout implementation to allow sub-tick precision. - Make the interval timings for EVFILT_TIMER more accurate. This also removes an hack introduced in r238424. Sponsored by: Google Summer of Code 2012, iXsystems inc. Tested by: flo, marius, ian, markj, Fabian Keil	2013-03-04 16:55:16 +00:00
Davide Italiano	cf5e4fe6bb	MFcalloutng: Fix kern_select() and sys_poll() so that they can handle sub-tick precision for timeouts (in the same fashion it was done for nanosleep() in r247797). Sponsored by: Google Summer of Code 2012, iXsystems inc. Tested by: flo, marius, ian, markj, Fabian Keil	2013-03-04 16:41:27 +00:00
Davide Italiano	4601bab1fb	MFcalloutng (r244251 with minor changes): Specify that precision of 0.5s is enough for resource limitation. Sponsored by: Google Summer of Code 2012, iXsystems inc. Tested by: flo, marius, ian, markj, Fabian Keil	2013-03-04 16:25:12 +00:00
Davide Italiano	c38250c9b9	MFcalloutng (r244255 by mav, with minor changes): Specify that syslog doesn't need exactly 5 wakeups per second. Sponsored by: Google Summer of Code 2012, iXsystems inc. Tested by: flo, marius, ian, markj, Fabian Keil	2013-03-04 16:07:55 +00:00
Davide Italiano	098176f0d0	MFcalloutng: kern_nanosleep() is now converted to use tsleep_sbt(). With this change nanosleep() and usleep() can handle sub-tick precision for timeouts. Also, try to help coalesce of events passing as argument to tsleep_bt() a precision value calculated as a percentage of the sleep time. This percentage is default 5%, but it can tuned according to users need via the sysctl interface. Sponsored by: Google Summer of Code 2012, iXsystems inc. Tested by: flo, marius, ian, markj, Fabian Keil	2013-03-04 15:57:41 +00:00
Davide Italiano	037637812d	Fix build with DIAGNOSTIC/CALLOUT_PROFILING options turned on. Reported by: kib, David Wolfskill <david at catwhisker dot org> Pointy-hat to: davide	2013-03-04 15:03:52 +00:00
Davide Italiano	24e48c6d5b	MFcalloutng: Introduce sbt variants of msleep(), msleep_spin(), pause(), tsleep() in the KPI, allowing to specify timeout in 'sbintime_t' rather than ticks. Sponsored by: Google Summer of Code 2012, iXsystems inc. Tested by: flo, marius, ian, markj, Fabian Keil	2013-03-04 12:48:41 +00:00
Davide Italiano	461537356a	MFcalloutng: Extend condvar(9) KPI introducing sbt variant of cv_timedwait. This rely on the previously committed sleepq_set_timeout_sbt(). Sponsored by: Google Summer of Code 2012, iXsystems inc. Tested by: flo, marius, ian, markj, Fabian Keil	2013-03-04 12:20:48 +00:00
Davide Italiano	965ac611ec	MFcalloutng: Convert sleepqueue(9) bits to the new callout KPI. Take advantage of the possibility to run callback directly from hw interrupt context. Sponsored by: Google Summer of Code 2012, iXsystems inc. Tested by: flo, marius, ian, markj, Fabian Keil	2013-03-04 11:51:46 +00:00
Davide Italiano	dbd2e1677f	MFcalloutng (r244355): Make loadavg calculation callout direct. There are several reasons for it: - it is very simple and doesn't worth context switch to SWI; - since SWI is no longer used here, we can remove twelve years old hack, excluding this SWI from from the loadavg statistics; - it fixes problem when eventtimer (HPET) shares interrupt with some other device, and that interrupt thread counted as permanent loadavg of 1; now loadavg accounted before that interrupt thread is scheduled. Sponsored by: Google Summer of Code 2012, iXsystems inc. Tested by: flo, marius, ian, Fabian Keil, markj	2013-03-04 11:22:19 +00:00
Davide Italiano	5b999a6be0	- Make callout(9) tickless, relying on eventtimers(4) as backend for precise time event generation. This greatly improves granularity of callouts which are not anymore constrained to wait next tick to be scheduled. - Extend the callout KPI introducing a set of callout_reset_sbt* functions, which take a sbintime_t as timeout argument. The new KPI also offers a way for consumers to specify precision tolerance they allow, so that callout can coalesce events and reduce number of interrupts as well as potentially avoid scheduling a SWI thread. - Introduce support for dispatching callouts directly from hardware interrupt context, specifying an additional flag. This feature should be used carefully, as long as interrupt context has some limitations (e.g. no sleeping locks can be held). - Enhance mechanisms to gather informations about callwheel, introducing a new sysctl to obtain stats. This change breaks the KBI. struct callout fields has been changed, in particular 'int ticks' (4 bytes) has been replaced with 'sbintime_t' (8 bytes) and another 'sbintime_t' field was added for precision. Together with: mav Reviewed by: attilio, bde, luigi, phk Sponsored by: Google Summer of Code 2012, iXsystems inc. Tested by: flo (amd64, sparc64), marius (sparc64), ian (arm), markj (amd64), mav, Fabian Keil	2013-03-04 11:09:56 +00:00
Pawel Jakub Dawidek	8cb539f18f	For some reason when I started to pass filedescent structures instead of pointers to the file structure receiving descriptors stopped to work when also at least few kilobytes of data is being send. In the kernel the soreceive_generic() function doesn't see control mbuf as the first mbuf and unp_externalize() is never called, first 6(?) kilobytes of data is missing as well on receiving end. This breaks for example tmux. I don't know yet why going from 8 bytes to sizeof(struct filedescent) per descriptor (or even to 16 bytes per descriptor) breaks things, but to work-around it for now use 8 bytes per file descriptor at the cost of memory allocation. Reported by: flo, Diane Bruce, Jan Beich <jbeich@tormail.org> Simple testcase provided by: mjg	2013-03-03 23:39:30 +00:00
Pawel Jakub Dawidek	5f39e56581	Use dedicated malloc type for filecaps-related data, so we can detect any memory leaks easier.	2013-03-03 23:25:45 +00:00
Pawel Jakub Dawidek	a6157c3d61	Plug memory leaks in file descriptors passing.	2013-03-03 23:23:35 +00:00
Davide Italiano	3f555c45eb	callwheelmask and callwheelsize are always greater than zero. Switch their type to u_int.	2013-03-03 15:01:33 +00:00
Davide Italiano	0fb285b716	Remove a couple of unused include.	2013-03-03 14:47:02 +00:00
Alexander Motin	4514d6fa18	MFcalloutng: Some whitespace fixes.	2013-03-03 09:11:24 +00:00
Pawel Jakub Dawidek	378a73d1bd	Regen after r247667.	2013-03-02 21:12:54 +00:00
Pawel Jakub Dawidek	7493f24ee6	- Implement two new system calls: int bindat(int fd, int s, const struct sockaddr addr, socklen_t addrlen); int connectat(int fd, int s, const struct sockaddr name, socklen_t namelen); which allow to bind and connect respectively to a UNIX domain socket with a path relative to the directory associated with the given file descriptor 'fd'. - Add manual pages for the new syscalls. - Make the new syscalls available for processes in capability mode sandbox. - Add capability rights CAP_BINDAT and CAP_CONNECTAT that has to be present on the directory descriptor for the syscalls to work. - Update audit(4) to support those two new syscalls and to handle path in sockaddr_un structure relative to the given directory descriptor. - Update procstat(1) to recognize the new capability rights. - Document the new capability rights in cap_rights_limit(2). Sponsored by: The FreeBSD Foundation Discussed with: rwatson, jilles, kib, des	2013-03-02 21:11:30 +00:00
Pawel Jakub Dawidek	6d4e99aaef	If the target file already exists, check for the CAP_UNLINKAT capabiity right on the target directory descriptor, but only if this is renameat(2) and real target directory descriptor is given (not AT_FDCWD). Without this fix regular rename(2) fails if the target file already exists. Reported by: Michael Butler <imb@protected-networks.net> Reported by: Larry Rosenman <ler@lerctr.org> Sponsored by: The FreeBSD Foundation	2013-03-02 09:58:47 +00:00
Pawel Jakub Dawidek	1dc31587bf	Regen after r247602.	2013-03-02 00:55:09 +00:00
Pawel Jakub Dawidek	2609222ab4	Merge Capsicum overhaul: - Capability is no longer separate descriptor type. Now every descriptor has set of its own capability rights. - The cap_new(2) system call is left, but it is no longer documented and should not be used in new code. - The new syscall cap_rights_limit(2) should be used instead of cap_new(2), which limits capability rights of the given descriptor without creating a new one. - The cap_getrights(2) syscall is renamed to cap_rights_get(2). - If CAP_IOCTL capability right is present we can further reduce allowed ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed ioctls can be retrived with cap_ioctls_get(2) syscall. - If CAP_FCNTL capability right is present we can further reduce fcntls that can be used with the new cap_fcntls_limit(2) syscall and retrive them with cap_fcntls_get(2). - To support ioctl and fcntl white-listing the filedesc structure was heavly modified. - The audit subsystem, kdump and procstat tools were updated to recognize new syscalls. - Capability rights were revised and eventhough I tried hard to provide backward API and ABI compatibility there are some incompatible changes that are described in detail below: CAP_CREATE old behaviour: - Allow for openat(2)+O_CREAT. - Allow for linkat(2). - Allow for symlinkat(2). CAP_CREATE new behaviour: - Allow for openat(2)+O_CREAT. Added CAP_LINKAT: - Allow for linkat(2). ABI: Reuses CAP_RMDIR bit. - Allow to be target for renameat(2). Added CAP_SYMLINKAT: - Allow for symlinkat(2). Removed CAP_DELETE. Old behaviour: - Allow for unlinkat(2) when removing non-directory object. - Allow to be source for renameat(2). Removed CAP_RMDIR. Old behaviour: - Allow for unlinkat(2) when removing directory. Added CAP_RENAMEAT: - Required for source directory for the renameat(2) syscall. Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR): - Allow for unlinkat(2) on any object. - Required if target of renameat(2) exists and will be removed by this call. Removed CAP_MAPEXEC. CAP_MMAP old behaviour: - Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and PROT_WRITE. CAP_MMAP new behaviour: - Allow for mmap(2)+PROT_NONE. Added CAP_MMAP_R: - Allow for mmap(PROT_READ). Added CAP_MMAP_W: - Allow for mmap(PROT_WRITE). Added CAP_MMAP_X: - Allow for mmap(PROT_EXEC). Added CAP_MMAP_RW: - Allow for mmap(PROT_READ \| PROT_WRITE). Added CAP_MMAP_RX: - Allow for mmap(PROT_READ \| PROT_EXEC). Added CAP_MMAP_WX: - Allow for mmap(PROT_WRITE \| PROT_EXEC). Added CAP_MMAP_RWX: - Allow for mmap(PROT_READ \| PROT_WRITE \| PROT_EXEC). Renamed CAP_MKDIR to CAP_MKDIRAT. Renamed CAP_MKFIFO to CAP_MKFIFOAT. Renamed CAP_MKNODE to CAP_MKNODEAT. CAP_READ old behaviour: - Allow pread(2). - Disallow read(2), readv(2) (if there is no CAP_SEEK). CAP_READ new behaviour: - Allow read(2), readv(2). - Disallow pread(2) (CAP_SEEK was also required). CAP_WRITE old behaviour: - Allow pwrite(2). - Disallow write(2), writev(2) (if there is no CAP_SEEK). CAP_WRITE new behaviour: - Allow write(2), writev(2). - Disallow pwrite(2) (CAP_SEEK was also required). Added convinient defines: #define CAP_PREAD (CAP_SEEK \| CAP_READ) #define CAP_PWRITE (CAP_SEEK \| CAP_WRITE) #define CAP_MMAP_R (CAP_MMAP \| CAP_SEEK \| CAP_READ) #define CAP_MMAP_W (CAP_MMAP \| CAP_SEEK \| CAP_WRITE) #define CAP_MMAP_X (CAP_MMAP \| CAP_SEEK \| 0x0000000000000008ULL) #define CAP_MMAP_RW (CAP_MMAP_R \| CAP_MMAP_W) #define CAP_MMAP_RX (CAP_MMAP_R \| CAP_MMAP_X) #define CAP_MMAP_WX (CAP_MMAP_W \| CAP_MMAP_X) #define CAP_MMAP_RWX (CAP_MMAP_R \| CAP_MMAP_W \| CAP_MMAP_X) #define CAP_RECV CAP_READ #define CAP_SEND CAP_WRITE #define CAP_SOCK_CLIENT \ (CAP_CONNECT \| CAP_GETPEERNAME \| CAP_GETSOCKNAME \| CAP_GETSOCKOPT \| \ CAP_PEELOFF \| CAP_RECV \| CAP_SEND \| CAP_SETSOCKOPT \| CAP_SHUTDOWN) #define CAP_SOCK_SERVER \ (CAP_ACCEPT \| CAP_BIND \| CAP_GETPEERNAME \| CAP_GETSOCKNAME \| \ CAP_GETSOCKOPT \| CAP_LISTEN \| CAP_PEELOFF \| CAP_RECV \| CAP_SEND \| \ CAP_SETSOCKOPT \| CAP_SHUTDOWN) Added defines for backward API compatibility: #define CAP_MAPEXEC CAP_MMAP_X #define CAP_DELETE CAP_UNLINKAT #define CAP_MKDIR CAP_MKDIRAT #define CAP_RMDIR CAP_UNLINKAT #define CAP_MKFIFO CAP_MKFIFOAT #define CAP_MKNOD CAP_MKNODAT #define CAP_SOCK_ALL (CAP_SOCK_CLIENT \| CAP_SOCK_SERVER) Sponsored by: The FreeBSD Foundation Reviewed by: Christoph Mallon <christoph.mallon@gmx.de> Many aspects discussed with: rwatson, benl, jonathan ABI compatibility discussed with: kib	2013-03-02 00:53:12 +00:00
John Baldwin	f9379dc411	Replace the TDP_NOSLEEPING flag with a counter so that the THREAD_NO_SLEEPING() and THREAD_SLEEPING_OK() macros can nest. Reviewed by: attilio	2013-03-01 22:03:31 +00:00
Pawel Jakub Dawidek	71ac38e896	Remove unnecessary variables.	2013-03-01 21:58:56 +00:00
Pawel Jakub Dawidek	f4d0191b22	Reduce lock scope a little.	2013-03-01 21:57:02 +00:00
Marius Strobl	db9066f798	- Use strdup(9) instead of reimplementing it. - Use __DECONST instead of strange casts. - Reduce code duplication and simplify name2oid(). PR: 176373 Submitted by: Christoph Mallon MFC after: 1 week	2013-03-01 18:49:14 +00:00
Konstantin Belousov	58248e57ab	Make the default implementation of the VOP_VPTOCNP() fail if the directory entry, matched by the inode number, is ".". NFSv4 client might instantiate the distinct vnodes which have the same inode number, since single v4 export can be combined from several filesystems on the server. For instance, a case when the nested server mount point is exactly one directory below the top of the export, causes directory and its parent to have the same inode number 2. The vop_stdvptocnp() algorithm then returns "." as the name of the lower directory. Filtering out the "." entry with ENOENT works around this behaviour, the error forces getcwd(3) to fall back to usermode implementation, which compares both st_dev and st_ino. Based on the submission by: rmacklem Tested by: rmacklem MFC after: 1 week	2013-03-01 18:40:14 +00:00
Davide Italiano	e234a588cb	MFcalloutng: Style fixes.	2013-02-28 16:22:49 +00:00
Alexander Motin	fdc5dd2d2f	MFcalloutng: Switch eventtimers(9) from using struct bintime to sbintime_t. Even before this not a single driver really supported full dynamic range of struct bintime even in theory, not speaking about practical inexpediency. This change legitimates the status quo and cleans up the code.	2013-02-28 13:46:03 +00:00
Davide Italiano	acccf7d8b4	MFcalloutng: When CPU becomes idle, cpu_idleclock() calculates time to the next timer event in order to reprogram hw timer. Return that time in sbintime_t to the caller and pass it to acpi_cpu_idle(), where it can be used as one more factor (quite precise) to extimate furter sleep time and choose optimal sleep state. This is a preparatory change for further callout improvements will be committed in the next days. The commmit is not targeted for MFC.	2013-02-28 10:46:54 +00:00
Konstantin Belousov	20f4e3e158	Make recursive getblk() slightly more useful. Keep the buffer state intact if getblk() is done on the already owned buffer. Exit from brelse() early when the lock recursion is detected, otherwise brelse() might prematurely destroy the buffer under some circumstances. Sponsored by: The FreeBSD Foundation Noted by: mckusick Tested by: pho MFC after: 2 weeks	2013-02-27 07:34:09 +00:00
Alexander Motin	1af19ee4a2	Add support for good old 8192Hz profiling clock to software PMC. Reviewed by: fabient	2013-02-26 18:13:42 +00:00
Attilio Rao	590f9303e5	Merge from vmobj-rwlock branch: Remove unused inclusion of vm/vm_pager.h and vm/vnode_pager.h. Sponsored by: EMC / Isilon storage division Tested by: pho Reviewed by: alc	2013-02-26 01:00:11 +00:00
Pawel Jakub Dawidek	1d59211b2e	Style. Suggested by: kib	2013-02-25 20:51:29 +00:00
Pawel Jakub Dawidek	893365e42d	After r237012, the fdgrowtable() doesn't drop the filedesc lock anymore, so update a stale comment. Reviewed by: kib, keramida	2013-02-25 20:50:08 +00:00
John Baldwin	593efaf9f7	Further refine the handling of stop signals in the NFS client. The changes in r246417 were incomplete as they did not add explicit calls to sigdeferstop() around all the places that previously passed SBDRY to _sleep(). In addition, nfs_getcacheblk() could trigger a write RPC from getblk() resulting in sigdeferstop() recursing. Rather than manually deferring stop signals in specific places, change the VFS_() and VOP_() methods to defer stop signals for filesystems which request this behavior via a new VFCF_SBDRY flag. Note that this has to be a VFC flag rather than a MNTK flag so that it works properly with VFS_MOUNT() when the mount is not yet fully constructed. For now, only the NFS clients are set this new flag in VFS_SET(). A few other related changes: - Add an assertion to ensure that TDF_SBDRY doesn't leak to userland. - When a lookup request uses VOP_READLINK() to follow a symlink, mark the request as being on behalf of the thread performing the lookup (cnp_thread) rather than using a NULL thread pointer. This causes NFS to properly handle signals during this VOP on an interruptible mount. PR: kern/176179 Reported by: Russell Cattelan (sigdeferstop() recursion) Reviewed by: kib MFC after: 1 month	2013-02-21 19:02:50 +00:00
Jamie Gritton	ffc72591b1	Don't worry if a module is already loaded when looking for a fstype to mount (possible in a race condition). Reviewed by: kib MFC after: 1 week	2013-02-21 02:41:37 +00:00
John Baldwin	353374b525	Fix a few typos.	2013-02-19 16:35:27 +00:00
Pawel Jakub Dawidek	b2e054b0d4	Update the comment: we do show the backtrace of misbehaving thread.	2013-02-17 21:37:32 +00:00
Pawel Jakub Dawidek	f0ad2ecb9c	Style.	2013-02-17 11:56:36 +00:00
Pawel Jakub Dawidek	8e1d51ab40	- Require CAP_FSYNC capability right when opening a file with O_SYNC or O_FSYNC flags. - While here simplify check for locking flags. Sponsored by: The FreeBSD Foundation	2013-02-17 11:53:51 +00:00
Pawel Jakub Dawidek	11b0cfe3cd	Remove redundant parenthesis.	2013-02-17 11:49:21 +00:00
Pawel Jakub Dawidek	49549b1894	Remove redundant space.	2013-02-17 11:48:16 +00:00
Pawel Jakub Dawidek	6c08be2b88	Add break to the default case.	2013-02-17 11:47:58 +00:00
Pawel Jakub Dawidek	4881a5950e	Don't treat pointers as booleans.	2013-02-17 11:47:30 +00:00
Pawel Jakub Dawidek	de26549841	Remove redundant parenthesis.	2013-02-17 11:47:01 +00:00
Kirk McKusick	2bc1a1fe5c	Add barrier write capability to the VFS buffer interface. A barrier write is a disk write request that tells the disk that the buffer being written must be committed to the media along with any writes that preceeded it before any future blocks may be written to the drive. Barrier writes are provided by adding the functions bbarrierwrite (bwrite with barrier) and babarrierwrite (bawrite with barrier). Following a bbarrierwrite the client knows that the requested buffer is on the media. It does not ensure that buffers written before that buffer are on the media. It only ensure that buffers written before that buffer will get to the media before any buffers written after that buffer. A flush command must be sent to the disk to ensure that all earlier written buffers are on the media. Reviewed by: kib Tested by: Peter Holm	2013-02-16 14:51:30 +00:00
Ian Lepore	a1137de941	Add PPS_CANWAIT support for time_pps_fetch(). This adds support for all three blocking modes described in section 3.4.3 of RFC 2783, allowing the caller to retrieve the most recent values without blocking, to block for a specified time, or to block forever. Reviewed by: discussion on hackers@	2013-02-15 18:30:32 +00:00
Sergey Kandaurov	d7ffa24831	vn_io_faults_cnt: - use u_long consistently - use SYSCTL_ULONG to match the type of variable Reviewed by: kib MFC after: 1 week	2013-02-15 14:22:05 +00:00
Sergey Kandaurov	ab15d8039e	Add support of passing SCM_BINTIME ancillary data object for PF_LOCAL sockets. PR: kern/175883 Submitted by: Andrey Simonenko <simon@comsys.ntu-kpi.kiev.ua> Discussed with: glebius, phk MFC after: 2 weeks	2013-02-15 13:00:20 +00:00
Ian Lepore	74938cbb7f	Make the F_READAHEAD option to fcntl(2) work as documented: a value of zero now disables read-ahead. It used to effectively restore the system default readahead hueristic if it had been changed; a negative value now restores the default. Reviewed by: kib	2013-02-13 15:09:16 +00:00
Konstantin Belousov	dd0b4fb6d5	Reform the busdma API so that new types may be added without modifying every architecture's busdma_machdep.c. It is done by unifying the bus_dmamap_load_buffer() routines so that they may be called from MI code. The MD busdma is then given a chance to do any final processing in the complete() callback. The cam changes unify the bus_dmamap_load* handling in cam drivers. The arm and mips implementations are updated to track virtual addresses for sync(). Previously this was done in a type specific way. Now it is done in a generic way by recording the list of virtuals in the map. Submitted by: jeff (sponsored by EMC/Isilon) Reviewed by: kan (previous version), scottl, mjacob (isp(4), no objections for target mode changes) Discussed with: ian (arm changes) Tested by: marius (sparc64), mips (jmallet), isci(4) on x86 (jharris), amd64 (Fabian Keil <freebsd-listen@fabiankeil.de>)	2013-02-12 16:57:20 +00:00
Marius Strobl	18716f9f4b	Update comments to reflect r246689.	2013-02-11 23:05:10 +00:00
Marius Strobl	bdc5f0172e	Make SYSCTL_{LONG,QUAD,ULONG,UQUAD}(9) work as advertised and also handle constant values. Reviewed by: kib MFC after: 3 days	2013-02-11 21:50:00 +00:00
Konstantin Belousov	2871baa49a	Remove the ia64-specific code fragment, which effect is more cleanly done by the call to trans_prot() function a line before. Discussed with: Oliver Pinter <oliver.pntr@gmail.com> MFC after: 1 week	2013-02-10 20:08:33 +00:00
Andriy Gapon	c43b08dc6c	ktr: correctly handle possible wrap-around in the boot buffer Older entries should be 'before' newer entries in the new buffer too and there should be no zero-filled gap between them. Pointed out by: jhb MFC after: 3 days X-MFC with: r246282	2013-02-08 07:29:07 +00:00
Konstantin Belousov	888d4d4f86	When vforked child is traced, the debugging events are not generated until child performs exec(). The behaviour is reasonable when a debugger is the real parent, because the parent is stopped until exec(), and sending a debugging event to the debugger would deadlock both parent and child. On the other hand, when debugger is not the parent of the vforked child, not sending debugging signals makes it impossible to debug across vfork. Fix the issue by declining generating debug signals only when vfork() was done and child called ptrace(PT_TRACEME). Set a new process flag P_PPTRACE from the attach code for PT_TRACEME, if P_PPWAIT flag is set, which indicates that the process was created with vfork() and still did not execed. Check P_PPTRACE from issignal(), instead of refusing the trace outright for the P_PPWAIT case. The scope of P_PPTRACE is exactly contained in the scope of P_PPWAIT. Found and tested by: zont Reviewed by: pluknet MFC after: 2 weeks	2013-02-07 15:34:22 +00:00
Konstantin Belousov	2ca4998342	Stop translating the ERESTART error from the open(2) into EINTR. Posix requires that open(2) is restartable for SA_RESTART. For non-posix objects, in particular, devfs nodes, still disable automatic restart of the opens. The open call to a driver could have significant side effects for the hardware. Noted and reviewed by: jilles Discussed with: bde MFC after: 2 weeks	2013-02-07 14:53:33 +00:00
Neel Natu	dae3dc73f6	If an interrupt event's assign_cpu method fails, then restore the original cpuset mask for the associated interrupt thread. The text used above is verbatim from r195249 and the code should now be in line with the intent of that commit.	2013-02-07 06:48:47 +00:00
Pawel Jakub Dawidek	fbda3d5dae	Audit sockaddr argument for bind(2), connect(2), accept(2), sendto(2) and recvfrom(2) syscalls. Sponsored by: The FreeBSD Foundation	2013-02-07 00:36:00 +00:00
Pawel Jakub Dawidek	82b316b377	Minor style tweaks.	2013-02-07 00:27:11 +00:00
John Baldwin	a120a7a3cd	Rework the handling of stop signals in the NFS client. The changes in 195702, 195703, and 195821 prevented a thread from suspending while holding locks inside of NFS by forcing the thread to fail sleeps with EINTR or ERESTART but defer the thread suspension to the user boundary. However, this had the effect that stopping a process during an NFS request could abort the request and trigger EINTR errors that were visible to userland processes (previously the thread would have suspended and completed the request once it was resumed). This change instead effectively masks stop signals while in the NFS client. It uses the existing TDF_SBDRY flag to effect this since SIGSTOP cannot be masked directly. Also, instead of setting PBDRY on individual sleeps, the NFS client now sets the TDF_SBDRY flag around each NFS request and stop signals are masked for all sleeps during that region (the previous change missed sleeps in lockmgr locks). The end result is that stop signals sent to threads performing an NFS request are completely ignored until after the NFS request has finished processing and the thread prepares to return to userland. This restores the behavior of stop signals being transparent to userland processes while still preventing threads from suspending while holding NFS locks. Reviewed by: kib MFC after: 1 month	2013-02-06 17:06:51 +00:00
Sergey Kandaurov	23c053d6a2	Prezero the acl structure which is to be copied to usermode, to avoid leakage of the previous content of padding and unitialized fields. Reported by: Ilia Noskov <noskov@nic.ru> Reviewed by: kib MFC after: 1 week	2013-02-06 15:18:46 +00:00
Sergey Kandaurov	51dc4fea4c	Remove reference to the rlist code from comments, and fix a typo visible in the resulted change. Reviewed by: kib MFC after: 1 week	2013-02-05 20:08:33 +00:00
Andriy Gapon	c8199bc955	ktr: prevent possible footshooting with KTR_ENTRIES and KTR_BOOT_ENTRIES Suggested by: adrian MFC after: 14 days X-MFC with: r246282	2013-02-04 21:58:57 +00:00
Andriy Gapon	f85ed12497	ktr: copy content from the early static buffer if KTR_ENTRIES != KTR_BOOT_ENTRIES Reported by: glebius, jhb Pointyhat to: avg MFC after: 14 days X-MFC with: r246282	2013-02-04 21:50:55 +00:00
Marius Strobl	94bfd5b1a0	Try to improve r242655 take III: move these SYSCTLs describing the kernel map, which is defined and initialized in vm/vm_kern.c, to the latter. Submitted by: alc	2013-02-04 09:35:48 +00:00
Marius Strobl	e8cbe54bc4	Further improve r242655 and supply VM_{MIN,MAX}_KERNEL_ADDRESS as constant values to SYSCTL_ULONG(9) where possible. Submitted by: bde	2013-02-03 21:43:55 +00:00
Andriy Gapon	36b7dde416	allow for large KTR_ENTRIES values by allocating ktr_buf using malloc(9) Only during very early boot, before malloc(9) is functional (SI_SUB_KMEM), the static ktr_buf_init is used. Size of the static buffer is determined by a new kernel option KTR_BOOT_ENTRIES. Its default value is 1024. This commit builds on top of r243046. Reviewed by: alc MFC after: 17 days	2013-02-03 09:57:39 +00:00
Andriy Gapon	8eede5c4d9	fix some fat-fingering in r246246 Submitted by: mjg Pointyhat to: avg MFC after: 5 days X-MFC with: r246246	2013-02-02 14:19:50 +00:00
Andriy Gapon	bfdcb3bcba	print compiler version in the kernel banner And provide kernel compiler version as a sysctl as well. This is useful while we have gcc and clang cohabitation. This could be even more useful when we have support for external toolchains. In cooperation with: mjg MFC after: 13 days	2013-02-02 11:58:35 +00:00
Grzegorz Bernacki	2d7d16429c	Get time of next event from other cores only if SMP is already started. Reviewed by: mav Obtained from: Semihalf	2013-02-01 11:39:03 +00:00
Pawel Jakub Dawidek	4bbe7b0c20	Now that MPSAFE flag is gone, we can arrange code a bit better.	2013-01-31 22:20:05 +00:00
Pawel Jakub Dawidek	b108953c6f	Remove leftover label after Giant removal from VFS.	2013-01-31 22:15:41 +00:00
Pawel Jakub Dawidek	a2c496ebb9	Remove label that was accidentally moved during Giant removal from VFS.	2013-01-31 22:14:16 +00:00
Pawel Jakub Dawidek	9e2677fd6d	Simplify code a bit. This is leftover after Giant removal from VFS.	2013-01-31 22:12:48 +00:00
Konstantin Belousov	538375d42e	The case of pid == WAIT_MYPGRP for the kern_wait() is already handled in kern_wait6(), which is called by kern_wait(). Remove the redundand check, introduced in r243136, and add a comment noting this, to make the code less confusing. The blank lines are added to properly delineate the scope of the preceeding comments. Noted by: "Jukka A. Ukkonen" <jau@iki.fi> MFC after: 1 week	2013-01-30 13:14:15 +00:00
John Baldwin	a8df530ddc	Mark 'ticks', 'time_second', and 'time_uptime' as volatile to prevent the compiler from caching their values in tight loops. Reviewed by: bde MFC after: 1 week	2013-01-28 19:38:13 +00:00
Gleb Smirnoff	29110f87a6	- Move large functions m_getjcl() and m_get2() to kern/uipc_mbuf.c - style(9) fixes to mbuf.h Reviewed by: bde	2013-01-24 09:29:41 +00:00
John Baldwin	75d774e36a	Fix a typo.	2013-01-23 14:37:05 +00:00
Andre Oppermann	371407162b	Move the mbuf memory limit calculations from init_param2() to tunable_mbinit() where it is next to where it is used later. Change the sysinit level of tunable_mbinit() from SI_SUB_TUNABLES to SI_SUB_KMEM after the VM is running. This allows to use better methods to determine the effectively available physical and virtual memory available to the kernel. Update comments. In a second step it can be merged into mbuf_init().	2013-01-17 21:28:31 +00:00
Alfred Perlstein	17ebe960a6	Do not autotune ncallout to be greater than 18508. When maxusers was unrestricted and maxfiles was allowed to autotune much higher the result was that ncallout which was based on maxfiles and maxproc grew much higher than was needed. To fix this clip autotuning to the same number we would get with the old maxusers algorithm which would stop scaling at 384 maxusers. Growing ncalout higher is not likely to be needed since most consumers of timeout(9) are gone and any higher value for ncallout causes the callwheel hashes to be much larger than will even be needed for most applications. MFC after: 1 month Reviewed by: mav	2013-01-15 19:26:17 +00:00
Andrey Zonov	0165f660c1	- Detect when we are in KVM. Silence on: emulation Approved by: kib (mentor) MFC after: 1 week	2013-01-15 14:05:59 +00:00
Konstantin Belousov	10b4bb0b33	Add a trivial comment to record the proper commit log for r245407: Set the v_hash for a new vnode in the getnewvnode() to the value calculated based on the vnode structure address. Filesystems using vfs_hash_insert() override the v_hash using the standard formula of (inode_number + mnt_hashseed). For other filesystems, the initialization allows the vfs_hash_index() to provide useful hash too. Suggested, reviewed and tested by: peter Sponsored by: The FreeBSD Foundation MFC after: 5 days	2013-01-14 05:52:23 +00:00
Konstantin Belousov	a41df84820	diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c index 7c243b6..0bdaf36 100644 --- a/sys/kern/vfs_subr.c +++ b/sys/kern/vfs_subr.c @@ -279,6 +279,7 @@ SYSCTL_INT(_debug, OID_AUTO, vnlru_nowhere, CTLFLAG_RW, #define VSHOULDFREE(vp) (!((vp)->v_iflag & VI_FREE) && !(vp)->v_holdcnt) #define VSHOULDBUSY(vp) (((vp)->v_iflag & VI_FREE) && (vp)->v_holdcnt) +static int vnsz2log; /* * Initialize the vnode management data structures. @@ -293,6 +294,7 @@ SYSCTL_INT(_debug, OID_AUTO, vnlru_nowhere, CTLFLAG_RW, static void vntblinit(void dummy __unused) { + u_int i; int physvnodes, virtvnodes; / @@ -332,6 +334,9 @@ vntblinit(void dummy __unused) syncer_maxdelay = syncer_mask + 1; mtx_init(&sync_mtx, "Syncer mtx", NULL, MTX_DEF); cv_init(&sync_wakeup, "syncer"); + for (i = 1; i <= sizeof(struct vnode); i <<= 1) + vnsz2log++; + vnsz2log--; } SYSINIT(vfs, SI_SUB_VFS, SI_ORDER_FIRST, vntblinit, NULL); @@ -1067,6 +1072,14 @@ alloc: } rangelock_init(&vp->v_rl); + / + * For the filesystems which do not use vfs_hash_insert(), + * still initialize v_hash to have vfs_hash_index() useful. + * E.g., nullfs uses vfs_hash_index() on the lower vnode for + * its own hashing. + / + vp->v_hash = (uintptr_t)vp >> vnsz2log; + vpp = vp; return (0); }	2013-01-14 05:42:54 +00:00
Konstantin Belousov	f6af8e375c	Add exported vfs_hash_index() function, which calculates the canonical pre-masked hash for the given vnode. The function assumes that vp->v_hash is initialized by the filesystem vnode instantiation function. At the moment, it is only done if filesystem uses vfs_hash_insert(). Reviewed by: peter Tested by: peter, pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 5 days	2013-01-14 05:41:40 +00:00
Konstantin Belousov	7b982bc831	Rename vfs_hash_index() to vfs_hash_bucket(). Reviewed by: peter Tested by: peter, pho Sponsored by: The FreeBSD Foundation MFC after: 5 days	2013-01-14 05:40:21 +00:00
Konstantin Belousov	ddd6b3fc33	Add flags argument to vfs_write_resume() and remove vfs_write_resume_flags(). Sponsored by: The FreeBSD Foundation	2013-01-11 06:08:32 +00:00
Mateusz Guzik	43287e2753	lockmgr: unlock interlock (if requested) when dealing with upgrade/downgrade requests for LK_NOSHARE locks, just like for shared locks. PR: kern/174969 Reviewed by: attilio MFC after: 1 week	2013-01-06 21:47:59 +00:00
Konstantin Belousov	a545089ed5	Protect the p->p_pgrp dereference with the process lock. MFC after: 3 days	2013-01-06 15:10:10 +00:00
Neel Natu	a09580e7a3	Teach the kernel to recognize that it is executing inside a bhyve virtual machine. Obtained from: NetApp	2013-01-05 19:18:50 +00:00
Benjamin Kaduk	5e9723e271	Fix some minor inaccuracies introduced in r243251. Also correct the comment in kern_synch.c which was the source of the problematic text. Reviewed by: kib (previous version) Approved by: hrs (mentor)	2013-01-05 00:23:26 +00:00
David Xu	eea8d86d4d	Revert revision 244760 because strncpy pads trailing space with zero, this prevents kernel data from being leaked. Noticed by: Joerg Sonnenberger < joerg at britannica dot bec dot de >	2013-01-04 11:11:12 +00:00
Konstantin Belousov	d1c5e3f8b0	Remove the deprecated MNT_VNODE_FOREACH interface. Use the MNT_VNODE_FOREACH_ALL instead.	2013-01-03 19:02:52 +00:00
Konstantin Belousov	f99cb34c4f	The process_deferred_inactive() function locks the vnodes of the ufs mount, which means that is must not be called while the snaplock is owned. The vfs_write_resume(9) does call the function as the VFS_SUSP_CLEAN() method, which is too early and falls into the region still protected by snaplock. Add yet another flag for the vfs_write_resume_flags() to avoid calling suspension cleanup handler after the suspend is lifted, and use it in the ffs_snapshot() call to vfs_write_resume. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2013-01-01 16:14:48 +00:00
Konstantin Belousov	91e9474552	Make it possible to atomically resume writes on the mount and account the write start, by adding a variation of the vfs_write_resume(9) which accepts flags. Use the new function to prevent a deadlock between parallel suspension and snapshotting a UFS mount. The ffs_snapshot() code performed vfs_write_resume() followed by vn_start_write() while owning the snaplock. If the suspension intervene between resume and vn_start_write(), the deadlock occured after the suspending thread tried to lock the snaplock, most typically during the write in the ffs_copyonwrite(). Reported and tested by: Andreas Longwitz <longwitz@incore.de> Reviewed by: mckusick MFC after: 2 weeks X-MFC-note: make the vfs_write_resume(9) function a macro after the MFC, in HEAD	2012-12-28 23:08:30 +00:00
Oleksandr Tymoshenko	7fc3ae51f3	Fix build on ARM (and probably other platforms)	2012-12-28 06:52:53 +00:00
David Xu	9d4bf0db7c	Use strlcpy to NULL-terminate error message even if user provided a short buffer.	2012-12-28 02:43:33 +00:00
Attilio Rao	c92c859b7b	Fixup r244240: mp_ncpus will be 1 also in the !SMP and smp_disabled=1 case. There is no point in optimizing further the code and use a TRUE litteral for a path that does heavyweight stuff anyway (like lock acq), at the price of obfuscated code. Use the appropriate check where necessary and remove a macro. Sponsored by: EMC / Isilon storage division MFC after: 3 days	2012-12-26 15:20:32 +00:00
Konstantin Belousov	ad9789f6db	Do not force a writer to the devfs file to drain the buffer writes. Requested and tested by: Ian Lepore <freebsd@damnhippie.dyndns.org> MFC after: 2 weeks	2012-12-23 22:43:27 +00:00
Jaakko Heinonen	b1e1f725e7	Reject spaces and double quotation marks in device names. devctl(4) and devd(8) can't handle names with such characters properly. PR: bin/144736, kern/161912 Discussed with: imp, kib, pjd	2012-12-22 13:33:28 +00:00
Attilio Rao	cd2fe4e632	Fixup r240424: On entering KDB backends, the hijacked thread to run interrupt context can still be idlethread. At that point, without the panic condition, it can still happen that idlethread then will try to acquire some locks to carry on some operations. Skip the idlethread check on block/sleep lock operations when KDB is active. Reported by: jh Tested by: jh MFC after: 1 week	2012-12-22 09:37:34 +00:00
Attilio Rao	b1308d72c2	Fixup r218424: uio_yield() was scaling directly to userland priority. When kern_yield() was introduced with the possibility to specify a new priority, the behaviour changed by not lowering priority at all in the consumers, making the yielding mechanism highly ineffective for high priority kthreads like bufdaemon, syncer, vlrudaemon, etc. There are no evidences that consumers could bear with such change in semantic and this situation could finally lead to bugs similar to the ones fixed in r244240. Re-specify userland pri for kthreads involved. Tested by: pho Reviewed by: kib, mdf MFC after: 1 week	2012-12-21 13:14:12 +00:00
Dag-Erling Smørgrav	b5471c918f	Rewrite fdgrowtable() so common mortals can actually understand what it does and how, and add comments describing the data structures and explaining how they are managed.	2012-12-20 20:18:27 +00:00
Olivier Houchard	05d9035003	Create an architecture-agnostic buffer pool manager that uses uma(9) to manage a set of power-of-2 sized buffers for bus_dmamem_alloc(). This allows the caller to provide the back-end allocator uma allocator, allowing full control of the memory pages backing the pool. For convenience, it provides an optional builtin allocator that provides pages allocated with the VM_MEMATTR_UNCACHEABLE attribute, for managing pools of DMA buffers for BUS_DMA_COHERENT or BUS_DMA_NOCACHE. This also allows the caller to specify a minimum alignment, and it ensures that all buffers start on a boundary and have a length that's a multiple of that value, to avoid using buffers that trigger partial cache line flushes. Submitted by: Ian Lepore <freebsd@damnhippie.dyndns.org>	2012-12-20 00:34:54 +00:00
Pawel Jakub Dawidek	c345faea5a	Replace expand_name() function with corefile_open() function, which not only returns name, but also vnode of corefile to use. This simplifies the code and closes few races, especially in %I handling. Reviewed by: kib Obtained from: WHEEL Systems	2012-12-19 23:59:48 +00:00
Pawel Jakub Dawidek	22a5d85aa9	Use correct file permissions when looking for available core file if kern.corefile contains %I. Obtained from: WHEEL Systems	2012-12-19 23:40:02 +00:00
Jeff Roberson	4c44811c9d	- Add new machine parsable KTR macros for timing events. - Use this new format to automatically handle syscalls and VOPs. This changes the earlier format but is still human readable. Sponsored by: EMC / Isilon Storage Division	2012-12-19 20:10:00 +00:00
Jeff Roberson	5b39d5c739	- Correctly handle EWOULDBLOCK in quiesce_cpus Discussed with: mav	2012-12-19 20:08:06 +00:00
Pawel Jakub Dawidek	07a8e07896	The 'flags' argument can be modified in vn_open_cred(), so we need to set it for every loop interation. Pointed out by: kib	2012-12-19 12:14:08 +00:00
Pawel Jakub Dawidek	cc58032c44	Do not audit paths we try when kern.corefile contains %I. Obtained from: WHEEL Systems	2012-12-19 12:12:53 +00:00
Pawel Jakub Dawidek	29146f1a7a	Style cleanups.	2012-12-19 12:10:14 +00:00
Pawel Jakub Dawidek	086053a370	The expand_name() function isn't called with the process lock held anymore, so we can safely use malloc(M_WAITOK) now. Pointed out by: kib	2012-12-19 12:00:09 +00:00
Mateusz Guzik	af3c786c47	prison_racct_detach can be called for not fully initialized jail, so make it check that the jail has racct before doing anything PR: kern/174436 Reviewed by: trasz MFC after: 3 days	2012-12-18 18:34:36 +00:00
Andrey Zonov	5eb0d2838c	- Add sysctl to allow unprivileged users to call mlock(2)-family system calls and turn it on. - Do not allow to call them inside jail. [1] Pointed out by: trasz [1] Reviewed by: avg Approved by: kib (mentor) MFC after: 1 week	2012-12-18 07:36:45 +00:00
Pawel Jakub Dawidek	f06f465db7	Minor style tweaks. Obtained from: WHEEL Systems	2012-12-17 10:51:22 +00:00
Pawel Jakub Dawidek	c52ff61196	Better variables naming in expand_name() to be more consistent with coredump(). Obtained from: WHEEL Systems	2012-12-17 10:48:10 +00:00
Pawel Jakub Dawidek	dd57ce87eb	Move expand_name() after process lock is released. This fixed panic where we hold mutex (process lock) and try to obtain sleepable lock (vnode lock in expand_name()). The panic could occur when %I was used in kern.corefile. Additionally we avoid expand_name() overhead when coredumps are disabled. Obtained from: WHEEL Systems	2012-12-16 14:53:27 +00:00
Pawel Jakub Dawidek	2ce1b32df2	Don't add audit record when coredumps are disabled or name cannot be expanded. Discussed with: rwatson Obtained from: WHEEL Systems	2012-12-16 14:24:59 +00:00
Pawel Jakub Dawidek	7e73ee85ab	Make the check easier to read. Obtained from: WHEEL Systems	2012-12-16 14:14:18 +00:00
Pawel Jakub Dawidek	b039f8c2aa	Use 'cred' variable. Obtained from: WHEEL Systems	2012-12-16 13:56:38 +00:00
Konstantin Belousov	14df601e47	When mnt_vnode_next_active iterator cannot lock the next vnode and yields, specify the user priority for the yield. Otherwise, a higher-priority (kernel) thread could fall into the priority-inversion with the thread owning the mutex lock. On single-processor machines or UP kernels, do not loop adaptively when the next vnode cannot be locked, instead yield unconditionally. Restructure the iteration initializer and the iterator to remove code duplication. Put the code to fetch and lock a vnode next to the current marker, into the mnt_vnode_next_active() function, and use it instead of repeating the loop. Reported by: hrs, rmacklem Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 days	2012-12-15 02:04:46 +00:00
Konstantin Belousov	d4015944e7	Remove a special case for XEN, which is erronous and makes vfork(2) behaviour to differ from the documented, only on XEN. If there are any issues with XEN pmap left, they should be fixed in pmap. MFC after: 2 weeks	2012-12-15 02:02:11 +00:00
Rick Macklem	f1c4014cd5	The group list for a non-default export entry (a host/subnet one) was being copied from the wrong place. This patch fixes that. This could cause access failures for mapped users, when the group permissions were needed. PR: 147998 Submitted by: Christopher Key (cjk32 at cam.ac.uk) MFC after: 2 weeks	2012-12-14 21:49:06 +00:00
Alfred Perlstein	15d32bd543	Cleanup more of the kassert_panic. fix compile warnings on !amd64 and NULL derefs that would happen if kassert_panic() would return.	2012-12-11 07:08:14 +00:00
Alfred Perlstein	c2c5ede903	Fix WITNESS when INVARIANT_SUPPORT is defined. This fixes tinderbox breakage from r244105. Pointed out by: adrian	2012-12-11 05:59:16 +00:00
Alfred Perlstein	6b6bd3b704	Switch the hardwired WITNESS panics to kassert_panic. This is an ongoing effort to provide runtime debug information useful in the field that does not panic existing installations. This gives us the flexibility needed when shipping images to a potentially large audience with WITNESS enabled without worrying about formerly non-fatal LORs hurting a release. Sponsored by: iXsystems	2012-12-11 01:23:50 +00:00
Alfred Perlstein	d3bfafb4f6	back out half of 244098. kern.bootfile needs to be rw for installkernel. Pointed out by: kib, flo	2012-12-11 00:10:20 +00:00
Alfred Perlstein	a94053ba39	allow KASSERT to enter KDB.	2012-12-10 23:11:26 +00:00
Alfred Perlstein	d06cadae1e	make sysctls kern.{bootfile,conftxt} read-only MFC after: 1 month	2012-12-10 23:09:55 +00:00
Konstantin Belousov	686ffcaceb	Do not yield while owning a mutex. The Giant reacquire in the kern_yield() is problematic than. The owned mutex is the mount interlock, and it is in fact not needed to guarantee the stability of the mount list of active vnodes, so fix the the issue by only taking the mount interlock for MNT_REF and MNT_REL operations. While there, augment the unconditional yield by some amount of spinning [1]. Reported and tested by: pho Reviewed by: attilio Submitted by: attilio [1] MFC after: 3 days	2012-12-10 20:44:09 +00:00
Andre Oppermann	0060bab556	Prevent long type overflow of realmem calculation on ILP32 by forcing calculation to be in quad_t space. Fix style issue with second parameter to qmin(). Reported by: alc Reviewed by: bde, alc	2012-12-10 12:19:03 +00:00
Konstantin Belousov	5d439a2957	Do not ignore zero address, possibly returned by the vm_map_find() call. The function indicates a failure by the TRUE return value. To be extra safe, assert that the return value from the following vm_map_insert() indicates success. Fix style issues in the nearby lines, reformulate the comment. Reviewed by: alc (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2012-12-10 05:14:04 +00:00
Konstantin Belousov	17cb8cfc31	Remove useless comment. MFC after: 3 days	2012-12-09 20:34:11 +00:00
Konstantin Belousov	796fa4fb86	Fix typo. MFC after: 3 days	2012-12-09 20:26:51 +00:00
Attilio Rao	e68ccbe85e	Add a comment on why inlining critical_enter() may not be a good idea for the general case. Reviewed by: bde MFC after: 1 week	2012-12-09 04:54:22 +00:00
Pawel Jakub Dawidek	6e0b674628	Configure UMA warnings for the following zones: - unp_zone: kern.ipc.maxsockets limit reached - socket_zone: kern.ipc.maxsockets limit reached - zone_mbuf: kern.ipc.nmbufs limit reached - zone_clust: kern.ipc.nmbclusters limit reached - zone_jumbop: kern.ipc.nmbjumbop limit reached - zone_jumbo9: kern.ipc.nmbjumbo9 limit reached - zone_jumbo16: kern.ipc.nmbjumbo16 limit reached Note that those warnings are printed not often than every five minutes and can be globally turned off by setting sysctl/tunable vm.zone_warnings to 0. Discussed on: arch Obtained from: WHEEL Systems MFC after: 2 weeks	2012-12-07 22:30:30 +00:00
Pawel Jakub Dawidek	45fe0bf7e4	Make use of the fact that uma_zone_set_max(9) already returns actual limit set.	2012-12-07 22:23:53 +00:00
Pawel Jakub Dawidek	4007b61cde	More style cleanups.	2012-12-07 22:22:04 +00:00
Pawel Jakub Dawidek	b0b1402537	Style cleanups.	2012-12-07 22:19:41 +00:00
Pawel Jakub Dawidek	94b0ae5d62	- Make socket_zone static - it is used only in this file. - Update maxsockets on uma_zone_set_max(). Obtained from: WHEEL Systems	2012-12-07 22:15:51 +00:00
Pawel Jakub Dawidek	68412f4179	Style cleanups.	2012-12-07 22:13:33 +00:00
Pawel Jakub Dawidek	0b746181a2	There is no need anymore to include vm/uma.h after r241726. Obtained from: WHEEL Systems	2012-12-07 22:05:42 +00:00
Alfred Perlstein	3945a96431	Allow KASSERT to log instead of panic. This is to allow debug images to be used without taking down the system when non-fatal asserts are hit. The following sysctls are added: debug.kassert.warn_only: 1 = log, 0 = panic debug.kassert.do_ktr: set to a ktr mask for logging via KTR debug.kassert.do_log: 1 = log, 0 = quiet debug.kassert.warnings: stats, number of kasserts hit debug.kassert.log_panic_at: number of kasserts before we actually panic, 0 = never debug.kassert.log_pps_limit: pps limit for log messages debug.kassert.log_mute_at: stop warning after N kasserts, 0 = never stop debug.kassert.kassert: set this sysctl to trigger a kassert Discussed with: scottl, gnn, marcel Sponsored by: iXsystems	2012-12-07 08:25:08 +00:00
Alfred Perlstein	3356d129ad	Use uint instead of int for flags exported via sysctl.	2012-12-07 05:55:48 +00:00
Kevin Lo	b08d12d9be	- according to POSIX, make socket(2) return EAFNOSUPPORT rather than EPROTONOSUPPORT if the address family is not supported. - introduce pffinddomain() to find a domain by family and use it as appropriate. Reviewed by: glebius	2012-12-07 02:22:48 +00:00

1 2 3 4 5 ...

13191 Commits