Commit Graph

226 Commits

Author SHA1 Message Date
Mateusz Guzik
f6f6d24062 Implement lockless resource limits.
Use the same scheme implemented to manage credentials.

Code needing to look at process's credentials (as opposed to thred's) is
provided with *_proc variants of relevant functions.

Places which possibly had to take the proc lock anyway still use the proc
pointer to access limits.
2015-06-10 10:48:12 +00:00
Konstantin Belousov
316b384343 Implement support for binary to requesting specific stack size for the
initial thread.  It is read by the ELF image activator as the virtual
size of the PT_GNU_STACK program header entry, and can be specified by
the linker option -z stack-size in newer binutils.

The soft RLIMIT_STACK is auto-increased if possible, to satisfy the
binary' request.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-04-15 08:13:53 +00:00
Konstantin Belousov
5c7bebf961 The process spin lock currently has the following distinct uses:
- Threads lifetime cycle, in particular, counting of the threads in
  the process, and interlocking with process mutex and thread lock.
  The main reason of this is that turnstile locks are after thread
  locks, so you e.g. cannot unlock blockable mutex (think process
  mutex) while owning thread lock.

- Virtual and profiling itimers, since the timers activation is done
  from the clock interrupt context.  Replace the p_slock by p_itimmtx
  and PROC_ITIMLOCK().

- Profiling code (profil(2)), for similar reason.  Replace the p_slock
  by p_profmtx and PROC_PROFLOCK().

- Resource usage accounting.  Need for the spinlock there is subtle,
  my understanding is that spinlock blocks context switching for the
  current thread, which prevents td_runtime and similar fields from
  changing (updates are done at the mi_switch()).  Replace the p_slock
  by p_statmtx and PROC_STATLOCK().

The split is done mostly for code clarity, and should not affect
scalability.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-11-26 14:10:00 +00:00
Mateusz Guzik
dff9862c0e ifdef RACCT ui_racct_foreach and struct uidinfo's ui_racct
Change racct_ create and destroy to macros evaluating to nothing without RACCT
so that their callers passing ui_racct don't have to be ifdefed.
2014-11-23 08:25:44 +00:00
Mateusz Guzik
8d866b10fc Tidy up functions related to uidinfo management.
- reference found uidinfo in uilookup
- reduce nesting by handling shorter cases first
2014-10-27 20:20:05 +00:00
Mateusz Guzik
8101958c4f De-k&r-ify function definitions in kern/kern_resource.c
No functional changes.
2014-10-27 20:18:30 +00:00
Mateusz Guzik
675c3507d4 rlimit: plug duplicate assertion
counter sanity is already checked by refcount_release.
2014-10-25 05:56:21 +00:00
Mateusz Guzik
c2a48c0d1b rlimit: avoid unnecessary copying of rlimits
If refcount is 1 just modify rlimits in place.

MFC after:	2 weeks
2013-12-13 20:54:45 +00:00
Mateusz Guzik
3318a9c895 rlimit: add and utilize lim_shared
MFC after:	2 weeks
2013-12-13 20:53:31 +00:00
Konstantin Belousov
9110db818a Add a resource limit for the total number of kqueues available to the
user.  Kqueue now saves the ucred of the allocating thread, to
correctly decrement the counter on close.

Under some specific and not real-world use scenario for kqueue, it is
possible for the kqueues to consume memory proportional to the square
of the number of the filedescriptors available to the process.  Limit
allows administrator to prevent the abuse.

This is kernel-mode side of the change, with the user-mode enabling
commit following.

Reported and tested by:	pho
Discussed with:	jmg
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2013-10-21 16:44:53 +00:00
Ian Lepore
9a2bff7ca6 Call sched_prio() to immediately change the priority of the thread in
response to an rtprio_thread() call, when the priority is different
than the old priority, and either the old or the new priority class is
not RTP_PRIO_NORMAL (timeshare).

The reasoning for the second half of the test is that if it's a change in
timeshare priority, then the scheduler is going to adjust that priority
in a way that completely wipes out the requested change anyway, so
what's the point?  (If that's not true, then allowing a thread to change
its own timeshare priority would subvert the scheduler's adjustments and
let a cpu-bound thread monopolize the cpu; if allowed at all, that
should require priveleges.)

On the other hand, if either the old or new priority class is not
timeshare, then the scheduler doesn't make automatic adjustments, so we
should honor the request and make the priority change right away.  The
reason the old class gets caught up in this is the very reason for this
change:  when thread A changes the priority of its child thread B from
idle back to timeshare, thread B never actually gets moved to a
timeshare-range run queue unless there are some idle cycles available
to allow it to first get scheduled again as an idle thread.

Reviewed by:	jhb@
2013-03-07 02:53:29 +00:00
Davide Italiano
4601bab1fb MFcalloutng (r244251 with minor changes):
Specify that precision of 0.5s is enough for resource limitation.

Sponsored by:	Google Summer of Code 2012, iXsystems inc.
Tested by:	flo, marius, ian, markj, Fabian Keil
2013-03-04 16:25:12 +00:00
Mikolaj Golub
8854fe3915 Change kern.proc.rlimit sysctl to:
- retrive only one, specified limit for a process, not the whole
  array, as it was previously (the sysctl has been added recently and
  has not been backported to stable yet, so this change is ok);

- allow to set a resource limit for another process.

Submitted by:	Andrey Zonov <andrey at zonov.org>
Discussed with:	kib
Reviewed by:	kib
MFC after:	2 weeks
2012-01-22 20:25:00 +00:00
John Baldwin
948c460971 Fix a logic bug in change 228207 in the check for a thread's new user
priority being a realtime priority.

MFC after:	3 days
2012-01-05 19:02:52 +00:00
Eitan Adler
9910b854c6 - Add a sysctl to allow non-root users the ability to set idle
priorities.

- While here fix up some style nits.

Discussed with: cperciva (breifly)
Reviewed by:	pjd (earlier version)
Reviewed by:	bde
Approved by:	jhb
MFC after:	1 month
2011-12-13 14:00:27 +00:00
John Baldwin
593dd43eee When changing the user priority of a thread, change the real priority
in addition to the user priority for threads whose current real priority
is equal to the previous user priority or if the new priority is a
real-time priority.  This allows priority changes of other threads to
have an immediate effect.

MFC after:	2 weeks
2011-12-02 19:59:46 +00:00
Mikolaj Golub
bde886fba4 In lim_fork() assert that processes locks are held.
Suggested by:	kib
2011-11-07 21:09:04 +00:00
Kip Macy
8451d0dd78 In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by:	rwatson
Approved by:	re (bz)
2011-09-16 13:58:51 +00:00
John Baldwin
2417d97ebb - Export each thread's individual resource usage in in struct kinfo_proc's
ki_rusage member when KERN_PROC_INC_THREAD is passed to one of the
  process sysctls.
- Correctly account for the current thread's cputime in the thread when
  doing the runtime fixup in calcru().
- Use TIDs as the key to lookup the previous thread to compute IO stat
  deltas in IO mode in top when thread display is enabled.

Reviewed by:	kib
Approved by:	re (kib)
2011-07-18 17:33:08 +00:00
John Baldwin
e806d352d2 Fix several places to ignore processes that are not yet fully constructed.
MFC after:	1 week
2011-04-06 17:47:22 +00:00
Edward Tomasz Napierala
097055e26d Add racct. It's an API to keep per-process, per-jail, per-loginclass
and per-loginclass resource accounting information, to be used by the new
resource limits code.  It's connected to the build, but the code that
actually calls the new functions will come later.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	kib (earlier version)
2011-03-29 17:47:25 +00:00
John Baldwin
8e6fa660f2 Fix some locking nits with the p_state field of struct proc:
- Hold the proc lock while changing the state from PRS_NEW to PRS_NORMAL
  in fork to honor the locking requirements.  While here, expand the scope
  of the PROC_LOCK() on the new process (p2) to avoid some LORs.  Previously
  the code was locking the new child process (p2) after it had locked the
  parent process (p1).  However, when locking two processes, the safe order
  is to lock the child first, then the parent.
- Fix various places that were checking p_state against PRS_NEW without
  having the process locked to use PROC_LOCK().  Every place was already
  locking the process, just after the PRS_NEW check.
- Remove or reduce the use of PROC_SLOCK() for places that were checking
  p_state against PRS_NEW.  The PROC_LOCK() alone is sufficient for reading
  the current state.
- Reorder fill_kinfo_proc() slightly so it only acquires PROC_SLOCK() once.

MFC after:	1 week
2011-03-24 18:40:11 +00:00
David Xu
c8e368a933 - Follow r216313, the sched_unlend_user_prio is no longer needed, always
use sched_lend_user_prio to set lent priority.
- Improve pthread priority-inherit mutex, when a contender's priority is
  lowered, repropagete priorities, this may cause mutex owner's priority
  to be lowerd, in old code, mutex owner's priority is rise-only.
2010-12-29 09:26:46 +00:00
John Baldwin
36b4cd243e Add back a bounds check on valid idle priorities that was lost in an
earlier commit.  While here, move the thread lock down in rtp_to_pri().
It is not needed for all of the priority value checks and the computation
of newpri.

Reported by:	swell.k @ gmail
MFC after:	3 days
2010-12-17 16:29:06 +00:00
Ed Maste
c4965cfc44 We've already set p = td->td_proc, so use it. 2010-10-18 15:46:58 +00:00
David Xu
cf7d9a8ca8 Create a global thread hash table to speed up thread lookup, use
rwlock to protect the table. In old code, thread lookup is done with
process lock held, to find a thread, kernel has to iterate through
process and thread list, this is quite inefficient.
With this change, test shows in extreme case performance is
dramatically improved.

Earlier patch was reviewed by: jhb, julian
2010-10-09 02:50:23 +00:00
Edward Tomasz Napierala
1a996ed1d8 Revert r210225 - turns out I was wrong; the "/*-" is not license-only
thing; it's also used to indicate that the comment should not be automatically
rewrapped.

Explained by:	cperciva@
2010-07-18 20:57:53 +00:00
Edward Tomasz Napierala
805cc58ac0 The "/*-" comment marker is supposed to denote copyrights. Remove non-copyright
occurences from sys/sys/ and sys/kern/.
2010-07-18 20:23:10 +00:00
Edward Tomasz Napierala
eea4ac8b3f Remove outdated comment and move part of it into more applicable place. 2010-07-18 19:29:12 +00:00
Ed Schouten
60ae52f785 Use ISO C99 integer types in sys/kern where possible.
There are only about 100 occurences of the BSD-specific u_int*_t
datatypes in sys/kern. The ISO C99 integer types are used here more
often.
2010-06-21 09:55:56 +00:00
Konstantin Belousov
41fd9c6369 Fix the double counting of the last process thread td_incruntime
on exit, that is done once in thread_exit() and the second time in
proc_reap(), by clearing td_incruntime.

Use the opportunity to revert to the pre-RUSAGE_THREAD exporting of ruxagg()
instead of ruxagg_locked() and use it from thread_exit().

Diagnosed and tested by:	neel
MFC after:	3 days
2010-05-24 10:23:49 +00:00
Konstantin Belousov
bed4c52416 Implement RUSAGE_THREAD. Add td_rux to keep extended runtime and ticks
information for thread to allow calcru1() (re)use.

Rename ruxagg()->ruxagg_locked(), ruxagg_tlock()->ruxagg() [1].
The ruxagg_locked() function no longer clears thread ticks nor
td_incruntime.

Requested by:	attilio [1]
Discussed with:	attilio, bde
Reviewed by:	bde
Based on submission by:	Alexander Krizhanovsky <ak natsys-lab com>
MFC after:	1 week
X-MFC-Note:	td_rux shall be moved to the end of struct thread
2010-05-04 05:55:37 +00:00
Konstantin Belousov
3087dc40a9 Extract thread_lock()/ruxagg()/thread_unlock() fragment into utility
function ruxagg_tlock().
Convert the definition of kern_getrusage() to ANSI C.

Submitted by:	Alexander Krizhanovsky <ak natsys-lab com>
MFC after:	1 week
2010-05-01 14:46:17 +00:00
Randall Stewart
bec67fd3bb sched_getparam was just plain broke for time-share
processes. It did not return an error but instead
just let garbage be passed back. This I fix so
it actually properly translates the priority the
process is at to a posix's high means more priority.
I also fix it so that if the ULE scheduler has bumped
it up to a realtime process you get back a sane value
i.e. the highest priority (63 for time-share).

sched_setscheduler() had the setting of the
timeshare class priority disabled. With some notes
about rejecting the posix high numbers is greater
priority and use nice instead. This fix also
adjusts that to work, with the cavet that a t-s
process may well get bumped up or down i.e. the
setscheduler() will NOT change the nice value only
the current priority. I think this is reasonable
considering if the user wants to play with nice then
he can. At least all the posix'ish interfaces now
respond sanely.

MFC after:	3 weeks
2010-03-03 21:46:51 +00:00
Konstantin Belousov
3364c323e6 Implement global and per-uid accounting of the anonymous memory. Add
rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved
for the uid.

The accounting information (charge) is associated with either map entry,
or vm object backing the entry, assuming the object is the first one
in the shadow chain and entry does not require COW. Charge is moved
from entry to object on allocation of the object, e.g. during the mmap,
assuming the object is allocated, or on the first page fault on the
entry. It moves back to the entry on forks due to COW setup.

The per-entry granularity of accounting makes the charge process fair
for processes that change uid during lifetime, and decrements charge
for proper uid when region is unmapped.

The interface of vm_pager_allocate(9) is extended by adding struct ucred *,
that is used to charge appropriate uid when allocation if performed by
kernel, e.g. md(4).

Several syscalls, among them is fork(2), may now return ENOMEM when
global or per-uid limits are enforced.

In collaboration with:	pho
Reviewed by:	alc
Approved by:	re (kensmith)
2009-06-23 20:45:22 +00:00
David Xu
300fa5ef6e Don't rearm callout if the process is exiting, it may leak a callout
because callout_drain() only waits for running callout, but not disable
it if it is rearmed.
2008-10-24 01:09:24 +00:00
Dag-Erling Smørgrav
1ede983cc9 Retire the MALLOC and FREE macros. They are an abomination unto style(9).
MFC after:	3 months
2008-10-23 15:53:51 +00:00
Ed Schouten
c27991e819 Fix a small typo in a comment in calcru1().
The word "happene" should read "happened".

Submitted by:	Jille Timmermans <jille quis cx>
2008-09-05 15:55:06 +00:00
Ed Schouten
bc093719ca Integrate the new MPSAFE TTY layer to the FreeBSD operating system.
The last half year I've been working on a replacement TTY layer for the
FreeBSD kernel. The new TTY layer was designed to improve the following:

- Improved driver model:

  The old TTY layer has a driver model that is not abstract enough to
  make it friendly to use. A good example is the output path, where the
  device drivers directly access the output buffers. This means that an
  in-kernel PPP implementation must always convert network buffers into
  TTY buffers.

  If a PPP implementation would be built on top of the new TTY layer
  (still needs a hooks layer, though), it would allow the PPP
  implementation to directly hand the data to the TTY driver.

- Improved hotplugging:

  With the old TTY layer, it isn't entirely safe to destroy TTY's from
  the system. This implementation has a two-step destructing design,
  where the driver first abandons the TTY. After all threads have left
  the TTY, the TTY layer calls a routine in the driver, which can be
  used to free resources (unit numbers, etc).

  The pts(4) driver also implements this feature, which means
  posix_openpt() will now return PTY's that are created on the fly.

- Improved performance:

  One of the major improvements is the per-TTY mutex, which is expected
  to improve scalability when compared to the old Giant locking.
  Another change is the unbuffered copying to userspace, which is both
  used on TTY device nodes and PTY masters.

Upgrading should be quite straightforward. Unlike previous versions,
existing kernel configuration files do not need to be changed, except
when they reference device drivers that are listed in UPDATING.

Obtained from:		//depot/projects/mpsafetty/...
Approved by:		philip (ex-mentor)
Discussed:		on the lists, at BSDCan, at the DevSummit
Sponsored by:		Snow B.V., the Netherlands
dcons(4) fixed by:	kan
2008-08-20 08:31:58 +00:00
Pawel Jakub Dawidek
4682cd0b7d Remove extra uihold() call that accidentally sneak in during perforce
change @125544.
2008-03-19 07:52:07 +00:00
Jeff Roberson
374ae2a393 - Relax requirements for p_numthreads, p_threads, p_swtick, and p_nice from
requiring the per-process spinlock to only requiring the process lock.
 - Reflect these changes in the proc.h documentation and consumers throughout
   the kernel.  This is a substantial reduction in locking cost for these
   fields and was made possible by recent changes to threading support.
2008-03-19 06:19:01 +00:00
Pawel Jakub Dawidek
709446e782 Whitespace cleanups. 2008-03-16 21:32:20 +00:00
Pawel Jakub Dawidek
1b072fbcab - Use wait-free method to manage ui_sbsize and ui_proccnt fields in the
uidinfo structure. This entirely removes contention observed on the
  ui_mtxp mutex (as it is now gone).
- Convert the uihashtbl_mtx mutex to a rwlock, as most of the time we just
  need to read-lock it.

Reviewed by:	jhb, jeff, kris & others
Tested by:	kris
2008-03-16 21:29:02 +00:00
Pawel Jakub Dawidek
e056770745 Style fixes. 2008-03-16 18:26:59 +00:00
Pawel Jakub Dawidek
67e83b07c6 Fix information leak. We can find PIDs of running processes from within
a jail, etc. by simply calling setpriority(PRIO_PROCESS, <PID>, 0) and
checking the return value: 0 means that the process exists and -1 that
it doesn't exist.

Reviewed by:	rwatson
MFC after:	1 week
2008-03-16 17:55:06 +00:00
Jeff Roberson
6617724c5f Remove kernel support for M:N threading.
While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential.  Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries and static binaries will
be broken.
2008-03-12 10:12:01 +00:00
Robert Watson
d92909c1d4 Don't zero td_runtime when billing thread CPU usage to the process;
maintain a separate td_incruntime to hold unbilled CPU usage for
the thread that has the previous properties of td_runtime.

When thread information is requested using the thread monitoring
sysctls, export thread td_runtime instead of process rusage runtime
in kinfo_proc.

This restores the display of individual ithread and other kernel
thread CPU usage since inception in ps -H and top -SH, as well for
libthr user threads, valuable debugging information lost with the
move to try kthreads since they are no longer independent processes.

There is universal agreement that we should rewrite the process and
thread export sysctls, but this commit gets things going a bit
better in the mean time.  Likewise, there are resevations about the
continued validity of statclock given the speed of modern processors.

Reviewed by:		attilio, emaste, jhb, julian
2008-01-10 22:11:20 +00:00
David Xu
435806d31b Fix LOR of thread lock and umtx's priority propagation mutex due
to the reworking of scheduler lock.

MFC: after 3 days
2007-12-11 08:25:36 +00:00
Jeff Roberson
fb62eea266 - Use ruxagg() in calcru() to make sure we have current tick information
from all threads.

Discussed with:	bde, attilio
Approved by:	re
2007-07-17 01:08:09 +00:00
John Baldwin
59d8f3ff08 Fix a couple of issues with the stack limit for 32-bit processes on 64-bit
kernels exposed by the recent fixes to resource limits for 32-bit processes
on 64-bit kernels:
- Let ABIs expose their maximum stack size via a new pointer in sysentvec
  and use that in preference to maxssiz during exec() rather than always
  using maxssiz for all processses.
- Apply the ABI's limit fixup to the previous stack size when adjusting
  RLIMIT_STACK to determine if the existing mapping for the stack needs to
  be grown or shrunk (as well as how much it should be grown or shrunk).

Approved by:	re (kensmith)
2007-07-12 18:01:31 +00:00