Commit Graph

172 Commits

Author SHA1 Message Date
Eric van Gyzen
3f8455b090 Add clock_nanosleep()
Add a clock_nanosleep() syscall, as specified by POSIX.
Make nanosleep() a wrapper around it.

Attach the clock_nanosleep test from NetBSD. Adjust it for the
FreeBSD behavior of updating rmtp only when interrupted by a signal.
I believe this to be POSIX-compliant, since POSIX mentions the rmtp
parameter only in the paragraph about EINTR. This is also what
Linux does. (NetBSD updates rmtp unconditionally.)

Copy the whole nanosleep.2 man page from NetBSD because it is complete
and closely resembles the POSIX description. Edit, polish, and reword it
a bit, being sure to keep any relevant text from the FreeBSD page.

Reviewed by:	kib, ngie, jilles
MFC after:	3 weeks
Relnotes:	yes
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D10020
2017-03-19 00:51:12 +00:00
Eric van Gyzen
4cf66812ea nanosleep: plug a kernel memory disclosure
nanosleep() updates rmtp on EINVAL.  In that case, kern_nanosleep()
has not updated rmt, so sys_nanosleep() updates the user-space rmtp
by copying garbage from its stack frame.  This is not only a kernel
memory disclosure, it's also not POSIX-compliant.  Fix it to update
rmtp only on EINTR.

Reviewed by:	jilles (via D10020), dchagin
MFC after:	3 days
Security:	possibly
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D10044
2017-03-18 20:16:23 +00:00
Conrad Meyer
90a79ac576 Use time_t for intermediate values to avoid overflow in clock_ts_to_ct
Add additionally safety and overflow checks to clock_ts_to_ct and the
BCD routines while we're here.

Perform a safety check in sys_clock_settime() first to avoid easy local
root panic, without having to propagate an error value back through
dozens of APIs currently lacking error returns.

PR:		211960, 214300
Submitted by:	Justin McOmie <justin.mcomie at gmail.com>, kib@
Reported by:	Tim Newsham <tim.newsham at nccgroup.trust>
Reviewed by:	kib@
Sponsored by:	Dell EMC Isilon, FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D9279
2017-01-24 18:05:29 +00:00
Ed Maste
69a2875821 Renumber license clauses in sys/kern to avoid skipping #3 2016-09-15 13:16:20 +00:00
Konstantin Belousov
1822421c0b Remove Giant from settime(), tc_setclock_mtx guards tc_windup() calls,
and there is no other issues with parallel settime().  Remove spl()
vestiges there as well.

Tested by:	pho (as part of the whole patch)
Reviewed by:	jhb (same)
Discussed wit:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D7302
2016-07-27 11:54:24 +00:00
Konstantin Belousov
de56aee0bf Trace timeval parameters to the getitimer(2) and setitimer(2) syscalls.
Reviewed by:	jhb
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D7158
2016-07-13 14:37:58 +00:00
Pedro F. Giffuni
55e0987aea sys: extend use of the howmany() macro when available.
We have a howmany() macro in the <sys/param.h> header that is
convenient to re-use as it makes things easier to read.
2016-04-26 15:38:17 +00:00
Dmitry Chagin
3e18d701de Verify that tv_sec value specified in settimeofday() and clock_settime()
(CLOCK_REALTIME case) system calls is non negative.
This commit hides a kernel panic in atrtc_settime() as the clock_ts_to_ct()
does not properly convert negative tv_sec.

ps. in my opinion clock_ts_to_ct() should be rewritten to properly handle
negative tv_sec values.

Differential Revision:	https://reviews.freebsd.org/D4714
Reviewed by:		kib

MFC after:	1 week
2015-12-27 15:37:07 +00:00
Ian Lepore
f8f02928d3 Fix an off by one in ppsratecheck(). If you asked for N=1 you'd get one,
but for any N>1 you'd get N-1 packets/events per second.
2015-01-11 20:48:29 +00:00
Dmitry Chagin
47ce3e5326 Allow clock_getcpuclockid() on the CPU-time clock for zombie process.
Posix does not prohibit this.

Differential Revision:	https://reviews.freebsd.org/D1470

Reviewed by:	kib
MFC after:	1 week
2015-01-10 07:22:38 +00:00
Konstantin Belousov
5c7bebf961 The process spin lock currently has the following distinct uses:
- Threads lifetime cycle, in particular, counting of the threads in
  the process, and interlocking with process mutex and thread lock.
  The main reason of this is that turnstile locks are after thread
  locks, so you e.g. cannot unlock blockable mutex (think process
  mutex) while owning thread lock.

- Virtual and profiling itimers, since the timers activation is done
  from the clock interrupt context.  Replace the p_slock by p_itimmtx
  and PROC_ITIMLOCK().

- Profiling code (profil(2)), for similar reason.  Replace the p_slock
  by p_profmtx and PROC_PROFLOCK().

- Resource usage accounting.  Need for the spinlock there is subtle,
  my understanding is that spinlock blocks context switching for the
  current thread, which prevents td_runtime and similar fields from
  changing (updates are done at the mi_switch()).  Replace the p_slock
  by p_statmtx and PROC_STATLOCK().

The split is done mostly for code clarity, and should not affect
scalability.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-11-26 14:10:00 +00:00
Bjoern A. Zeeb
e346f8c452 Split up sys_ktimer_getoverrun() into a sys_ and a kern_ variant
and export the kern_ version needed by an upcoming linuxolator change.

MFC after:	3 days
Sponsored by:	DARPA,AFRL
2014-08-07 16:49:50 +00:00
Alexander Motin
d10a1df8d7 Fix VIRTUAL and PROF interval timers for short intervals, broken at r247903.
Due to the way those timers are implemented, we can't handle very short
intervals.  In addition to that mentioned patch caused math overflows
for short intervals.  To avoid that round those intervals to 1 tick.

PR:		kern/187668
MFC after:	1 week
2014-04-16 18:37:46 +00:00
Konstantin Belousov
643ee87175 Implement compat32 wrappers for the ktimer_* syscalls.
Reported, reviewed and tested by:	Petr Salinger <Petr.Salinger@seznam.cz>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-07-21 19:43:52 +00:00
Konstantin Belousov
d31e4b3a58 id_t is 64bit, provide the compat32 wrapper for clock_getcpuclockid2(2).
Reported and tested by:	Petr Salinger <Petr.Salinger@seznam.cz>
PR:	threads/180652
Sponsored by:	The FreeBSD Foundation
2013-07-20 13:39:41 +00:00
Konstantin Belousov
c7c536c7f5 Allow to call clock_gettime() on the clock id for zombie process.
Reported by:	Petr Salinger <Petr.Salinger@seznam.cz>
PR:	threads/180496
Sponsored by:	The FreeBSD Foundation
2013-07-13 19:32:50 +00:00
Alexander Motin
0dbf17e6eb Make kern_nanosleep() and pause_sbt() to use per-CPU sleep queues.
This removes significant sleep queue lock congestion on multithreaded
microbenchmarks, making them scale to multiple CPUs almost linearly.
2013-03-12 06:58:49 +00:00
Alexander Motin
b5ea3779da Reduce minimal time intervals of setitimer(2) from 1/HZ to 1/(16*HZ) by
using callout_reset_sbt() instead of callout_reset().  We can't remove
lower limit completely in this case because of significant processing
overhead, caused by unability to use direct callout execution due to using
process mutex in callout handler for sending SEGALRM signal.  With support
of periodic events that would allow unprivileged user to abuse the system.

Reviewed by:	davide
2013-03-06 22:40:47 +00:00
Alexander Motin
980c545d76 Fix time math overflows and improve zero intervals handling in poll(),
select(), nanosleep() and kevent() functions after calloutng changes.

Reported by:	bde
2013-03-06 19:37:38 +00:00
Davide Italiano
098176f0d0 MFcalloutng:
kern_nanosleep() is now converted to use tsleep_sbt(). With this change
nanosleep() and usleep() can handle sub-tick precision for timeouts.
Also, try to help coalesce of events passing as argument to tsleep_bt()
a precision value calculated as a percentage of the sleep time.
This percentage is default 5%, but it can tuned according to users
need via the sysctl interface.

Sponsored by:	Google Summer of Code 2012, iXsystems inc.
Tested by:	flo, marius, ian, markj, Fabian Keil
2013-03-04 15:57:41 +00:00
Konstantin Belousov
f7e50ea722 Fix a race between kern_setitimer() and realitexpire(), where the
callout is started before kern_setitimer() acquires process mutex, but
looses a race and kern_setitimer() gets the process mutex before the
callout.  Then, assuming that new specified struct itimerval has
it_interval zero, but it_value non-zero, the callout, after it starts
executing again, clears p->p_realtimer.it_value, but kern_setitimer()
already rescheduled the callout.

As the result of the race, both p_realtimer is zero, and the callout
is rescheduled. Then, in the exit1(), the exit code sees that it_value
is zero and does not even try to stop the callout. This allows the
struct proc to be reused and eventually the armed callout is
re-initialized.  The consequence is the corrupted callwheel tailq.

Use process mutex to interlock the callout start, which fixes the race.

Reported and tested by:	pho
Reviewed by:	jhb
MFC after:	2 weeks
2012-12-04 20:49:39 +00:00
David Xu
d65f1abca7 Implement syscall clock_getcpuclockid2, so we can get a clock id
for process, thread or others we want to support.
Use the syscall to implement POSIX API clock_getcpuclock and
pthread_getcpuclockid.

PR:	168417
2012-08-17 02:26:31 +00:00
Kip Macy
8451d0dd78 In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by:	rwatson
Approved by:	re (bz)
2011-09-16 13:58:51 +00:00
David Xu
cf7d9a8ca8 Create a global thread hash table to speed up thread lookup, use
rwlock to protect the table. In old code, thread lookup is done with
process lock held, to find a thread, kernel has to iterate through
process and thread list, this is quite inefficient.
With this change, test shows in extreme case performance is
dramatically improved.

Earlier patch was reviewed by: jhb, julian
2010-10-09 02:50:23 +00:00
Robert Watson
885868cd8f Remove VOP_LEASE and supporting functions. This hasn't been used since
the removal of NQNFS, but was left in in case it was required for NFSv4.
Since our new NFSv4 client and server can't use it for their
requirements, GC the old mechanism, as well as other unused lease-
related code and interfaces.

Due to its impact on kernel programming and binary interfaces, this
change should not be MFC'd.

Proposed by:    jeff
Reviewed by:    jeff
Discussed with: rmacklem, zach loafman @ isilon
2009-04-10 10:52:19 +00:00
Colin Percival
3f935cf342 Correctly sanity-check timer IDs. [SA-09:06]
Limit the size of malloced buffer when dumping environment
variables. [EN-09:01]

Approved by:	so (cperciva)
Approved by:	re (kensmith)
Security:	FreeBSD-SA-09:06.ktimer
Errata:		FreeBSD-EN-09:01.kenv
2009-03-23 00:00:50 +00:00
Ed Schouten
c90c9021e9 Remove even more unneeded variable assignments.
kern_time.c:
- Unused variable `p'.

kern_thr.c:
- Variable `error' is always caught immediately, so no reason to
  initialize it. There is no way that error != 0 at the end of
  create_thread().

kern_sig.c:
- Unused variable `code'.

kern_synch.c:
- `rval' is always assigned in all different cases.

kern_rwlock.c:
- `v' is always overwritten with RW_UNLOCKED further on.

kern_malloc.c:
- `size' is always initialized with the proper value before being used.

kern_exit.c:
- `error' is always caught and returned immediately. abort2() never
  returns a non-zero value.

kern_exec.c:
- `len' is always assigned inside the if-statement right below it.

tty_info.c:
- `td' is always overwritten by FOREACH_THREAD_IN_PROC().

Found by:	LLVM's scan-build
2009-02-26 15:51:54 +00:00
David Xu
d6b6592ec0 In realtimer_delete(), clear timer's value and interval to tell
realtimer_expire() to not rearm the timer, otherwise there is a chance
that a callout will be left there and be tiggered in future unexpectly.

Bug reported by: tegge@
2008-10-20 02:37:53 +00:00
David Xu
0e17ccbe36 Make sure reading td_runtime in critical section since thread may be
preempted and td_runtime will be modified.
2008-01-18 13:00:28 +00:00
David Xu
00d6ac63cd Add POSIX clock id CLOCK_THREAD_CPUTIME_ID, this can be used to measure
per-thread runtime in user code.
2008-01-18 07:04:42 +00:00
Attilio Rao
a1fe14bc33 rufetch and calcru sometimes should be called atomically together.
This patch fixes places where they should be called atomically changing
their locking requirements (both assume per-proc spinlock held) and
introducing rufetchcalc which wrappers both calls to be performed in
atomic way.

Reviewed by: jeff
Approved by: jeff (mentor)
2007-06-09 21:48:44 +00:00
Jeff Roberson
982d11f836 Commit 14/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
   sychronization.
 - Use the per-process spinlock rather than the sched_lock for per-process
   scheduling synchronization.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-05 00:00:57 +00:00
Robert Watson
c14d15ae3e Remove MAC Framework access control check entry points made redundant with
the introduction of priv(9) and MAC Framework entry points for privilege
checking/granting.  These entry points exactly aligned with privileges and
provided no additional security context:

- mac_check_sysarch_ioperm()
- mac_check_kld_unload()
- mac_check_settime()
- mac_check_system_nfsd()

Add mpo_priv_check() implementations to Biba and LOMAC policies, which,
for each privilege, determine if they can be granted to processes
considered unprivileged by those two policies.  These mostly, but not
entirely, align with the set of privileges granted in jails.

Obtained from:	TrustedBSD Project
2007-04-22 15:31:22 +00:00
Robert Watson
873fbcd776 Further system call comment cleanup:
- Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde)
- Remove extra blank lines in some cases.
- Add extra blank lines in some cases.
- Remove no-op comments consisting solely of the function name, the word
  "syscall", or the system call name.
- Add punctuation.
- Re-wrap some comments.
2007-03-05 13:10:58 +00:00
Robert Watson
0c14ff0eb5 Remove 'MPSAFE' annotations from the comments above most system calls: all
system calls now enter without Giant held, and then in some cases, acquire
Giant explicitly.

Remove a number of other MPSAFE annotations in the credential code and
tweak one or two other adjacent comments.
2007-03-04 22:36:48 +00:00
David Xu
843b99c6f7 - Remove third parameter of itimer_find, the parameter is always zero.
- Call callout_drain on deleting POSIX timer.
- Use kern_timer_delete in exiting hook.
2006-11-28 03:24:34 +00:00
Tom Rhodes
6aeb05d7be Merge posix4/* into normal kernel hierarchy.
Reviewed by:	glanced at by jhb
Approved by:	silence on -arch@ and -standards@
2006-11-11 16:26:58 +00:00
Robert Watson
acd3428b7d Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges.  These may
require some future tweaking.

Sponsored by:           nCircle Network Security, Inc.
Obtained from:          TrustedBSD Project
Discussed on:           arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
                        Alex Lyashkov <umka at sevcity dot net>,
                        Skip Ford <skip dot ford at verizon dot net>,
                        Antoine Brodin <antoine dot brodin at laposte dot net>
2006-11-06 13:42:10 +00:00
Robert Watson
aed5570872 Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h.  sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from:	TrustedBSD Project
Sponsored by:	SPARTA
2006-10-22 11:52:19 +00:00
Poul-Henning Kamp
94d67e0fb8 Move tz_minuteswest and tz_dsttime to subr_clock.c 2006-10-02 16:06:26 +00:00
Poul-Henning Kamp
f645b0b51c First part of a little cleanup in the calendar/timezone/RTC handling.
Move relevant variables to <sys/clock.h> and fix #includes as necessary.

Use libkern's much more time- & spamce-efficient BCD routines.
2006-10-02 12:59:59 +00:00
Alexander Leidinger
993182e57c - Change process_exec function handlers prototype to include struct
image_params arg.
- Change struct image_params to include struct sysentvec pointer and
  initialize it.
- Change all consumers of process_exit/process_exec eventhandlers to
  new prototypes (includes splitting up into distinct exec/exit functions).
- Add eventhandler to userret.

Sponsored by:		Google SoC 2006
Submitted by:		rdivacky
Parts suggested by:	jhb (on hackers@)
2006-08-15 12:10:57 +00:00
David Xu
aff5bcb1b2 INT_MAX is defined in file sys/limits.h, include the file now. 2006-08-02 07:34:51 +00:00
David Xu
61d3a4efc2 Let kernel POSIX timer code and mqueue code to use integer as a resource
handle, the timer_t and mqd_t types will be a pointer which userland
will define it.
2006-03-01 06:29:34 +00:00
David Xu
3e70c6f047 Fix compiling warning on 64 bits system. 2005-12-09 13:16:48 +00:00
David Xu
d26b1a1fb9 Register itimers_event_hook as a kernel event handler, so I don't
have to duplicate code to call it in exec() and exit1().
2005-12-09 05:43:26 +00:00
David Xu
77e718f773 1. Set timer configuration values for sysconf().
2. Set overrun limit to INT_MAX, report ERANGE error if overrun will be
   greater than INT_MAX.
2005-12-01 07:56:15 +00:00
Robert Watson
5e758b9561 Add several aliases for existing clockid_t names to indicate that the
application wishes to request high precision time stamps be returned:

Alias                           Existing

CLOCK_REALTIME_PRECISE          CLOCK_REALTIME
CLOCK_MONOTONIC_PRECISE         CLOCK_MONOTONIC
CLOCK_UPTIME_PRECISE            CLOCK_UPTIME

Add experimental low-precision clockid_t names corresponding to these
clocks, but implemented using cached timestamps in kernel rather than
a full time counter query.  This offers a minimum update rate of 1/HZ,
but in practice will often be more frequent due to the frequency of
time stamping in the kernel:

New clockid_t name              Approximates existing clockid_t

CLOCK_REALTIME_FAST             CLOCK_REALTIME
CLOCK_MONOTONIC_FAST            CLOCK_MONOTONIC
CLOCK_UPTIME_FAST               CLOCK_UPTIME

Add one additional new clockid_t, CLOCK_SECOND, which returns the
current second without performing a full time counter query or cache
lookup overhead to make sure the cached timestamp is stable.  This is
intended to support very low granularity consumers, such as time(3).

The names, visibility, and implementation of the above are subject
to change, and will not be MFC'd any time soon.  The goal is to
expose lower quality time measurement to applications willing to
sacrifice accuracy in performance critical paths, such as when taking
time stamps for the purpose of rescheduling select() and poll()
timeouts.  Future changes might include retrofitting the time counter
infrastructure to allow the "fast" time query mechanisms to use a
different time counter, rather than a cached time counter (i.e.,
TSC).

NOTE: With different underlying time mechanisms exposed, using
different time query mechanisms in the same application may result in
relative non-monoticity or the appearance of clock stalling for a
single clockid_t, as a cached time stamp queried after a precision
time stamp lookup may be "before" the time returned by the earlier
live time counter query.
2005-11-27 00:55:18 +00:00
Andre Oppermann
5eefd88949 Add CLOCK_UPTIME to clock_gettime(2) reporting the current
uptime measured in SI seconds.

Sponsored by:	TCP/IP Optimization Fundraise 2005
2005-11-18 16:51:13 +00:00
David Xu
8f0371f19d Fix name compatible problem with POSIX standard. the sigval_ptr and
sigval_int really should be sival_ptr and sival_int.
Also sigev_notify_function accepts a union sigval value but not a
pointer.
2005-11-04 09:41:00 +00:00