* Fix a race in chunk_dealloc_dss().
* Check for allocation failure before zeroing memory in base_calloc().
Merge enhancements from a divergent version of jemalloc:
* Convert thread-specific caching from magazines to an algorithm that is
more tunable, and implement incremental GC.
* Add support for medium size classes, [4KiB..32KiB], 2KiB apart by
default.
* Add dirty page tracking for pages within active small/medium object
runs. This allows malloc to track precisely which pages are in active
use, which makes dirty page purging more effective.
* Base maximum dirty page count on proportion of active memory.
* Use optional zeroing in arena_chunk_alloc() to avoid needless zeroing
of chunks. This is useful in the context of DSS allocation, since a
long-lived application may commonly recycle chunks.
* Increase the default chunk size from 1MiB to 4MiB.
Remove feature:
* Remove the dynamic rebalancing code, since thread caching reduces its
utility.
- Make sure the mode argument is either a character or a block device.
- Use S_IS*() instead of checking S_IF*-flags by hand.
- Don't use kern.devname when the argument is already NODEV.
- Always call snprintf with the proper amount of arguments corresponding
with the format.
- Perform some whitespace fixes. Tabs instead of 4 spaces, missing space
for return statement.
- Remove unneeded includes.
When we had utmp(5), we had to list all the psuedo-terminals in ttys(5)
to make ttyslot(3) function properly. Now that pututxline(3) deals with
slot allocation internally (not based on TTY names), we don't need to
list all the TTYs on the system in ttys(5) to make user accounting work
properly.
This patch removes all the entries from the /etc/ttys files, but also
the pts(4) entries that were appended implicitly, which was added in
r154838.
Continuous catopen() calls cause 4 failig stat(2) each, which means a lot
of overhead. It is also a good idea to keep the opened catalogs in the memory
to speed up further catopen() calls to the same catalog since these catalogs
are not big at all. In this case, we count references and only free() the
allocated space when the reference count reaches 0. The reads and writes to
the cache are syncronized with an rwlock when these functions are called from
a threaded program.
Requested by: kib
Approved by: delphij
I've noticed many applications do a bad job at timekeeping, for several
reasons:
- Applications like screen(1) don't update time records when restoring
the old user login record.
- Many applications only set ut_tv.tv_sec, not ut_tv.tv_usec.
This causes many problems for tools such as ac(8), which require the
timestamps to be properly ordered. This is why I've decided to let the
utmpx code obtain valid timestamps itself.
I've discussed this issue with the Austin Group and it will be fixed in
future revisions of the specification. The issue was that ut_line fields
weren't supposed to be valid for LOGIN_PROCESS entries, while
getutxline() would try to match these records anyway.
They also agreed on our way of implementing pututxline() without
getutxid() (which other operating systems also do), but unfortunately
they disagreed with our way of replacing DEAD_PROCESS entries, which is
a pity. The current specification allows the utmpx database to become
infinitely big over time.
See also: http://austingroupbugs.net/view.php?id=213#c378
It makes hardly any sense to expose a symbol which should only be
provided for binary compatibility, but it seems we don't have a lot of
choice here. There are many autoconf scripts out there that try to
create a binary that links against the old symbol to see whether
uname(3) is present. These scripts fail to detect uname(3) now.
It should be noted that the behaviour we implement is not against the
standards:
| The following shall be declared as a function and may also be defined
| as a macro:
|
| int uname(struct utsname *);
POSIX 2008 and XSI 7require strcoll() for opendir() is not true.
I can't find such requirement in POSIX 2008 and XSI 7.
So, back out that part of my commit, returning old strcmp(), and remove
this misleading comment.
It also matches now how our 'ls' works for years.
b) Remove comment expressed 2 fears:
1) One just simple describe how strcoll() works in _any_ context,
not for directories only. Are we plan to remove strcoll() from everything
just because it is little more complex than strcmp()? I doubt, and
directories give nothing different here. Moreover, strcoll() used
in 'ls' for years and nobody complaints yet.
2) Plain wrong statement about undefined strcoll() behaviour. strcoll()
always gives predictable results, falling back to strcmp() on any
trouble, see strcoll(3).
No objections from -current list discussion.
- Massively reduce BSS usage. Let futx_to_utx() dynamically allocate the
structure. There is only a very small amount of applications out there
that needs to use the utmpx database. Wasting 1 KB on unused
structures makes little sense.
- Just let getutxid() search for matching ut_id's for any *PROCESS-type.
This makes the code a bit more future-proof.
- Fix a POSIX-mistake: when reading POSIX and the OpenSolaris
implementation, getutxline() must return USER_PROCESS and
LOGIN_PROCESS records whose ut_lines match. When reading POSIX, it
seems LOGIN_PROCESS should not use ut_line at the first place. I have
reported this issue.
After comparing how other systems deal with utmp/utmpx, I noticed many
systems don't even care about ttyslot(3) anymore, since utmpx doesn't
use TTY slots anyway. We don't provide any tools to access old utmp
files anymore, so there is no use in letting applications write to a
proper offset within the utmp file.
Just let ttyslot(3) always return 0, which seems to be the default
behaviour on operating systems like Linux as well.
Nowadays uname(3) is an inline function around __xuname(3). Prevent
linkage of new binaries against this compatibility function, similar to
what I did with ttyslot(3).
This utility allows users to convert their wtmp databases to the new
format. It makes no sense for users to keep their wtmp log files if they
are unable to view them.
It basically copies ut_line into ut_id as well. This makes it possible
for last(1) and ac(8) to match login records with their corresponding
logout record.
I forgot to cast the size_t's back to off_t before negating them,
causing all sorts of artifacts where the log files would grow to 2^32 -
197 bytes.
Reported by: ume
Even though we use __sym_compat(), we should list the symbol in
Symbol.map.
ttyslot() is now listed as follows, which seems to do the right thing:
| Symbol table '.dynsym' contains 2755 entries:
| Num: Value Size Type Bind Vis Ndx Name
| 613: 00000000000477b0 121 FUNC GLOBAL DEFAULT 10 ttyslot@FBSD_1.0
Reported by: kib
Phase out ttyslot(3).
The ttyslot() function was originally part for SUSv1, marked LEGACY in
SUSv2 and removed later on. This function only makes sense when using
utmp(5), because it was used to determine the offset of the record for
the controlling TTY. It makes little sense to keep it here, because the
new utmpx file format doesn't index based on TTY slots.
The ttyslot() function was originally part for SUSv1, marked LEGACY in
SUSv2 and removed later on. This function only makes sense when using
utmp(5), because it was used to determine the offset of the record for
the controlling TTY. It makes little sense to keep it here, because the
new utmpx file format doesn't index based on TTY slots.
The utmpx interface is the standardized interface of the user accounting
database. The standard only defines a subset of the functions that were
present in System V-like systems.
I'd like to highlight some of the traits my implementation has:
- The standard allows the on-disk format to be different than the
in-memory representation (struct utmpx). Most operating systems don't
do this, but we do. This allows us to keep our ABI more stable, while
giving us the opportunity to modify the on-disk format. It also allows
us to use a common file format across different architectures (i.e.
byte ordering).
- Our implementation of pututxline() also updates wtmp and lastlog (now
called utx.log and utx.lastlogin). This means the databases are more
likely to be in sync.
- Care must be taken that our implementation discard any fields that are
not applicable. For example, our DEAD_PROCESS records do not hold a
TTY name. Just a time stamp, a record identifier and a process
identifier. It also guarantees that strings (ut_host, ut_line and
ut_user) are null terminated. ut_id is obviously not null terminated,
because it's not a string.
- The API and its behaviour should be conformant to POSIX, but there may
be things that slightly deviate from the standard. This implementation
uses separate file descriptors when writing to the log files. It also
doesn't use getutxid() to search for a field to overwrite. It uses an
allocation strategy similar to getutxid(), but prevents DEAD_PROCESS
records from accumulating.
Make sure libulog doesn't overwrite the manpages shipped with our C
library. Also keep the symbol list in Symbol.map sorted.
I'll bump __FreeBSD_version later this evening. I first want to convert
everything to <utmpx.h> and get rid of <utmp.h>.
Prior to this commit, fread/fwrite calls with size * nmemb > SIZE_MAX
were handled by reading or writing (size_t)(size * nmemb) bytes; for
example, on 32-bit platforms, fread(ptr, 641, 6700417, f) would read 1
byte and indicate that the requested 6700417 blocks had been read.
This commit adds a check for such integer overflows, and treats them as
if an overly large request was passed to read/write; i.e., it sets errno
to EINVAL, sets the error indicator on the file, and returns a short
object count (0, to be specific).
The overflow check involves an integer division, so as a performance
optimization we check first to see if both size and nmemb are less than
2^16; if they are, no overflow is possible and we avoid the division.
We assume here that size_t is at least 32 bits; this appears to be true
on all platforms FreeBSD supports.
Although this commit fixes an integer overflow, it is not likely to have
any security implications, since any program which would be affected by
this bug fix is quite clearly already very confused.
Reviewed by: kib
MFC after: 1 month
r195030 | gonzo | 2009-06-25 19:27:31 -0600 (Thu, 25 Jun 2009) | 4 lines
- Switch to libc softfloat from libgcc implementation. The problem
with latter is that it is not complete, fpsetXXX/fpgetXXX
functions are missing.
r197800 | gonzo | 2009-10-06 00:35:52 -0600 (Tue, 06 Oct 2009) | 3 lines
- curbrk variable for sbrk and brk should be the same
- Add correct variable names to Symbol.map
r195025 | gonzo | 2009-06-25 19:01:50 -0600 (Thu, 25 Jun 2009) | 4 lines
- Move fpgetXXX.c/fpsetXXX.c sources to hardfloat subdir/
to prevenmt them from being mixed up with lib/libc/softfloat
files with the same names
alphasort-like interface to the comparision function required by
qsort() and qsort_r().
For opendir() thunk and alphasort(), comment on why we deviated from
POSIX by using strcmp() instead of strcoll().
Requested and reviewed by: bde
MFC after: 2 weeks
now type sema_t is a structure which can be put in a shared memory area,
and multiple processes can operate it concurrently.
User can either use mmap(MAP_SHARED) + sem_init(pshared=1) or use sem_open()
to initialize a shared semaphore.
Named semaphore uses file system and is located in /tmp directory, and its
file name is prefixed with 'SEMD', so now it is chroot or jail friendly.
In simplist cases, both for named and un-named semaphore, userland code
does not have to enter kernel to reduce/increase semaphore's count.
The semaphore is designed to be crash-safe, it means even if an application
is crashed in the middle of operating semaphore, the semaphore state is
still safely recovered by later use, there is no waiter counter maintained
by userland code.
The main semaphore code is in libc and libthr only has some necessary stubs,
this makes it possible that a non-threaded application can use semaphore
without linking to thread library.
Old semaphore implementation is kept libc to maintain binary compatibility.
The kernel ksem API is no longer used in the new implemenation.
Discussed on: threads@
Std 1003.1-2008. Both Linux and Solaris conforms to the new definitions,
so we better follow too (older glibc used old BSDish alphasort prototype
and corresponding type of the comparision function for scandir). While
there, change the definitions of the functions to ANSI C and fix several
style issues nearby.
Remove requirement for "sys/types.h" include for functions from manpage.
POSIX also requires that alphasort(3) sorts as if strcoll(3) was used,
but leave the strcmp(3) call in the function for now.
Adapt in-tree callers of scandir(3) to new declaration. The fact that
select_sections() from catman(1) could modify supplied struct dirent is
a bug.
PR: standards/142255
MFC after: 2 weeks
r195175. Remove all definitions, documentation, and usage.
fifo_misc.c:
Remove all kqueue tests as fifo_io.c performs all those that
would have remained.
Reviewed by: rwatson
MFC after: 3 weeks
X-MFC note: don't change vlan_link_state() function signature
Fix some wrong usages.
Note: this does not affect generated binaries as this argument is not used.
PR: 137213
Submitted by: Eygene Ryabinkin (initial version)
MFC after: 1 month
of setenv(), putenv() and unsetenv() when dealing with corrupt entries in
environ. They now output a warning and complete their task without error.
MFC after: 1 week
instead of returning an error if a corrupt (not a "name=value" string) entry
in the environ array is detected when (re)-building the internal
environment. This should prevent applications or libraries from
experiencing issues arising from the expectation that these calls will
complete even with corrupt entries. The behavior is now as it was prior to
7.0.
Reviewed by: jilles
MFC after: 1 week
find a variable. Include a note that it must not cause the internal
environment to be generated since malloc() depends upon getenv(). To call
malloc() would create a circular dependency.
Recommended by: green
Approved by: jilles
MFC after: 1 week
The maximum length of a username has nothing to do with the size of the
username in the utmp files. Use MAXLOGNAME, which is defined as 17
(UT_USERSIZE + 1).
The entries in the argv array are not const themselves, but sometimes we
want to fill in const values. Just make the array const and use
__DECONST() to make it const for the execve()-call itself.
Also convert the only K&R prototype to ANSI.
Reviewed by: carvay,
the.infamous.paul@gmail.com,
Joan Picanyol i Puig <lists-freebsd-es@biaix.org>,
Ing . Marcos Luis Ortiz Valmaseda <mlortiz@uci.cu>,
eskanete@gmail.com,
Jose M Rodriguez <josemi@freebsd.jazztel.es>,
Guillermo Hernandez <guillermo@QuerySoft.es>,
dani.doni@gmail.com
long instead of an int when examining the results of select() to look for
RPC requests. Previously this routine would ignore RPC requests to sockets
whose file descriptor mod 64 was greater than 31 on a 64-bit platform.
PR: amd64/141130
Submitted by: liujb of array networks
MFC after: 3 days
copied from NetBSD's manpage, and it also matches the behavior
described by the Open Group's online copy of setpgid.2 at
http://www.opengroup.org/onlinepubs/009695399/functions/setpgid.html
Obtained from: NetBSD
Submitted by: Petros Barbayiannis <petrosbarbayiannis@yahoo.gr>
MFC after: 1 week
**environ entries. This puts non-getenv(3) operations in line with
getenv(3) in that bad environ entries do not cause all operations to
fail. There is still some inconsistency in that getenv(3) in the
absence of any environment-modifying operation does not emit corrupt
environ entry warnings.
I also fixed another inconsistency in getenv(3) where updating the
global environ pointer would not be reflected in the return values.
It would have taken an intermediary setenv(3)/putenv(3)/unsetenv(3)
in order to see the change.
execvPe() is called by _execvpe(), which we added to implement
posix_spawnp(). We just took execvP() and added the envp argument.
Unfortunately we forgot to change the implementation to use envp over
environ.
This fixes the following piece of code:
| char * const arg[2] = { "env", NULL };
| char * const env[2] = { "FOO=BAR", NULL };
| posix_spawnp(NULL, "/usr/bin/env", NULL, NULL, arg, env);
MFC after: 2 weeks
FTS_NOCHDIR option is used. fts_build() could strip a trailing slash
from path name in post-order visit if a path pointing to an empty
directory was given for fts_open().
PR: bin/133907, kern/134513
Reviewed by: das
Approved by: trasz (mentor)
MFC after: 1 month
from SUSv4 XSI. Note that the functions are obsoleted, and only
provided to ease porting from System V-like systems. Since sigpause
already exists in compat with different interface, XSI sigpause is
named xsi_sigpause.
Reviewed by: davidxu
MFC after: 3 weeks
and moving the default initialization of prec into the else clause.
The clang static analyzer erroneously thought that nsec can be used
uninitialized here; it was not actually possible, but better to make
the code clearer. (Clang can't know that sprintf() won't modify *pi
behind the scenes.)
uninitialized. Initialize it to a safe value so that there's no
chance of returning an error if stack garbage happens to be equal to
(size_t)-1 or (size_t)-2.
Found by: Clang static analyzer
MFC after: 7 days
a feature that libstdc++ depends on to simulate the behavior of libc's
internal '__isthreaded' variable. One benefit of this is that _libc_once()
is now private to _once_stub.c.
Requested by: kan
with the additional property that it is safe for routines in libc to use
in both single-threaded and multi-threaded processes. Multi-threaded
processes use the pthread_once() implementation from the threading library
while single-threaded processes use a simplified "stub" version internal
to libc. The libc stub-version of pthread_once() now also uses the
simplified "stub" version as well instead of being a nop.
Reviewed by: deischen, Matthew Fleming @ Isilon
Suggested by: alc
MFC after: 1 week
Many operating systems also provide MAP_ANONYMOUS. It's not hard to
support this ourselves, we'd better add it to make it more likely for
applications to work out of the box.
Reviewed by: alc (mman.h)
well-known race condition, which elimination was the reason for the
function appearance in first place. If sigmask supplied as argument to
pselect() enables a signal, the signal might be delivered before thread
called select(2), causing lost wakeup. Reimplement pselect() in kernel,
making change of sigmask and sleep atomic.
Since signal shall be delivered to the usermode, but sigmask restored,
set TDP_OLDMASK and save old mask in td_oldsigmask. The TDP_OLDMASK
should be cleared by ast() in case signal was not gelivered during
syscall execution.
Reviewed by: davidxu
Tested by: pho
MFC after: 1 month
* retry various system calls on EINTR
* retry the rest after a short read (common if there is more than about 1K
of output)
* block SIGCHLD like system(3) does (note that this does not and cannot
work fully in threaded programs, they will need to be careful with wait
functions)
PR: 90580
MFC after: 1 month
one path. When the list is empty (contain only a NULL pointer), return
EINVAL instead of pretending to succeed, which will cause a NULL pointer
deference in a later fts_read() call.
Noticed by: Christoph Mallon (via rdivacky@)
MFC after: 2 weeks
- Tolerate applications that pass a NULL pointer for the buffer and
claim that the capacity of the buffer is nonzero.
- If an application passes in a non-NULL buffer pointer and claims the
buffer has zero capacity, we should free (well, realloc) it
anyway. It could have been obtained from malloc(0), so failing to
free it would be a small memory leak.
MFC After: 2 weeks
Reported by: naddy
PR: ports/138320
the type argument. This is known to fix some pthread_mutexattr_settype()
invocations, especially when it comes to pulseaudio.
Approved by: kib
deischen (threads)
MFC after: 3 days
- F_READAHEAD: specify the amount for sequential access. The amount is
specified in bytes and is rounded up to nearest block size.
- F_RDAHEAD: Darwin compatible version that use 128KB as the sequential
access size.
A third argument of zero disables the read-ahead behavior.
Please note that the read-ahead amount is also constrainted by sysctl
variable, vfs.read_max, which may need to be raised in order to better
utilize this feature.
Thanks Igor Sysoev for proposing the feature and submitting the original
version, and kib@ for his valuable comments.
Submitted by: Igor Sysoev <is rambler-co ru>
Reviewed by: kib@
MFC after: 1 month
a large page size that is greater than malloc(3)'s default chunk size but
less than or equal to 4 MB, then increase the chunk size to match the large
page size.
Most often, using a chunk size that is less than the large page size is not
a problem. However, consider a long-running application that allocates and
frees significant amounts of memory. In particular, it frees enough memory
at times that some of that memory is munmap()ed. Up until the first
munmap(), a 1MB chunk size is just fine; it's not a problem for the virtual
memory system. Two adjacent 1MB chunks that are aligned on a 2MB boundary
will be promoted automatically to a superpage even though they were
allocated at different times. The trouble begins with the munmap(),
releasing a 1MB chunk will trigger the demotion of the containing superpage,
leaving behind a half-used 2MB reservation. Now comes the real problem.
Unfortunately, when the application needs to allocate more memory, and it
recycles the previously munmap()ed address range, the implementation of
mmap() won't be able to reuse the reservation. Basically, the coalescing
rules in the virtual memory system don't allow this new range to combine
with its neighbor. The effect being that superpage promotion will not
reoccur for this range of addresses until both 1MB chunks are freed at some
point in the future.
Reviewed by: jasone
MFC after: 3 weeks
needed to satisfy static libraries that are compiled with -fpic
and linked into static binary afterwards. Several libraries in
gcc are examples of such static libs.
EV_RECEIPT is useful to disambiguating error conditions when multiple
events structures are passed to kevent(2). The error code is returned
in the data field and EV_ERROR is set.
Approved by: rwatson (co-mentor)
When the EV_DISPATCH flag is used the event source will be disabled
immediately after the delivery of an event. This is similar to the
EV_ONESHOT flag but it doesn't delete the event.
Approved by: rwatson (co-mentor)
Add user events support to kernel events which are not associated with any
kernel mechanism but are triggered by user level code. This is useful for
adding user level events to an event handler that may also be monitoring
kernel events.
Approved by: rwatson (co-mentor)
An additional one coming from http://www.research.att.com/~gsf/testregex/
was not added; at some point the entire AT&T regression test harness
should be imported here.
But that would also mean commitment to fix the uncovered errors.
PR: 130504
Submitted by: Chris Kuklewicz
When I wrote the pseudo-terminal driver for the MPSAFE TTY code, Robert
Watson and I agreed the best way to implement this, would be to let
posix_openpt() create a pseudo-terminal with proper permissions in place
and let grantpt() and unlockpt() be no-ops.
This isn't valid behaviour when looking at the spec. Because I thought
it was an elegant solution, I filed a bug report at the Austin Group
about this. In their last teleconference, they agreed on this subject.
This means that future revisions of POSIX may allow grantpt() and
unlockpt() to be no-ops if an open() on /dev/ptmx (if the implementation
has such a device) and posix_openpt() already do the right thing.
I'd rather put this in the manpage, because simply mentioning we don't
comply to any standard makes it look worse than it is. Right now we
don't, but at least we took care of it.
Approved by: re (kib)
MFC after: 3 days
other than the current system-wide size (32-bits) has been updated so
for now just cautiously turn the check off. While here fix the check
for IDs being too large which doesn't work due to type mis-matches.
Reviewed by: jhb (previous version)
Approved by: re (kib)
MFC after: 1 month (type mis-match fixes only)
compiled with stack protector.
Use libssp_nonshared library to pull __stack_chk_fail_local symbol into
each library that needs it instead of pulling it from libc. GCC
generates local calls to this function which result in absolute
relocations put into position-independent code segment, making dynamic
loader do extra work every time given shared library is being relocated
and making affected text pages non-shareable.
Reviewed by: kib
Approved by: re (kib)
behavior is mandated by POSIX.
- Do not fail requests that pass a length greater than SSIZE_MAX
(such as > 2GB on 32-bit platforms). The 'len' parameter is actually
an unsigned 'size_t' so negative values don't really make sense.
Submitted by: Alexander Best alexbestms at math.uni-muenster.de
Reviewed by: alc
Approved by: re (kib)
MFC after: 1 week
Right now nmemb is returned when size is 0. In newer versions of the
standards, it is explicitly required that fwrite() should return 0.
Submitted by: Christoph Mallon
Approved by: re (kib)
if the new file mode is the same as it was before; however, this
optimization must be disabled for filesystems that support NFSv4 ACLs.
Chmod uses pathconf(2) to determine whether this is the case - however,
pathconf(2) always follows symbolic links, while the 'chmod -h' doesn't.
This change adds lpathconf(3) to make it possible to solve that problem
in a clean way.
Reviewed by: rwatson (earlier version)
Approved by: re (kib)