Extend the ino_t, dev_t, nlink_t types to 64-bit ints. Modify
struct dirent layout to add d_off, increase the size of d_fileno
to 64-bits, increase the size of d_namlen to 16-bits, and change
the required alignment. Increase struct statfs f_mntfromname[] and
f_mntonname[] array length MNAMELEN to 1024.
ABI breakage is mitigated by providing compatibility using versioned
symbols, ingenious use of the existing padding in structures, and
by employing other tricks. Unfortunately, not everything can be
fixed, especially outside the base system. For instance, third-party
APIs which pass struct stat around are broken in backward and
forward incompatible ways.
Kinfo sysctl MIBs ABI is changed in backward-compatible way, but
there is no general mechanism to handle other sysctl MIBS which
return structures where the layout has changed. It was considered
that the breakage is either in the management interfaces, where we
usually allow ABI slip, or is not important.
Struct xvnode changed layout, no compat shims are provided.
For struct xtty, dev_t tty device member was reduced to uint32_t.
It was decided that keeping ABI compat in this case is more useful
than reporting 64-bit dev_t, for the sake of pstat.
Update note: strictly follow the instructions in UPDATING. Build
and install the new kernel with COMPAT_FREEBSD11 option enabled,
then reboot, and only then install new world.
Credits: The 64-bit inode project, also known as ino64, started life
many years ago as a project by Gleb Kurtsou (gleb). Kirk McKusick
(mckusick) then picked up and updated the patch, and acted as a
flag-waver. Feedback, suggestions, and discussions were carried
by Ed Maste (emaste), John Baldwin (jhb), Jilles Tjoelker (jilles),
and Rick Macklem (rmacklem). Kris Moore (kris) performed an initial
ports investigation followed by an exp-run by Antoine Brodin (antoine).
Essential and all-embracing testing was done by Peter Holm (pho).
The heavy lifting of coordinating all these efforts and bringing the
project to completion were done by Konstantin Belousov (kib).
Sponsored by: The FreeBSD Foundation (emaste, kib)
Differential revision: https://reviews.freebsd.org/D10439
patm(4) devices.
Maintaining an address family and framework has real costs when we make
infrastructure improvements. In the case of NATM we support no devices
manufactured in the last 20 years and some will not even work in modern
motherboards (some newer devices that patm(4) could be updated to
support apparently exist, but we do not currently have support).
With this change, support remains for some netgraph modules that don't
require NATM support code. It is unclear if all these should remain,
though ng_atmllc certainly stands alone.
Note well: FreeBSD 11 supports NATM and will continue to do so until at
least September 30, 2021. Improvements to the code in FreeBSD 11 are
certainly welcome.
Reviewed by: philip
Approved by: harti
9899:2011 Appendix K 3.7.4.1.
Other needed supporting types, defines and constraint_handler
infrastructure is added as specified in the C11 spec.
Submitted by: Tom Rix <trix@juniper.net>
Sponsored by: Juniper Networks
Discussed with: ed
MFC after: 3 weeks
Differential revision: https://reviews.freebsd.org/D9903
Differential revision: https://reviews.freebsd.org/D10161
Implement a new init(8) option in /etc/ttys. If this option is present
on the entry in /etc/ttys, the entry will be active if and only if it
exists. If the name starts with a '/', it will be considered an
absolute path. If not, it will be a path relative to /dev.
This allows one to turn off video console getty that aren't present
(while running a getty on them even when they aren't the system
console). Likewise with serial ports.
It differs from onifconsole in only requiring the device exist rather
than it be listed as one of the system consoles.
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D10037
Add a clock_nanosleep() syscall, as specified by POSIX.
Make nanosleep() a wrapper around it.
Attach the clock_nanosleep test from NetBSD. Adjust it for the
FreeBSD behavior of updating rmtp only when interrupted by a signal.
I believe this to be POSIX-compliant, since POSIX mentions the rmtp
parameter only in the paragraph about EINTR. This is also what
Linux does. (NetBSD updates rmtp unconditionally.)
Copy the whole nanosleep.2 man page from NetBSD because it is complete
and closely resembles the POSIX description. Edit, polish, and reword it
a bit, being sure to keep any relevant text from the FreeBSD page.
Reviewed by: kib, ngie, jilles
MFC after: 3 weeks
Relnotes: yes
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D10020
the default partition, eMMC v4.41 and later devices can additionally
provide up to:
1 enhanced user data area partition
2 boot partitions
1 RPMB (Replay Protected Memory Block) partition
4 general purpose partitions (optionally with a enhanced or extended
attribute)
Of these "partitions", only the enhanced user data area one actually
slices the user data area partition and, thus, gets handled with the
help of geom_flashmap(4). The other types of partitions have address
space independent from the default partition and need to be switched
to via CMD6 (SWITCH), i. e. constitute a set of additional "disks".
The second kind of these "partitions" doesn't fit that well into the
design of mmc(4) and mmcsd(4). I've decided to let mmcsd(4) hook all
of these "partitions" up as disk(9)'s (except for the RPMB partition
as it didn't seem to make much sense to be able to put a file-system
there and may require authentication; therefore, RPMB partitions are
solely accessible via the newly added IOCTL interface currently; see
also below). This approach for one resulted in cleaner code. Second,
it retains the notion of mmcsd(4) children corresponding to a single
physical device each. With the addition of some layering violations,
it also would have been possible for mmc(4) to add separate mmcsd(4)
instances with one disk each for all of these "partitions", however.
Still, both mmc(4) and mmcsd(4) share some common code now e. g. for
issuing CMD6, which has been factored out into mmc_subr.c.
Besides simply subdividing eMMC devices, some Intel NUCs having UEFI
code in the boot partitions etc., another use case for the partition
support is the activation of pseudo-SLC mode, which manufacturers of
eMMC chips typically associate with the enhanced user data area and/
or the enhanced attribute of general purpose partitions.
CAVEAT EMPTOR: Partitioning eMMC devices is a one-time operation.
- Now that properly issuing CMD6 is crucial (so data isn't written to
the wrong partition for example), make a step into the direction of
correctly handling the timeout for these commands in the MMC layer.
Also, do a SEND_STATUS when CMD6 is invoked with an R1B response as
recommended by relevant specifications. However, quite some work is
left to be done in this regard; all other R1B-type commands done by
the MMC layer also should be followed by a SEND_STATUS (CMD13), the
erase timeout calculations/handling as documented in specifications
are entirely ignored so far, the MMC layer doesn't provide timeouts
applicable up to the bridge drivers and at least sdhci(4) currently
is hardcoding 1 s as timeout for all command types unconditionally.
Let alone already available return codes often not being checked in
the MMC layer ...
- Add an IOCTL interface to mmcsd(4); this is sufficiently compatible
with Linux so that the GNU mmc-utils can be ported to and used with
FreeBSD (note that due to the remaining deficiencies outlined above
SANITIZE operations issued by/with `mmc` currently most likely will
fail). These latter will be added to ports as sysutils/mmc-utils in
a bit. Among others, the `mmc` tool of the GNU mmc-utils allows for
partitioning eMMC devices (tested working).
- For devices following the eMMC specification v4.41 or later, year 0
is 2013 rather than 1997; so correct this for assembling the device
ID string properly.
- Let mmcsd.ko depend on mmc.ko. Additionally, bump MMC_VERSION as at
least for some of the above a matching pair is required.
- In the ACPI front-end of sdhci(4) describe the Intel eMMC and SDXC
controllers as such in order to match the PCI one.
Additionally, in the entry for the 80860F14 SDXC controller remove
the eMMC-only SDHCI_QUIRK_INTEL_POWER_UP_RESET.
OKed by: imp
Submitted by: ian (mmc_switch_status() implementation)
Use SRCTOP in place of .CURDIR/.. as appropriate. The hand-crafted
relative paths for the "links" option remain, though, since those are
relative to /usr/include/sys/<blah> not to the source tree.
Differential Revision: https://reviews.freebsd.org/D9932
Sponsored by: Netflix
Silence On: arch@ (twice)
Also mention <time.h> in sem_timedwait(3), because POSIX does,
and because the user will need it for clockid_t, struct timespec,
and TIMER_ABSTIME.
Reported by: bde
MFC after: 9 days
X-MFC with: r314179
Sponsored by: Dell EMC
This function allows the caller to specify the reference clock
and choose between absolute and relative mode. In relative mode,
the remaining time can be returned.
The API is similar to clock_nanosleep(3). Thanks to Ed Schouten
for that suggestion.
While I'm here, reduce the sleep time in the semaphore "child"
test to greatly reduce its runtime. Also add a reasonable timeout.
Reviewed by: ed (userland)
MFC after: 2 weeks
Relnotes: yes
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D9656
Replace uses of the GCC __nonnull__ attribute with the clang nullability
qualifiers. The replacement should be transparent for clang developers as
the new qualifiers will produce the same warnings and will be useful for
static checkers but will not cause aggressive optimizations.
GCC will not produce such warnings and developers will have to use
upgraded GCC ports built with the system headers from r312538.
Hinted by: Apple's Libc-1158.20.4, Bionic libc
MFC after: 11.1 Release
Differential Revision: https://reviews.freebsd.org/D9004
While the checks are considered useful, the attribute does dangerous
optimizations, removing NULL checks where they can be needed. Remove the
uses of this attribute introduced in r281130: the changes were inspired on
Google's bionic where this attribute is not used anymore.
The __nonnull() attribute will be deprecrated from our headers and
replaced with the Clang _Nonnull qualifier in the future.
MFC after: 3 days
This change consists of two parts:
- allow libkvm to recognize /dev/vmm/* character devices as devices that
provide access to the physical memory of a system (similarly to /dev/fwmem*)
- allow libkvm to recognize that /dev/vmm/* and /dev/fwmem* devices provide
access to the physical memory of live remote systems and, thus, the memory
is writable
As a result, it should be possible to run commands like
$ kgdb -w /path/to/kernel /dev/fwmem0.0
$ kgdb /path/to/kernel /dev/vmm/guest
Reviewed by: kib, jhb
MFC after: 2 weeks
Relnotes: yes
Sponsored by: Panzura
Differential Revision: https://reviews.freebsd.org/D8679
This paves way to implement VDSO for the enlightened time counter.
Reviewed by: kib
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D8768
VSS stands for "Volume Shadow Copy Service". Unlike virtual machine
snapshot, it only takes snapshot for the virtual disks, so both
filesystem and applications have to aware of it, and cooperate the
whole VSS process.
This driver exposes two device files to the userland:
/dev/hv_fsvss_dev
Normally userland programs should _not_ mess with this device file.
It is currently used by the hv_vss_daemon(8), which freezes and
thaws the filesystem. NOTE: currently only UFS is supported, if
the system mounts _any_ other filesystems, the hv_vss_daemon(8)
will veto the VSS process.
If hv_vss_daemon(8) was disabled, then this device file must be
opened, and proper ioctls must be issued to keep the VSS working.
/dev/hv_appvss_dev
Userland application can opened this device file to receive the
VSS freeze notification, hold the VSS for a while (mainly to flush
application data to filesystem), release the VSS process, and
receive the VSS thaw notification i.e. applications can run again.
The VSS will still work, even if this device file is not opened.
However, only filesystem consistency is promised, if this device
file is not opened or is not operated properly.
hv_vss_daemon(8) is started by devd(8) by default. It can be disabled
by editting /etc/devd/hyperv.conf.
Submitted by: Hongjiang Zhang <honzhan microsoft com>
Reviewed by: kib, mckusick
MFC after: 3 weeks
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D8224
Now that the changes to the dirname(3) function had some time to settle,
let's go ahead and use the same approach for replacing basename(3) by a
simple implementation that modifies the input string, thereby making it
thread-safe and guaranteed to succeed.
Unlike dirname(3), this function already had a thread-safe variant
basename_r(3). This function had its own set of problems, like having an
upper bound on the pathname length. Keep this function around for
compatibility, but remove most references from the man page. Make the
man page more similar to that of dirname(3).
As the basename_r(3) function is only provided by FreeBSD (and Bionic),
depending on its use is even more implementation defined than assuming
that basename(3) is thread-safe.
Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D8382
libc++'s stddef.h includes an existing definition of max_align_t for
C++11, but it is only defined for C++, not for C. In addition, GCC and
clang both define an alternate version of max_align_t that uses a
union of multiple types rather than a plain long double as in libc++.
This adds a __max_align_t to <sys/_types.h> that matches the GCC and
clang definition that is mapped to max_align_t in <stddef.h>.
PR: 210890
Reviewed by: dim
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D8194
Back in 2015 when I reimplemented these functions to use an AVL tree, I
was annoyed by the weakness of the typing of these functions. Both tree
nodes and keys are represented by 'void *', meaning that things like the
documentation for these functions are an absolute train wreck.
To make things worse, users of these functions need to cast the return
value of tfind()/tsearch() from 'void *' to 'type_of_key **' in order to
access the key. Technically speaking such casts violate aliasing rules.
I've observed actual breakages as a result of this by enabling features
like LTO.
I've filed a bug report at the Austin Group. Looking at the way the bug
got resolved, they made a pretty good step in the right direction. A new
type 'posix_tnode' has been added to correspond to tree nodes. It is
still defined as 'void' for source-level compatibility, but in the very
far future it could be replaced by a proper structure type containing a
key pointer.
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D8205
libzfs_core provides a rather limited but committed (stable) interface
for working with ZFS. We install libzfs_core shared library but we do
not install header files required for developing programs that use
the library. This change is to install the required header files
libzfs_core.h, libnvpair.h and sys/nvpair.h.
The headers are installed into the same locations as on illumos.
Reviewed by: mav, markj
Differential Revision: https://reviews.freebsd.org/D8005
that can be compiled on various OSes (including on older versions
of FreeBSD), make it possible to have it include the partitioning
scheme definitions without pulling in FreeBSD specifics.
In particular this means:
o move the scheme definitions iand related defines to header files
under sys/disk,
o make them (more) portable by using uint#_t (where applicable)
and renaming defines so that they at least have a good prefix,
o make the new headers stand-alone so that they don't need FreeBSD
definitions, like struct uuid(*)
o keep the original headers for compatibility, but rewrite them to
get the scheme definitions from <sys/disk/$scheme.h>.
(*) since UUID/GUID type definitions are non-portable and the GPT
scheme uses them, make it possible to have the scheme definitions
use an external type by allowing consumers of the header to set
GPT_UUID_TYPE. When GPT_UUID_TYPE has not been defined, the header
will use it's own type definition, which is the same as struct uuid.
The gpt_uuid_t typedef is created to abstract the details and allows
consumers to refer to a single type.
There is not conflict between the partitioning scheme headers and
what is defined in them. All headers can be included in the same
source files.
Note: consumers of the old headers have not been changed yet. Such
will be done if and when needed/beneficial.
Reviewed by: imp, jhb
MFC after: 1 month
Sponsored by: Bracket Computing
The setkey() and encrypt() functions are part of XSI, not the POSIX base
definitions. There is no strict requirement for us to provide these,
especially if we're only going to keep these around as undocumented
stubs. The same holds for des_setkey() and des_cipher().
Instead of providing functions that only generate warnings when linking,
simply disallow linking against them. The impact of this is relatively
low. It only causes two leaf ports to break. I'll see what I can do to
help out to get those fixed.
PR: 211626
file descriptor for the given posix mqueue. Export the
timer_oshandle_np() symbol to get ktimer id for the given posix timer.
Requested by: Lewis Donzis <lew@perftech.com>
Reviewed by: jilles
Discussed with: kan
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Right now our workaround is so good that it doesn't throw any warnings
on misuse. This means that people will keep on using the old version
of dirname(3) silently without fixing their code.
Go ahead and change the prototype of __old_dirname() to also use a plain
char *, so that we still get a compiler warning. This won't have any
negative effect on building older versions of FreeBSD on HEAD, as those
are built with -Werror disabled.
Differential Revision: https://reviews.freebsd.org/D7844
evdev is a generic input event interface compatible with Linux
evdev API at ioctl level. It allows using unmodified (apart from
header name) input evdev drivers in Xorg, Wayland, Qt.
This commit has only generic kernel API. evdev support for individual
hardware drivers like ukbd, ums, atkbd, etc. will be committed later.
Project was started by Jakub Klama as part of GSoC 2014. Jakub's
evdev implementation was later used as a base, updated and finished
by Vladimir Kondratiev.
Submitted by: Vladimir Kondratiev <wulf@cicgroup.ru>
Reviewed by: adrian, hans
Differential Revision: https://reviews.freebsd.org/D6998
As the xinstall(8) utility had to be patched up to work with the POSIXly
correct basename()/dirname() prototypes, we make it pretty hard to build
previous versions of FreeBSD on HEAD. xinstall(8) is part of the
bootstrap tools.
Add some logic to <libgen.h> to automatically detect bad calls to
dirname() based on the type of the argument. If the argument is of type
'const char *', we simply fall back to calling into dirname@FBSD_1.0
directly.
I'll also give basename() similar treatment when importing the
thread-safe version of that function.
Tested by: bdrewery, madpilot (thanks!)
time at year 2012. Only LC_COLLATE_MASK and LC_CTYPE_MASK are in the
right order.
The order here should match XLC_* from "xlocale_private.h" which, in turn,
match LC_* publicly visible order from <locale.h> which determines how
locale components are stored in the structure.
LC_*_MASK -> XLC_* translation done as "ffs(mask) - 1" in the querylocale()
and equivalent shift loop in the newlocale(), so mapped to some wrong
components (excluding two mentioned above).
Formally the fix is ABI breakage, but old code using those masks
never works properly in any case.
Only newlocale() and querylocale() are affected.
MFC after: 7 days
The syscall is a trivial wrapper around new VOP_FDATASYNC(), sharing
code with fsync(2). For all filesystems, this commit provides the
implementation which delegates the work of VOP_FDATASYNC() to
VOP_FSYNC(). This is functionally correct but not efficient.
This is not yet POSIX-compliant implementation, because it does not
ensure that queued AIO requests are completed before returning.
Reviewed by: mckusick
Discussed with: avg (ZFS), jhb (AIO part)
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D7471
glibc has a pretty nice function called crypt_r(3), which is nothing
more than crypt(3), but thread-safe. It accomplishes this by introducing
a 'struct crypt_data' structure that contains a buffer that is large
enough to hold the resulting string.
Let's go ahead and also add this function. It would be a shame if a
useful function like this wouldn't be usable in multithreaded apps.
Refactor crypt.c and all of the backends to no longer declare static
arrays, but write their output in a provided buffer.
There is no need to do any buffer length computation here, as we'll just
need to ensure that 'struct crypt_data' is large enough, which it is.
_PASSWORD_LEN is defined to 128 bytes, but in this case I'm picking 256,
as this is going to be part of the actual ABI.
Differential Revision: https://reviews.freebsd.org/D7306
we need to include it in -legacy or not. Since the ifdef was removed,
this broke building 10.x and older source trees on -current. Restore
just enough of _WITH_GETLINE to allow these older source trees to
still build and properly omit getline() from their -legacy library.
Just like with freelocale(3), I haven't been able to find any piece of
code that actually makes use of this function's return value, both in
base and in ports. The reason for this is that FreeBSD seems to be the
only operating system to have such a prototype. This is why I'm deciding
to not use symbol versioning for this.
It does seem that the pw(8) utility depends on the function's typing and
already had a switch in place to toggle between the FreeBSD and POSIX
variant of this function. Clean this up by always expecting the POSIX
variant.
There is also a single port that has a couple of local declarations of
setgrent(3) that need to be patched up. This is in the process of being
fixed.
PR: 211394 (exp-run)
When adding getline(3) and dprintf(3) into libc, those guards were added
to prevent breaking too many ports.
7 years later the ports tree have been fixed, it is time to remove this
FreeBSDism
While here remove the extra parenthesis surrounding dprintf(3)
Our version of this function currently returns an integer indicating
failure or success, whereas POSIX specifies that this function has no
return value. It returns void. Patch up the header, sources and man page
to use the right type. While there, use the opportunity to simplify the
body of this function.
Theoretically speaking, this change breaks the ABI of this function.
That said, I have yet to find any code that makes use of freelocale()'s
return value. I couldn't find any of it in the base system, nor did an
exp-run reveal any breakage caused by this change.
PR: 211394 (exp-run)
POSIX allows these functions to be implemented in a way that the
resulting string is stored in the input buffer. Though some may find
this annoying, this has the advantage that it makes it possible to
implement this function in a thread-safe way. It also means that they
can be implemented in a way that they work for paths of arbitrary
length, as the output string of these functions is never longer than
max(1, len(input)).
Portable code already needs to be written with this in mind, so in my
opinion it makes very little sense to allow the existing behaviour.
Prevent the base system from falling back to this by switching over to
POSIX prototypes.
I'm not going to bump the __FreeBSD_version for this. The reason is that
it's possible to account for this change in a portable way, without
depending on a specific version of FreeBSD. An exp-run was done some
time ago. As far as I know, all regressions as a result of this have
already been fixed.
I'll give this change some time to settle. In the long run I want to
replace our copies by ones that are thread-safe and don't depend on
PATH_MAX/MAXPATHLEN.
POSIX also declares NI_NUMERICSCOPE, which makes getnameinfo() return a
numerical scope identifier. The interesting thing is that support for
this is already present in code, but #ifdef disabled. Expose this
functionality by placing a definition for it in <netdb.h>.
While there, remove references to NI_WITHSCOPEID, as that got removed 11
years ago.
POSIX requires that MB_CUR_MAX expands to an expression of type size_t.
It currently expands to an int. As these are already macros, don't
change the underlying type of these functions. There is no ned to touch
those.
Differential Revision: https://reviews.freebsd.org/D6645
POSIX requires that these functions have an unsigned int for their first
argument; not an unsigned long.
My reasoning is that we can safely change these functions without
breaking the ABI. As far as I know, our supported architectures either
use registers for passing function arguments that are at least as big as
long (e.g., amd64), or int and long are of the same size (e.g., i386).
Reviewed by: ache
Differential Revision: https://reviews.freebsd.org/D6644
Both __alloc_align and __alloc_size can't be used when the function
returns a pointer to memory. This fixes breakage when building with
clang 3.4:
In file included from /usr/src/svn/usr.sbin/bhyve/atkbdc.c:40:
/usr/include/stdlib.h:176:6: error: '__alloc_size__' attribute only
applies to functions that return a pointer [-Werror,-Wignored-attributes]
Pointed out by: ngie, cem
Approved by: re (gjb)
This support appears to have been documented in nsswitch.conf(5) for some
time. The implementation adds two NSS netgroup providers to libc. The
default, compat, provides the behaviour documented in netgroup(5), so this
change does not make any user-visible behaviour changes. A files provider
is also implemented.
innetgr(3) is implemented as an optional NSS method so that providers such
as NIS which are able to implement efficient reverse lookup can do so.
A fallback implementation is used otherwise. getnetgrent_r(3) is added for
convenience and to provide compatibility with glibc and Solaris.
With a small patch to net/nss_ldap, it's possible to specify an ldap
netgroup provider, allowing one to query nisNetgroupTriple entries.
Sponsored by: EMC / Isilon Storage Division
The last argument of dbm_open() should be a mode_t according to POSIX;
not an int.
Reviewed by: pfg, kib
Differential Revision: https://reviews.freebsd.org/D6650
The strfmon_l() function provided by <xlocale/_monetary.h> is also part
of POSIX 2008's <monetary.h>, so it should be exposed by default.
Change the check used in <monetary.h> to be similar to the one that's
part of <wchar.h>, where we both test for __POSIX_VISIBLE and
_XLOCALE_H_.
According to POSIX, it should use void *, not char *. Unfortunately, the
dsize field also has the wrong type. It should be size_t. I'm not going
to change that, as that will break the ABI.
Reviewed by: pfg
Differential Revision: https://reviews.freebsd.org/D6647
POSIX 2008 added the psignal() function which has already been part of
the BSDs for a long time. The only difference is, the POSIX version uses
an 'int' for the signal number, unlike our version which uses an
'unsigned int'. Fix up the function to use an 'int'. This should not
affect the ABI.
According to POSIX, the netdb.h header must also provide in_addr_t and
in_port_t. It should also provide IPPORT_RESERVED. Copy over the
necessary bits from <netinet/in.h> to achieve that.
POSIX requires that <dirent.h> provides ino_t in the XSI case. In our
case, this wasn't being exposed, as d_ino is a macro that expands to
d_fileno that is an uint32_t, not an ino_t.
Using a cookie with meta mode causes it to *not rerun* (as normal make
does) unless the command changes or filemon-detected files change.
After all of the work done here it turns out that skipping installation
is dangerous since the install commands use <dir>/*.h. The actual build
command is not changing but the files installed are changing by the mere
act of adding a new header into the source tree. Thus we cannot safely
use meta mode logic here. It must always rerun and install the headers.
The install -C flag at least prevents churning timestamps when
installing a header that was already present.
Sponsored by: EMC / Isilon Storage Division
intention of the POSIX IEEE Std 1003.1TM-2008/Cor 1-2013.
A robust mutex is guaranteed to be cleared by the system upon either
thread or process owner termination while the mutex is held. The next
mutex locker is then notified about inconsistent mutex state and can
execute (or abandon) corrective actions.
The patch mostly consists of small changes here and there, adding
neccessary checks for the inconsistent and abandoned conditions into
existing paths. Additionally, the thread exit handler was extended to
iterate over the userspace-maintained list of owned robust mutexes,
unlocking and marking as terminated each of them.
The list of owned robust mutexes cannot be maintained atomically
synchronous with the mutex lock state (it is possible in kernel, but
is too expensive). Instead, for the duration of lock or unlock
operation, the current mutex is remembered in a special slot that is
also checked by the kernel at thread termination.
Kernel must be aware about the per-thread location of the heads of
robust mutex lists and the current active mutex slot. When a thread
touches a robust mutex for the first time, a new umtx op syscall is
issued which informs about location of lists heads.
The umtx sleep queues for PP and PI mutexes are split between
non-robust and robust.
Somewhat unrelated changes in the patch:
1. Style.
2. The fix for proper tdfind() call use in umtxq_sleep_pi() for shared
pi mutexes.
3. Removal of the userspace struct pthread_mutex m_owner field.
4. The sysctl kern.ipc.umtx_vnode_persistent is added, which controls
the lifetime of the shared mutex associated with a vnode' page.
Reviewed by: jilles (previous version, supposedly the objection was fixed)
Discussed with: brooks, Martin Simmons <martin@lispworks.com> (some aspects)
Tested by: pho
Sponsored by: The FreeBSD Foundation
I'm still not sure why only Pypy runs into the error with the function
typedefs. Fix it anyway.
Use __ssize_t instead of ssize_t for the types; it's possible for the size_t
type to not be visible if at the wrong POSIX_VISIBLE level.
A final (crossing my fingers) follow-up to r299456.
Sponsored by: EMC / Isilon Storage Division
Despite the private namespace, several broken ports depend on the __off64_t
name for the type. Export it exactly the same way off_t and __off_t are
exported.
A follow-up to r299456.
Suggested by: php56
Sponsored by: EMC / Isilon Storage Division
Two new functions are provided, bit_ffs_at() and bit_ffc_at(), which allow
for efficient searching of set or cleared bits starting from any bit offset
within the bit string.
Performance is improved by operating on longs instead of bytes and using
ffsl() for searches within a long. ffsl() is a compiler builtin in both
clang and gcc for most architectures, converting what was a brute force
while loop search into a couple of instructions.
All of the bitstring(3) API continues to be contained in the header file.
Some of the functions are large enough that perhaps they should be uninlined
and moved to a library, but that is beyond the scope of this commit.
sys/sys/bitstring.h:
Convert the majority of the existing bit string implementation from
macros to inline functions.
Properly protect the implementation from inadvertant macro expansion
when included in a user's program by prefixing all private
macros/functions and local variables with '_'.
Add bit_ffs_at() and bit_ffc_at(). Implement bit_ffs() and
bit_ffc() in terms of their "at" counterparts.
Provide a kernel implementation of bit_alloc(), making the full API
usable in the kernel.
Improve code documenation.
share/man/man3/bitstring.3:
Add pre-exisiting API bit_ffc() to the synopsis.
Document new APIs.
Document the initialization state of the bit strings
allocated/declared by bit_alloc() and bit_decl().
Correct documentation for bitstr_size(). The original code comments
indicate the size is in bytes, not "elements of bitstr_t". The new
implementation follows this lead. Only hastd assumed "elements"
rather than bytes and it has been corrected.
etc/mtree/BSD.tests.dist:
tests/sys/Makefile:
tests/sys/sys/Makefile:
tests/sys/sys/bitstring.c:
Add tests for all existing and new functionality.
include/bitstring.h
Include all headers needed by sys/bitstring.h
lib/libbluetooth/bluetooth.h:
usr.sbin/bluetooth/hccontrol/le.c:
Include bitstring.h instead of sys/bitstring.h.
sbin/hastd/activemap.c:
Correct usage of bitstr_size().
sys/dev/xen/blkback/blkback.c
Use new bit_alloc.
sys/kern/subr_unit.c:
Remove hard-coded assumption that sizeof(bitstr_t) is 1. Get rid of
unrb.busy, which caches the number of bits set in unrb.map. When
INVARIANTS are disabled, nothing needs to know that information.
callapse_unr can be adapted to use bit_ffs and bit_ffc instead.
Eliminating unrb.busy saves memory, simplifies the code, and
provides a slight speedup when INVARIANTS are disabled.
sys/net/flowtable.c:
Use the new kernel implementation of bit-alloc, instead of hacking
the old libc-dependent macro.
sys/sys/param.h
Update __FreeBSD_version to indicate availability of new API
Submitted by: gibbs, asomers
Reviewed by: gibbs, ngie
MFC after: 4 weeks
Sponsored by: Spectra Logic Corp
Differential Revision: https://reviews.freebsd.org/D6004
This will only be done if the target is defined, so if the target is
defined after bsd.sys.mk is included then it needs to manually add
${META_DEPS} still.
Sponsored by: EMC / Isilon Storage Division
etc) in stdlib.h. These will be needed for newer versions of libc++,
which uses them for defining overloaded versions of abs() and div().
MFC after: 1 week
Extend it to other cases of meta mode cookies so they get the proper rm
cookie behavior when a .meta file detects it needs to rebuild and fails.
Sponsored by: EMC / Isilon Storage Division
This file is using stage-install, so all of the .dirdep files
are properly handled. The cookie handling also properly
handles rebuilds with .meta files. DESTDIR from bsd.sys.mk is also
respected for staging. This logic came in r239572.
Sponsored by: EMC / Isilon Storage Division
The meta file may decide the target is out of date but nothing
ensures that the *next* build will build this target if it
fails this time for some reason; it is still out-of-date
until it succeeds.
Convert the include/ cookie usage to the global versions.
Sponsored by: EMC / Isilon Storage Division
The defines for xdr_rpc* in xdr.h are wrong. It could be
very well that Solaris did strip the '_t' from xdr_u_int32_t,
but Solaris has a xdr_u_int32 function, we don't have this.
So all of this defines will lead to an unresolved symbol.
This explains why we do not use these functions in FreeBSD
while they are used in Illumos/Solaris.
Obtained from: linux libtirpc (git 7864122e61ffe4db1aa8ace89117358a1e3a391b)
MFC after: 3 weeks
SunRPC is using xp_sock in SVCXPRT, while TI-RPC is using
xp_fd. Add a compatibility define.
Illumos has something similar for the non-kernel case.
Obtained from: linux-nfs project (git 0d94036c3a0d4c24d22bf6a8c40ac6625d972c29)
Add missing Symbol.map entry for __aligned_alloc.
Add weak-->strong symbol binding for
{malloc_stats_print,mallctl,mallctlnametomib,mallctlbymib} -->
{__malloc_stats_print,__mallctl,__mallctlnametomib,__mallctlbymib}. These
bindings complete the set necessary to allow applications to replace all
malloc-related symbols.
breaking the ABI. Special value is stored in the lock pointer to
indicate shared lock, and offline page in the shared memory is
allocated to store the actual lock.
Reviewed by: vangyzen (previous version)
Discussed with: deischen, emaste, jhb, rwatson,
Martin Simmons <martin@lispworks.com>
Tested by: pho
Sponsored by: The FreeBSD Foundation
enabled in the compilation environment, i.e. for ANSI C use of
#include <signal.h>.
Requested and reviewed by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 13 days
ucontext_t available. Our code even has XXX comment about this.
Add a bit of compliance by moving struct __ucontext definition into
sys/_ucontext.h and including it into signal.h and sys/ucontext.h.
Several machine/ucontext.h headers were changed to use namespace-safe
types (like uint64_t->__uint64_t) to not depend on sys/types.h.
struct __stack_t from sys/signal.h is made always visible in private
namespace to satisfy sys/_ucontext.h requirements.
Apparently mips _types.h pollutes global namespace with f_register_t
type definition. This commit does not try to fix the issue.
PR: 207079
Reported and tested by: Ting-Wei Lan <lantw44@gmail.com>
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
control algorithm options. The argument is variable length and is opaque
to TCP, forwarded directly to the algorithm's ctl_output method.
Provide new includes directory netinet/cc, where algorithm specific
headers can be installed.
The new API doesn't yet have any in tree consumers.
The original code written by lstewart.
Reviewed by: rrs, emax
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D711
- Avoid namespace pollution and move definitions of _POSIX2_CHARCLASS_NAME_MAX
and _POSIX2_COLL_WEIGHTS_MAX into the .2001 section.
With input from bde.
Submitted by bde
Set _PATH_DEFPATH to
/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/usr/local/bin. This is the
path in the default class in the default /etc/login.conf,
excluding ~/bin which would not be expanded properly in a string
constant.
For normal logins, _PATH_DEFPATH is overridden by /etc/login.conf,
~/.login_conf or shell startup files. _PATH_DEFPATH is still used as a
default by execlp(), execvp(), posix_spawnp() and sh if PATH is not set, and
by cron. Especially the latter is a common trap (most recently in PR
204813).
PR: 204813
Reviewed by: secteam (delphij), alfred
Traditionally the hcreate() function creates a hash table that uses
chaining, using a fixed user-provided size. The problem with this
approach is that this often either wastes memory (table too big) or
yields bad performance (table too small). For applications it may not
always be easy to estimate the right hash table size. A fixed number
only increases performance compared to a linked list by a constant
factor.
This problem can be solved easily by dynamically resizing the hash
table. If the size of the hash table is at least doubled, this has no
negative on the running time complexity. If a dynamically sized hash
table is used, we can also switch to using open addressing instead of
chaining, which has the advantage of just using a single allocation for
the entire table, instead of allocating many small objects.
Finally, a problem with the existing implementation is that its
deterministic algorithm for hashing makes it possible to come up with
fixed patterns to trigger an excessive number of collisions. We can
easily solve this by using FNV-1a as a hashing algorithm in combination
with a randomly generated offset basis.
Measurements have shown that this implementation is about 20-25% faster
than the existing implementation (even if the existing implementation is
given an excessive number of buckets). Though it allocates more memory
through malloc() than the old implementation (between 4-8 pointers per
used entry instead of 3), process memory use is similar to the old
implementation as if the estimated size was underestimated by a factor
10. This is due to the fact that malloc() needs to perform less
bookkeeping.
Reviewed by: jilles, pfg
Obtained from: https://github.com/NuxiNL/cloudlibc
Differential Revision: https://reviews.freebsd.org/D4644
The existing implementations of POSIX tsearch() and tdelete() don't
attempt to perform any balancing at all. Testing reveals that inserting
100k nodes into a tree sequentially takes approximately one minute on my
system.
Though most other BSDs also don't use any balanced tree internally, C
libraries like glibc and musl do provide better implementations. glibc
uses a red-black tree and musl uses an AVL tree.
Red-black trees have the advantage over AVL trees that they only require
O(1) rotations after insertion and deletion, but have the disadvantage
that the tree has a maximum depth of 2*log2(n) instead of 1.44*log2(n).
My take is that it's better to focus on having a lower maximum depth,
for the reason that in the case of tsearch() the invocation of the
comparator likely dominates the running time.
This change replaces the tsearch() and tdelete() functions by versions
that create an AVL tree. Compared to musl's implementation, this version
is different in two different ways:
- We don't keep track of heights; just balances. This is sufficient.
This has the advantage that it reduces the number of nodes that are
being accessed. Storing heights requires us to also access all of the
siblings along the path.
- Don't use any recursion at all. We know that the tree cannot 2^64
elements in size, so the height of the tree can never be larger than
96. Use a 128-bit bitmask to keep track of the path that is computed.
This allows us to iterate over the same path twice, meaning we can
apply rotations from top to bottom.
Inserting 100k nodes into a tree now only takes 0.015 seconds. Insertion
seems to be twice as fast as glibc, whereas deletion has about the same
performance. Unlike glibc, it uses a fixed amount of memory.
I also experimented with both recursive and iterative bottom-up
implementations of the same algorithm. This iterative top-down version
performs similar to the recursive bottom-up version in terms of speed
and code size.
For some reason, the iterative bottom-up algorithm was actually 30%
faster for deletion, but has a quadratic memory complexity to keep track
of all the parent pointers.
Reviewed by: jilles
Obtained from: https://github.com/NuxiNL/cloudlibc
Differential Revision: https://reviews.freebsd.org/D4412
In r289315, I added new fields to res_state. This broke binary
backward compatibility. It also broke some ports (and possibly
other code) by requiring the definition of time_t and struct timespec.
Fix these problems by moving the new fields into __res_state_ext.
Suggested by: ume
Reviewed by: ume
MFC after: 3 days
Sponsored by: Dell Inc.
Differential Revision: https://reviews.freebsd.org/D4472
r289315 required time_t and struct timespec to be defined before
including <resolv.h>. This broke the build of net-mgmt/sx, at least.
Include <sys/timespec.h> in resolv.h to fix this with minimal pollution.
Reported by: Raphael Kubo da Costa <rakuco>
MFC after: 3 days
Sponsored by: Dell Inc.
This avoids the need for an afterinstall: hook and a check for LIBRARIES_ONLY.
It also now respects INCLUDEDIR.
This came in r249484.
Sponsored by: EMC / Isilon Storage Division
Because of how osreldate.h was being built with newvers.sh, which always
spat out a vers.c dependent on SVN or git, the meta mode build was
considering osreldate.h to depend on the current git or SVN index. This
would lead to entire tree rebuilds when modifying git's index. There's
no reason to be generating vers.c here so just skip it.
While here, in mk-osreldate.sh rename PARAM_H to proper PARAMFILE (which
newvers.sh already has a default for) and remove unneeded export.
Sponsored by: EMC / Isilon Storage Division
The _SKIP_BUILD is used while computing DIRDEPS. If MACHINE=host is passed in
then this logic was replacing 'MACHINE' with a literal value of the host arch,
which then caused the dirdeps graph to be wrong since it no longer had the
literal 'host' for any of include's dependencies.
This is a NOP currently since include/ is not usually built with MACHINE=host.
Sponsored by: EMC / Isilon Storage Division
This allows META_FILES option to be renamed META_MODE.
Also add META_COOKIE_TOUCH for use in targets that can benefit
from a cookie when in meta mode.
Differential Revision: https://reviews.freebsd.org/D4153
Reviewed by: bdrewery
On each resolver query, use stat(2) to see if the modification time
of /etc/resolv.conf has changed. If so, reload the file and reinitialize
the resolver library. However, only call stat(2) if at least two seconds
have passed since the last call to stat(2), since calling it on every
query could kill performance.
This new behavior is enabled by default. Add a "reload-period" option
to disable it or change the period of the test.
Document this behavior and option in resolv.conf(5).
Polish the man page just enough to appease igor.
https://lists.freebsd.org/pipermail/freebsd-arch/2015-October/017342.html
Reviewed by: kp, wblock
Discussed with: jilles, imp, alfred
MFC after: 1 month
Relnotes: yes
Sponsored by: Dell Inc.
Differential Revision: https://reviews.freebsd.org/D3867
FreeBSD extended ctypes to include numbers (e.g. isnumber()) but never
actually implemented it. The isnumber() function was equivalent to the
isdigit() function in every case.
Now that DragonFly's ctype source files have number definitions, the
number ctype can finally be implemented. It's given a new flag _CTYPE_N.
The isalnum() and iswalnum() functions have been changed to use this
flag rather than the _CTYPE_D digit flag.
While isalnum(), isnumber(), and their wide equivalents now return
different values in locale cases, the ishexnumber() and iswhexnumber()
functions are unchanged. They are still aliases for isxdigit() and
iswxdigit().
Also change ctype.h for isdigit and isxdigit to use sbistype like the
other functions.
Obtained from: dragonfly
If the command to be ran changes then a rebuild is caused. Checking
exists(${DESTDIR}...) from make results in this on the 2nd and
possibly subsequent builds due to staging during build. Avoid this
by always running the existence check in the make sh command.
Sponsored by: EMC / Isilon Storage Division
- Use _Bool rather than bool to resolve missing type errors in malloc_np.h.
- Fix malloc manual page #include documentation.
- Add *allocm manual pages to obsolete files.
Submitted by: jbeich
Start using the gcc sentinel attribute, which can be used to
mark varargs function that need a NULL pointer to mark argument
termination, like execl(3).
Relnotes: yes
This function is equivalent to fclose(3) function except that it
does not close the underlying file descriptor.
fdclose(3) is step forward to make FILE structure private.
Reviewed by: wblock, jilles, jhb, pjd
Approved by: pjd (mentor)
Differential Revision: https://reviews.freebsd.org/D2697
Off by default, build behaves normally.
WITH_META_MODE we get auto objdir creation, the ability to
start build from anywhere in the tree.
Still need to add real targets under targets/ to build packages.
Differential Revision: D2796
Reviewed by: brooks imp
This lets the compiler know about the alignment of pointers returned
by aligned_alloc(3), posix_memalign(3). and contigmalloc(9)
Currently this is only supported in recent gcc but we are ready to
use it if clang implements it.
Relnotes: yes
Add a manpage for it, assign the copyright to the OpenBSD project on it since it
is mostly copy/paste from OpenBSD manpage.
style(9) fixes
Differential Revision: https://reviews.freebsd.org/D2420
Reviewed by: kib
discontinued by its initial authors. In FreeBSD the code was already
slightly edited during the pf(4) SMP project. It is about to be edited
more in the projects/ifnet. Moving out of contrib also allows to remove
several hacks to the make glue.
Reviewed by: net@
The `nonnull' attribute specifies that some function parameters should be
non-null pointers. This is very useful as it helps the compiler generate
warnings on suspicious code and can also enable some small optimizations.
Also start using 'alloc_size' attribute in the allocator functions.
This is an initial step to better integrate our libc with the compiler:
these attributes are fully supported by clang and they are also useful
for the static analyzer.
Note that due to some bogus internal procedure in the way gcc ports
are built they may require updating if they were built before r280801.
Relnotes: yes
Hinted by: Android's bionic libc
Differential Revision: https://reviews.freebsd.org/D2107
GCC is still carries an old version of cdefs.h which doesn't
accept multiple parameters for the nonnull attribute.
Since this issue probably affects many ports in the tree
we will revert it for now until gcc gets fixed.
The `nonnull' attribute specifies that some function parameters should be
non-null pointers. This is very useful as it helps the compiler generate
warnings on suspicious code and can also enable some small optimizations.
In clang this is also useful for the static analyzer.
While we could go on defining this all over the tree, it only
makes sense to annotate a subset of critical functions.
Hinted by: Android's bionic libc
Differential Revision: https://reviews.freebsd.org/D2101