Commit Graph

8306 Commits

Author SHA1 Message Date
Hans Petter Selasky
f3e7afe2d7 Implement kernel support for hardware rate limited sockets.
- Add RATELIMIT kernel configuration keyword which must be set to
enable the new functionality.

- Add support for hardware driven, Receive Side Scaling, RSS aware, rate
limited sendqueues and expose the functionality through the already
established SO_MAX_PACING_RATE setsockopt(). The API support rates in
the range from 1 to 4Gbytes/s which are suitable for regular TCP and
UDP streams. The setsockopt(2) manual page has been updated.

- Add rate limit function callback API to "struct ifnet" which supports
the following operations: if_snd_tag_alloc(), if_snd_tag_modify(),
if_snd_tag_query() and if_snd_tag_free().

- Add support to ifconfig to view, set and clear the IFCAP_TXRTLMT
flag, which tells if a network driver supports rate limiting or not.

- This patch also adds support for rate limiting through VLAN and LAGG
intermediate network devices.

- How rate limiting works:

1) The userspace application calls setsockopt() after accepting or
making a new connection to set the rate which is then stored in the
socket structure in the kernel. Later on when packets are transmitted
a check is made in the transmit path for rate changes. A rate change
implies a non-blocking ifp->if_snd_tag_alloc() call will be made to the
destination network interface, which then sets up a custom sendqueue
with the given rate limitation parameter. A "struct m_snd_tag" pointer is
returned which serves as a "snd_tag" hint in the m_pkthdr for the
subsequently transmitted mbufs.

2) When the network driver sees the "m->m_pkthdr.snd_tag" different
from NULL, it will move the packets into a designated rate limited sendqueue
given by the snd_tag pointer. It is up to the individual drivers how the rate
limited traffic will be rate limited.

3) Route changes are detected by the NIC drivers in the ifp->if_transmit()
routine when the ifnet pointer in the incoming snd_tag mismatches the
one of the network interface. The network adapter frees the mbuf and
returns EAGAIN which causes the ip_output() to release and clear the send
tag. Upon next ip_output() a new "snd_tag" will be tried allocated.

4) When the PCB is detached the custom sendqueue will be released by a
non-blocking ifp->if_snd_tag_free() call to the currently bound network
interface.

Reviewed by:		wblock (manpages), adrian, gallatin, scottl (network)
Differential Revision:	https://reviews.freebsd.org/D3687
Sponsored by:		Mellanox Technologies
MFC after:		3 months
2017-01-18 13:31:17 +00:00
Enji Cooper
a0888761e8 Use SRCTOP where possible and use :H to manipulate .CURDIR to get rid of
unnecessarily long relative path .PATH values with make

MFC after:	1 days
Sponsored by:	Dell EMC Isilon
2017-01-17 03:58:37 +00:00
Maxim Sobolev
339efd75a4 Add a new socket option SO_TS_CLOCK to pick from several different clock
sources to return timestamps when SO_TIMESTAMP is enabled. Two additional
clock sources are:

o nanosecond resolution realtime clock (equivalent of CLOCK_REALTIME);
o nanosecond resolution monotonic clock (equivalent of CLOCK_MONOTONIC).

In addition to this, this option provides unified interface to get bintime
(equivalent of using SO_BINTIME), except it also supported with IPv6 where
SO_BINTIME has never been supported. The long term plan is to depreciate
SO_BINTIME and move everything to using SO_TS_CLOCK.

Idea for this enhancement has been briefly discussed on the Net session
during dev summit in Ottawa last June and the general input was positive.

This change is believed to benefit network benchmarks/profiling as well
as other scenarios where precise time of arrival measurement is necessary.

There are two regression test cases as part of this commit: one extends unix
domain test code (unix_cmsg) to test new SCM_XXX types and another one
implementis totally new test case which exchanges UDP packets between two
processes using both conventional methods (i.e. calling clock_gettime(2)
before recv(2) and after send(2)), as well as using setsockopt()+recv() in
receive path. The resulting delays are checked for sanity for all supported
clock types.

Reviewed by:    adrian, gnn
Differential Revision:  https://reviews.freebsd.org/D9171
2017-01-16 17:46:38 +00:00
Warren Block
b44047f3df Update the shm_open.2 man page to reflect objective reality.
PR:		215612
Submitted by:	rwatson
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D9066
2017-01-13 19:41:02 +00:00
Enji Cooper
cdebaff820 Upgrade NetBSD tests to 01.11.2017_23.20 snapshot
This contains some new testcases in /usr/tests/...:

- .../lib/libc
- .../lib/libthr
- .../lib/msun
- .../sys/kern

Tested on:	amd64, i386
MFC after:	1 month
2017-01-13 03:33:57 +00:00
Enji Cooper
9527fa4f66 Remove __HAVE_LONG_DOUBLE #define from t_strtod.c and place it in Makefile
This is to enable support in other testcases

Inspired by lib/msun/tests/Makefile .

MFC after:	1 week
2017-01-12 08:40:52 +00:00
Enji Cooper
20d24fbb93 Sync ^/vendor/NetBSD/tests/dist with upstream 2017-01-12 07:26:39 +00:00
Enji Cooper
2254eda87e Commit updates accepted upstream (NetBSD) 2017-01-12 07:21:56 +00:00
Ian Lepore
f64342e354 Rework tty_drain() to poll the hardware for completion, and restore
drain timeout handling to historical freebsd behavior.

The primary reason for these changes is the need to have tty_drain() call
ttydevsw_busy() at some reasonable sub-second rate, to poll hardware that
doesn't signal an interrupt when the transmit shift register becomes empty
(which includes virtually all USB serial hardware).  Such hardware hangs
in a ttyout wait, because it never gets an opportunity to trigger a wakeup
from the sleep in tty_drain() by calling ttydisc_getc() again, after
handing the last of the buffered data to the hardware.

While researching the history of changes to tty_drain() I stumbled across
some email describing the historical BSD behavior of tcdrain() and close()
on serial ports, and the ability of comcontrol(1) to control timeout
behavior.  Using that and some advice from Bruce Evans as a guide, I've
put together these changes to implement the hardware polling and restore
the historical timeout behaviors...

 - tty_drain() now calls ttydevsw_busy() in a loop at 10 Hz to accomodate
   hardware that requires polling for busy state.

 - The "new historical" behavior for draining during close(2) is retained:
   the drain timeout is "1 second without making any progress".  When the
   1-second timeout expires, if the count of bytes remaining in the tty
   layer buffer is smaller than last time, the timeout is extended for
   another second.  Unfortunately, the same logic cannot be extended all
   the way down to the hardware, because the interface to that layer is a
   simple busy/not-busy indication.

 - Due to the previous point, an application that needs a guarantee that
   all data has been transmitted must use TIOCDRAIN/tcdrain(3) before
   calling close(2).

 - The historical behavior of honoring the drainwait setting for TIOCDRAIN
   (used by tcdrain(3)) is restored.

 - The historical kern.drainwait sysctl to control the global default
   drainwait time is restored, but is now named kern.tty_drainwait.

 - The historical default drainwait timeout of 300 seconds is restored.

 - Handling of TIOCGDRAINWAIT and TIOCSDRAINWAIT ioctls is restored
   (this also makes the comcontrol(1) drainwait verb work again).

 - Manpages are updated to document these behaviors.

Reviewed by:	bde (prior version)
2017-01-12 00:48:06 +00:00
Enji Cooper
d87ba4cf3a Pull in changes from upstream for lib/libc/{c063,gen,string,sys} to address
issues resolved in FreeBSD or support added to testcases

In collaboration with:	<christos@NetBSD.org>
2017-01-11 08:15:18 +00:00
Enji Cooper
4023a42882 Import lib/libc/ttyio/t_ttyio.c,1.3 2017-01-10 10:06:15 +00:00
Enji Cooper
887eb0f8da t_raise: import a minor grammar error fix to diff reduce with NetBSD 2017-01-10 10:02:44 +00:00
Konstantin Belousov
b7c7684ae2 Export __cxa_thread_atexit_impl as an alias for __cxa_thread_atexit.
libstdc++ before gcc r244057 expected that libc provided
__cxa_thread_atexit_impl, and libstdc++ implemented
__cxa_thread_atexit, by forwarding the calls to _impl.  Mentioned gcc
revision checks for __cxa_thread_atexit in libc and does not provide
the symbol from libstdc++ if found.

This change helps older gcc, in particular, all released versions
which implement thread_local, by consolidating the implementation into
libc.  For that versions, if configured with the current libc, the
__cxa_thread_atexit is exported from libstdc++ as a trivial wrapper
around libc::__cxa_thread_atexit_impl.

The __cxa_thread_atexit implementation is put into separate source
file to allow for static linking with older libstdc++.a.

gcc bugzilla:	https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78968
Reported by:	Hannes Hauswedell <h2+fbsdports@fsfe.org>
PR:	215709
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2017-01-07 16:05:19 +00:00
Konstantin Belousov
4f64c5b3cf __vdso_gettc(): be extra careful with /dev/hpet mappings, never unmap
the mapping which might be accessed by other threads.

If a pointer to the /dev/hpet register page mapping was stored into
the hpet_dev_map, other threads might access the page at any time.
Never unmap it, instead, keep track of mappings for all hpet units in
smal array.  Store pointer to the newly mapped registers page using
CAS, to detect parallel mappings.

It appeared relatively easy to demonstrate the problem by arranging
two threads which perform gettimeofday(2) concurently, first time in
the process address space, when HPET is used for timecounter.

PR:	215715
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-01-04 16:10:52 +00:00
Pedro F. Giffuni
3795066839 Cleanup inelegant calloc(3) introduced in r310984. 2017-01-02 15:17:33 +00:00
Enji Cooper
c5c2166d45 Use calloc instead of malloc + memset(.., 0, ..)
MFC after:	1 week
2016-12-31 21:00:08 +00:00
Pedro F. Giffuni
ba7161ed72 Move __hidden attribute towards the end of the declaration.
Apple had them at the start but moving them to the end is better for
faster reading and fits better what is done in other FreeBSD headers.

MFC after:	5 days
2016-12-31 15:30:00 +00:00
John Baldwin
34ed0c63c8 Rename the 'flags' argument to getfsstat() to 'mode' and validate it.
This argument is not a bitmask of flags, but only accepts a single value.
Fail with EINVAL if an invalid value is passed to 'flag'.  Rename the
'flags' argument to getmntinfo(3) to 'mode' as well to match.

This is a followup to r308088.

Reviewed by:	kib
MFC after:	1 month
2016-12-27 20:21:11 +00:00
Enji Cooper
7c39dd2e16 Revert r310138
Adding %b support to vfprintf for parity with kernel space requires
more discussion/review.

In particular, many parties were concerned over introducing a
non-standard format qualifier to *printf(3) which didn't already
exist in other OSes, e.g. Linux, thus making code which used %b
harder to port to other operating systems.

Requested by:	many
2016-12-22 22:30:42 +00:00
Sepherosa Ziehau
fff5be0be3 hyperv: Implement userspace gettimeofday(2) with Hyper-V reference TSC
This 6 times gettimeofday performance, as measured by
tools/tools/syscall_timing

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D8789
2016-12-19 07:40:45 +00:00
Conrad Meyer
ec055aeff5 vfprintf(3): Add support for kernel %b format
This is a direct port of the kernel %b format.

I'm unclear on if (more) non-portable printf extensions will be a
problem. I think it's desirable to have userspace formats include all
kernel formats, but there may be competing goals I'm not aware of.

Reviewed by:	no one, unfortunately
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D8426
2016-12-16 01:44:50 +00:00
Gleb Smirnoff
faf19a69a6 Address regressions in SA-16:37.libc.
PR:		215105
Submitted by:	<jtd2004a sbcglobal.net>
2016-12-07 23:18:00 +00:00
Michael Tuexen
159efc33a8 Fix a bug in sctp_sendmsgx(), where the sid provided by the user
was hot honored.

MFC after:	3 days
2016-12-07 21:24:49 +00:00
Bryan Drewery
2d22bf634a Support spaces in group names.
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2016-12-06 23:43:04 +00:00
Ed Schouten
8ab00b8fbc Properly sign extend the result of jrand48() and mrand48().
These functions are supposed to return a value between [_2^31, 2^31).
This doesn't seem to work on 64-bit systems, where we return a value
between [0, 3^32). Patch up the function to use proper casts to int32_t.
While there, fix some other style bugs.

MFC after:	2 weeks
2016-12-06 19:08:29 +00:00
Gleb Smirnoff
74e540d788 Fix possible buffer overflow(s) in link_ntoa(3).
A specially crafted sockaddr_dl argument can trigger a static buffer overflow
in the libc library, with possibility to rewrite with arbitrary data following
static buffers that belong to other library functions.

Reviewed by:	kib
Security:	FreeBSD-SA-16:37.libc
2016-12-06 18:50:33 +00:00
Eric van Gyzen
ff07dd913e thr_set_name(): silently truncate the given name as needed
Instead of failing with ENAMETOOLONG, which is swallowed by
pthread_set_name_np() anyway, truncate the given name to MAXCOMLEN+1
bytes.  This is more likely what the user wants, and saves the
caller from truncating it before the call (which was the only
recourse).

Polish pthread_set_name_np(3) and add a .Xr to thr_set_name(2)
so the user might find the documentation for this behavior.

Reviewed by:	jilles
MFC after:	3 days
Sponsored by:	Dell EMC
2016-12-03 01:14:21 +00:00
Bryan Drewery
710542df20 Fix setrlimit_test:setrlimit_memlock when the system has exceeded vm.max_wired.
This uses the same fix as r294894 did for the mlock test.  The code from
that commit is moved into a common object file which PROGS supports
building first.

Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D8689
2016-12-01 22:12:58 +00:00
Mark Johnston
64910ddbff Launder VPO_NOSYNC pages upon vnode deactivation.
As of r234483, vnode deactivation causes non-VPO_NOSYNC pages to be
laundered. This behaviour has two problems:

1. Dirty VPO_NOSYNC pages must be laundered before the vnode can be
   reclaimed, and this work may be unfairly deferred to the vnlru process
   or an unrelated application when the system is under vnode pressure.
2. Deactivation of a vnode with dirty VPO_NOSYNC pages requires a scan of
   the corresponding VM object's memq for non-VPO_NOSYNC dirty pages; if
   the laundry thread needs to launder pages from an unreferenced such
   vnode, it will reactivate and deactivate the vnode with each laundering,
   potentially resulting in a large number of expensive scans.

Therefore, ensure that all dirty pages are laundered upon deactivation,
i.e., when all maps of the vnode are removed and all references are
released.

Reviewed by:	alc, kib
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D8641
2016-11-26 21:00:27 +00:00
Jilles Tjoelker
295159dfa3 open(2): Clarify non-POSIX error when opening a symlink with O_NOFOLLOW.
We return [EMLINK] instead of [ELOOP] when trying to open a symlink with
O_NOFOLLOW, so that the original case of [ELOOP] can be distinguished. Code
like cmp -h and xz takes advantage of this.

PR:		214633
Reviewed by:	kib, imp
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D8586
2016-11-22 22:30:55 +00:00
Ed Maste
134ede2dd2 remove unnecessary vm includes from setproctitle
vm headers were needed only for the PS_STRINGS fallback, which was
removed in r297888.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2016-11-22 16:00:18 +00:00
Eric van Gyzen
81f91de9f7 Fix error reporting from wcstof()
When wcstof() skipped initial space and then parsing failed, it set
endptr to the first non-space character.  Fix it to correctly report
failure by setting endptr to the beginning of the input string.
The fix is from theraven@, who fixed this bug in wcstod() and
wcstold() in r227753.

While I'm here:

Move assignments out of declarations in wcstod() and wcstold().
This is against my personal preference, but it is our agreed style(9).

Set endptr correctly on malloc() failure in all three functions.

Remove an incorrect comment:  This is pointer arithmetic,
so the code was not actually making that assumption.

wcstold() advanced the wcp pointer beyond leading whitespace
and then reset it back to the beginning of the string.
Do not reset it.  This seems to have no functional effect,
since strtold_l() also skips leading whitespace.  I'm making
the change to keep this function consistent with wcstof() and
wcstod(), and because the C11 spec prescribes the use of iswspace()
to skip leading space.

Reported by:	libc++ unit test for std::stof(std::wstring)
MFC after:	8 days
Sponsored by:	Dell EMC
2016-11-20 20:13:22 +00:00
Gleb Smirnoff
00b5ffde8e Add flag SF_USER_READAHEAD to sendfile(2). When specified, the syscall won't
do any speculations about readahead, and use exactly the amount of readahead
specified by user.  E.g. setting SF_FLAGS(0, SF_USER_READAHEAD) will guarantee
that no readahead at all will be performed.
2016-11-17 21:36:18 +00:00
Ruslan Bukin
7804dd5212 Add full softfloat and hardfloat support for RISC-V.
Hardfloat is now default (use riscv64sf as TARGET_ARCH
for softfloat).

Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D8529
2016-11-16 15:21:32 +00:00
Jason Evans
bde951447f Update jemalloc to 4.3.1. 2016-11-09 18:42:30 +00:00
Edward Tomasz Napierala
319f1fd2ea Document that getfsstat(2) called with MNT_NOWAIT skips file systems
that are in the process of being unmounted.

Reviewed by:	des@ (earlier version)
MFC after:	1 month
2016-11-06 19:37:22 +00:00
Ed Schouten
34168b28e9 Replace basename(3) by a thread-safe implementation.
Now that the changes to the dirname(3) function had some time to settle,
let's go ahead and use the same approach for replacing basename(3) by a
simple implementation that modifies the input string, thereby making it
thread-safe and guaranteed to succeed.

Unlike dirname(3), this function already had a thread-safe variant
basename_r(3). This function had its own set of problems, like having an
upper bound on the pathname length. Keep this function around for
compatibility, but remove most references from the man page. Make the
man page more similar to that of dirname(3).

As the basename_r(3) function is only provided by FreeBSD (and Bionic),
depending on its use is even more implementation defined than assuming
that basename(3) is thread-safe.

Reviewed by:	emaste
Differential Revision:	https://reviews.freebsd.org/D8382
2016-11-03 20:21:34 +00:00
Ruslan Bukin
77bc2a1cd6 Locale fix for endian big (EB) machines.
We have locale files generated on EL machines (e.g. during cross-build
on amd64 host), but then we are using them on EB machines (e.g. MIPS64EB),
so proceed byte-swap if necessary.

All the libc tests passed successfully, including Russian collation.

Tested by:	br@, Hongyan Xia <hx242@cam.ac.uk>
Sponsored by:	DARPA, AFRL
Sponsored by:	HEIF5
Differential Revision:	https://reviews.freebsd.org/D8281
2016-11-01 13:54:44 +00:00
Ruslan Bukin
130a08a362 Detect integer overflow and limit the number of positional
arguments in the string format.

Sponsored by:	DARPA, AFRL
Sponsored by:	HEIF5
Differential Revision:	https://reviews.freebsd.org/D8286
2016-10-31 18:38:58 +00:00
Ruslan Bukin
5bca221511 Add full softfloat and hardfloat support for MIPS.
This adds new target architectures for hardfloat:
mipselhf mipshf mips64elhf mips64hf.

Tested in QEMU only.

Sponsored by:	DARPA, AFRL
Sponsored by:	HEIF5
Differential Revision:	https://reviews.freebsd.org/D8376
2016-10-31 15:33:58 +00:00
Justin Hibbits
aab03089ee Fix a copy&paste-o causing a segfault with sigsetjmp.
I'm not sure how this passed my code inspection and initial testing, it's
obviously wrong.  Found when debugging csh.
2016-10-29 01:22:55 +00:00
John Baldwin
ab1b41edb5 Fix formatting of tables.
Specifically, use .Ta instead of tabs to separate column entries.  While
here fix a few other things:
- Use .Sy for all column headers (previously only the first column header
  was bold)
- Use .Dv to markup constants used for MIB names.
- Use "1234" and "4321" for the byte order descriptions without
  thousands separators.
- Mark up header files in the first table with .In.

MFC after:	2 weeks
2016-10-28 18:09:08 +00:00
Justin Hibbits
80d2e10c9b Fix a typo which broke the build for powerpc.
It's spelled LIBC_SRCTOP not LIBC_SRC.

Pointy-hat to:	jhibbits
Reported by:	kib
2016-10-25 01:32:35 +00:00
Justin Hibbits
3bf61aa48d Remove the powerpcspe Symbol.map, it's identical to powerpc's.
Reported by:	kib
2016-10-23 18:08:34 +00:00
Justin Hibbits
54360de77b Reduce code duplication between powerpc and powerpcspe
They're nearly identical except for a few files.
Reported by:	kib
2016-10-22 21:51:58 +00:00
Justin Hibbits
8f34671a5e ptrace.S is not needed, libc/sys/ptrace.c exists already.
This was leftovers from the initial branch work.

Reported by:	kib
2016-10-22 13:11:09 +00:00
Justin Hibbits
dc9b124d66 Create a new MACHINE_ARCH for Freescale PowerPC e500v2
Summary:
The Freescale e500v2 PowerPC core does not use a standard FPU.
Instead, it uses a Signal Processing Engine (SPE)--a DSP-style vector processor
unit, which doubles as a FPU.  The PowerPC SPE ABI is incompatible with the
stock powerpc ABI, so a new MACHINE_ARCH was created to deal with this.
Additionaly, the SPE opcodes overlap with Altivec, so these are mutually
exclusive.  Taking advantage of this fact, a new file, powerpc/booke/spe.c, was
created with the same function set as in powerpc/powerpc/altivec.c, so it
becomes effectively a drop-in replacement.  setjmp/longjmp were modified to save
the upper 32-bits of the now-64-bit GPRs (upper 32-bits are only accessible by
the SPE).

Note: This does _not_ support the SPE in the e500v1, as the e500v1 SPE does not
support double-precision floating point.

Also, without a new MACHINE_ARCH it would be impossible to provide binary
packages which utilize the SPE.

Additionally, no work has been done to support ports, work is needed for this.
This also means no newer gcc can yet be used.  However, gcc's powerpc support
has been refactored which would make adding a powerpcspe-freebsd target very
easy.

Test Plan:
This was lightly tested on a RouterBoard RB800 and an AmigaOne A1222
(P1022-based) board, compiled against the new ABI.  Base system utilities
(/bin/sh, /bin/ls, etc) still function appropriately, the system is able to boot
multiuser.

Reviewed By:	bdrewery, imp
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D5683
2016-10-22 01:57:15 +00:00
Enji Cooper
aa1a469bcd Only build lib/libc/tests/iconv if MK_ICONV != no
MFC after:	1 week
Reported by:	damian@damianek.be
Sponsored by:	Dell EMC Isilon
2016-10-21 04:54:43 +00:00
John Baldwin
3abf45a148 Use 'cmd' rather than 'command' to match the function prototype. 2016-10-17 22:36:37 +00:00
Ed Schouten
279d89406b Improve phrasing of the STANDARDS section.
Reported by:	wblock
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D8205
2016-10-15 08:09:55 +00:00