data (headers). Historically the size of the headers was not checked
against the socket buffer space. Application could easily overcommit the
socket buffer space.
With the new sendfile (r293439) the problem remained, but a KASSERT was
inserted that checked that amount of data written to the socket matches
its space. In case when size of headers is bigger that socket space,
KASSERT fires. Without INVARIANTS the new sendfile won't panic, but
would report incorrect amount of bytes sent.
o With this change, the headers copyin is moved down into the cycle, after
the sbspace() check. The uio size is trimmed by socket space there,
which fixes the overcommit problem and its consequences.
o The compatibility handling for FreeBSD 4 sendfile headers API is pushed
up the stack to syscall wrappers. This required a copy and paste of the
code, but in turn this allowed to remove extra stack carried parameter
from fo_sendfile_t, and embrace entire compat code into #ifdef. If in
future we got more fo_sendfile_t function, the copy and paste level would
even reduce.
Reviewed by: emax, gallatin, Maxim Dounin <mdounin mdounin.ru>
Tested by: Vitalij Satanivskij <satan ukr.net>
Sponsored by: Netflix
- Mark AIO system calls as STD and remove the helpers to dynamically
register them.
- Use COMPAT6 for the old system calls with the older sigevent instead of
an 'o' prefix.
- Simplify the POSIX configuration to note that AIO is always available.
- Handle AIO in the default VOP_PATHCONF instead of special casing it in
the pathconf() system call. fpathconf() is still hackish.
- Remove freebsd32_aio_cancel() as it just called the native one directly.
Reviewed by: kib
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D5589
- Set td_errno so that ktrace and dtrace can obtain the syscall error
number in the usual way.
- Pass negative error numbers directly to the syscall layer, as they're
not intended to be returned to userland.
Reviewed by: kib
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D5425
allocated from exec_map. If many threads try to perform execve(2) in
parallel, the exec map is exhausted and some threads sleep
uninterruptible waiting for the map space. Then, the thread which won
the race for the space allocation, cannot single-thread the process,
causing deadlock.
Reported and tested by: pho (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
rework. The number of entries was supposed to be returned to the user,
not used as a scratch variable.
This broke RELENG_4 jails starting up on current systems.
and export them to userland.
- Define __HAVE_REG32 on platforms that define a reg32 structure and check
for this in <sys/procfs.h> to control when to export prstatus32, etc.
- Add prstatus32_t and prpsinfo32_t typedefs for the 32-bit structures.
libbfd looks for these types, and having them fixes 'gcore' in gdb of a
32-bit process on a 64-bit platform.
- Use the structure definitions from <sys/procfs.h> in gcore's elf32 core
dump code instead of duplicating the definitions.
Differential Revision: https://reviews.freebsd.org/D2142
Reviewed by: kib, nathanw (powerpc bits)
MFC after: 1 week
The core kernel part is patch file utimes.2008.4.diff from
pluknet@FreeBSD.org. I updated the code for API changes, added the manual
page and added compatibility code for old kernels. There is also audit and
Capsicum support.
A new UTIME_* constant might allow setting birthtimes in future.
Differential Revision: https://reviews.freebsd.org/D1426
Submitted by: pluknet (partially)
Reviewed by: delphij, pluknet, rwatson
Relnotes: yes
attachment to the process. Note that the command is not intended to
be a security measure, rather it is an obfuscation feature,
implemented for parity with other operating systems.
Discussed with: jilles, rwatson
Man page fixes by: rwatson
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
in r276564, change path type to char * (pathnames are always char *).
And remove bogus casts of malloc().
kern___getcwd() internally doesn't actually use or support u_char *
paths, except to copy them to a normal char * path.
These changes are not visible to libc as libc/gen/getcwd.c misdeclares
__getcwd() as taking a plain char * path.
While here remove _SYS_SYSPROTO_H_ for __getcwd() syscall as
we always have sysproto.h.
Pointed out by: bde
MFC after: 1 week
the orphaned descendants. Base of the API is modelled after the same
feature from the DragonFlyBSD.
Requested by: bapt
Reviewed by: jilles (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
have both kern_open() and kern_openat(); change the callers to use
kern_openat().
This removes one (sometimes two) levels of indirection and
consolidates arguments checks.
Reviewed by: mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
ever used. It didn't go into stable/10, neither was documented.
It might be useful, but we collectively decided to remove it, rather
leave it abandoned and unmaintained. It is removed in one single
commit, so restoring it should be easy, if anyone wants to reopen
this idea.
Sponsored by: Netflix
The kernel tracks syscall users so that modules can safely unregister them.
But if the module is not unloadable or was compiled into the kernel, there is
no need to do this.
Achieve this by adding SY_THR_STATIC_KLD macro which expands to SY_THR_STATIC
during kernel build and 0 otherwise.
Reviewed by: kib (previous version)
MFC after: 2 weeks
rather than u_char.
To try and play nice with the ABI, the u_char CPU ID values are clamped
at 254. The new fields now contain the full CPU ID, or -1 for no cpu.
Differential Revision: D955
Reviewed by: jhb, kib
Sponsored by: Norse Corp, Inc.
Move the SCTP syscalls to netinet with the rest of the SCTP code.
Submitted by: Steve Kiernan <stevek@juniper.net>
Reviewed by: tuexen, rrs
Obtained from: Juniper Networks, Inc.
syscalls themselves are tightly coupled with the network stack and
therefore should not be in the generic socket code.
The following four syscalls have been marked as NOSTD so they can be
dynamically registered in sctp_syscalls_init() function:
sys_sctp_peeloff
sys_sctp_generic_sendmsg
sys_sctp_generic_sendmsg_iov
sys_sctp_generic_recvmsg
The syscalls are also set up to be dynamically registered when COMPAT32
option is configured.
As a side effect of moving the SCTP syscalls, getsock_cap needs to be
made available outside of the uipc_syscalls.c source file. A proper
prototype has been added to the sys/socketvar.h header file.
API tests from the SCTP reference implementation have been run to ensure
compatibility. (http://code.google.com/p/sctp-refimpl/source/checkout)
Submitted by: Steve Kiernan <stevek@juniper.net>
Reviewed by: tuexen, rrs
Obtained from: Juniper Networks, Inc.
struct flock are done in the sys_fcntl(), which mean that compat32 used
direct access to userland pointers.
Move code from sys_fcntl() to new wrapper, kern_fcntl_freebsd(), which
performs neccessary userland memory accesses, and use it from both
native and compat32 fcntl syscalls.
Reported by: jhibbits
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Add a separate field which exports tracer pid and add a new keyword
("tracer") for ps to display it.
This is a follow up to r270444.
Reviewed by: kib
MFC after: 1 week
Relnotes: yes
This includes:
o All directories named *ia64*
o All files named *ia64*
o All ia64-specific code guarded by __ia64__
o All ia64-specific makefile logic
o Mention of ia64 in comments and documentation
This excludes:
o Everything under contrib/
o Everything under crypto/
o sys/xen/interface
o sys/sys/elf_common.h
Discussed at: BSDcan
Make it really work for native FreeBSD programs. Before this it was broken
for years due to different number of pointer dereferences in Linux and
FreeBSD IOCTL paths, permanently returning errors to FreeBSD programs.
This change breaks the driver FreeBSD IOCTL ABI, making it more strict,
but since it was not working any way -- who bother.
Add shims for 32-bit programs on 64-bit host, translating the argument
of the SG_IO IOCTL for both FreeBSD and Linux ABIs.
With this change I was able to run 32-bit Linux sg3_utils tools and simple
32 and 64-bit FreeBSD test tools on both 32 and 64-bit FreeBSD systems.
MFC after: 1 month
call to freebsd32_convert_msg_in() with freebsd32_copyin_control() to
readin and convert in a single step. This makes it simpler to put all
the control messages in a single mbuf or mbuf cluster as per the
limitations imposed upon us by ip6_setpktopts().
The logic is as follows:
1. Go over the array of control messages to determine overall size
and include extra padding for proper alignment as we go.
2. Get a mbuf or mbuf cluster as needed or fail if the overall
(adjusted) size is larger than a cluster.
3. Go over the array of control messages again, but now copy them
into kernel space and into aligned offsets.
4. Update the length of the control message to take padding between
the header and the data into account (but not for padding added
between one control message and the next).
Obtained from: Juniper Networks, Inc.
MFC after: 1 week
Also, remove the expression which calculated the location of the
strings for a new image and grown over the time to be
non-comprehensible. Instead, calculate the offsets by steps, which
also makes fixing the alignments much cleaner.
Reported and reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
- Retire long time unused (basically always unused) sys__umtx_lock()
and sys__umtx_unlock() syscalls
- struct umtx and their supporting definitions
- UMUTEX_ERROR_CHECK flag
- Retire UMTX_OP_LOCK/UMTX_OP_UNLOCK from _umtx_op() syscall
__FreeBSD_version is not bumped yet because it is expected that further
breakages to the umtx interface will follow up in the next days.
However there will be a final bump when necessary.
Sponsored by: EMC / Isilon storage division
Reviewed by: jhb
further refinement is required as some device drivers intended to be
portable over FreeBSD versions rely on __FreeBSD_version to decide whether
to include capability.h.
MFC after: 3 weeks
change... This eliminates a cast, and also forces td_retval
(often 2 32-bit registers) to be aligned so that off_t's can be
stored there on arches with strict alignment requirements like
armeb (AVILA)... On i386, this doesn't change alignment, and on
amd64 it doesn't either, as register_t is already 64bits...
This will also prevent future breakage due to people adding additional
fields to the struct...
This gets AVILA booting a bit farther...
Reviewed by: bde
This fires off a kqueue note (of type sendfile) to the configured kqfd
when the sendfile transaction has completed and the relevant memory
backing the transaction is no longer in use by this transaction.
This is analogous to SF_SYNC waiting for the mbufs to complete -
except now you don't have to wait.
Both SF_SYNC and SF_KQUEUE should work together, even if it
doesn't necessarily make any practical sense.
This is designed for use by applications which use backing cache/store
files (eg Varnish) or POSIX shared memory (not sure anything is using
it yet!) to know when a region of memory is free for re-use. Note
it doesn't mark the region as free overall - only free from this
transaction. The application developer still needs to track which
ranges are in the process of being recycled and wait until all
pending transactions are completed.
TODO:
* documentation, as always
Sponsored by: Netflix, Inc.
for extending and reusing it.
The sendfile_sync wrapper is mostly just a "mbuf transaction" wrapper,
used to indicate that the backing store for a group of mbufs has completed.
It's only being used by sendfile for now and it's only implementing a
sleep/wakeup rendezvous. However, there are other potential signaling
paths (kqueue) and other potential uses (socket zero-copy write) where the
same mechanism would also be useful.
So, with that in mind:
* extract the sendfile_sync code out into sf_sync_*() methods
* teach the sf_sync_alloc method about the current config flag -
it will eventually know about kqueue.
* move the sendfile_sync code out of do_sendfile() - the only thing
it now knows about is the sfs pointer. The guts of the sync
rendezvous (setup, rendezvous/wait, free) is now done in the
syscall wrapper.
* .. and teach the 32-bit compat sendfile call the same.
This should be a no-op. It's primarily preparation work for teaching
the sendfile_sync about kqueue notification.
Tested:
* Peter Holm's sendfile stress / regression scripts
Sponsored by: Netflix, Inc.
in one of the many layers of indirection and shims through stable/7
in jail_handle_ips(). When it was cleaned up and unified through
kern_jail() for 8.x, the byte order swap was lost.
This only matters for ancient binaries that call jail(2) themselves
internally.
given process.
Note that the correctness of the trampoline length returned for ABIs
which do not use shared page depends on the correctness of the struct
sysvec sv_szsigcodebase member, which will be fixed on as-need basis.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week