Commit Graph

650 Commits

Author SHA1 Message Date
John Baldwin
f12c034874 Fix some problems with effective mmap() offsets > 32 bits. This was
partially fixed on amd64 earlier.  Rather than forcing linux_mmap_common()
to use a 32-bit offset, have it accept a 64-bit file offset.  This offset
is then passed to the real mmap() call.  Rather than inventing a structure
to hold the normal linux_mmap args that has a 64-bit offset, just pass
each of the arguments individually to linux_mmap_common() since that more
closes matches the existing style of various kern_foo() functions.

Submitted by:	Christian Zander @ Nvidia
MFC after:	1 week
2009-10-28 20:17:54 +00:00
Konstantin Belousov
d6e029adbe In r197963, a race with thread being selected for signal delivery
while in kernel mode, and later changing signal mask to block the
signal, was fixed for sigprocmask(2) and ptread_exit(3). The same race
exists for sigreturn(2), setcontext(2) and swapcontext(2) syscalls.

Use kern_sigprocmask() instead of direct manipulation of td_sigmask to
reschedule newly blocked signals, closing the race.

Reviewed by:	davidxu
Tested by:	pho
MFC after:	1 month
2009-10-27 10:47:58 +00:00
Dag-Erling Smørgrav
80c03b8eee As jhb@ pointed out to me, r197057 was incorrect, not least because these
are generated files.
2009-09-10 13:20:27 +00:00
Bjoern A. Zeeb
89ffc202d6 Fix handling of .note.ABI-tag section for GNU systems [1].
Handle GNU/Linux according to LSB Core Specification 4.0,
Chapter 11. Object Format, 11.8. ABI note tag.

Also check the first word of desc, not only name, according to
glibc abi-tags specification to distinguish between Linux and
kFreeBSD.

Add explicit handling for Debian GNU/kFreeBSD, which runs
on our kernels as well [2].

In {amd64,i386}/trap.c, when checking osrel of the current process,
also check the ABI to not change the signal behaviour for Linux
binary processes, now that we save an osrel version for all three
from the lists above in struct proc [2].

These changes make it possible to run FreeBSD, Debian GNU/kFreeBSD
and Linux binaries on the same machine again for at least i386 and
amd64, and no longer break kFreeBSD which was detected as GNU(/Linux).

PR:		kern/135468
Submitted by:	dchagin [1] (initial patch)
Suggested by:	kib [2]
Tested by:	Petr Salinger (Petr.Salinger seznam.cz) for kFreeBSD
Reviewed by:	kib
MFC after:	3 days
2009-08-24 16:19:47 +00:00
John Baldwin
7c020cbbe5 Return ENOSYS instead of EINVAL for invalid function codes to match the
behavior of Linux.

Reported by:	Alexander Best  alexbestms of math.uni-muenster.de
Approved by:	re (kib)
2009-06-26 19:39:33 +00:00
Dmitry Chagin
f8cd0af232 Implement accept4 syscall.
Approved by:	kib (mentor)
MFC after:	1 month
2009-06-01 20:48:39 +00:00
Robert Watson
33dd50646e Regenerate generated syscall files following changes to struct sysent in
r193234.
2009-06-01 16:14:38 +00:00
Dmitry Chagin
3933bde22e Somewhere between 2.6.23 and 2.6.27, Linux added SOCK_CLOEXEC and
SOCK_NONBLOCK flags, that allow to save fcntl() calls.

Implement a variation of the socket() syscall which takes a flags
in addition to the type argument.

Approved by:	kib (mentor)
MFC after:	1 month
2009-05-16 18:48:41 +00:00
Dmitry Chagin
8d30f381ef Do not export AT_CLKTCK when emulating Linux kernel prior
to 2.4.0, as it has appeared in the 2.4.0-rc7 first time.
Being exported, AT_CLKTCK is returned by sysconf(_SC_CLK_TCK),
glibc falls back to the hard-coded CLK_TCK value when aux entry
is not present.

Glibc versions prior to 2.2.1 always use hard-coded CLK_TCK value.

For older applications/libc's which depends on hard-coded CLK_TCK
value user should set compat.linux.osrelease less than 2.4.0.

Approved by:	kib (mentor)
2009-05-10 18:43:43 +00:00
Dmitry Chagin
1ca16454b3 Rework r189362, r191883.
The frequency of the statistics clock is given by stathz.
Use stathz if it is available, otherwise use hz.

Pointed out by:	bde

Approved by:	kib (mentor)
2009-05-10 18:16:07 +00:00
Jamie Gritton
7ae27ff49f Move the per-prison Linux MIB from a private one-off pointer to the new
OSD-based jail extensions.  This allows the Linux MIB to accessed via
jail_set and jail_get, and serves as a demonstration of adding jail support
to a module.

Reviewed by:	dchagin, kib
Approved by:	bz (mentor)
2009-05-07 18:36:47 +00:00
Dmitry Chagin
13f20d7e86 To avoid excessive code duplication move MI definitions to the MI
header file. As it is defined in Linux.

Approved by:	kib (mentor)
MFC after:	1 month
2009-05-07 09:39:20 +00:00
Dmitry Chagin
d789bfd562 Move extern variable definitions to the header file.
Approved by:	kib (mentor)
MFC after:	1 month
2009-05-02 10:06:49 +00:00
Dmitry Chagin
79262bf1f0 Reimplement futexes.
Old implemention used Giant to protect the kernel data structures,
but at the same time called malloc(M_WAITOK), that could cause the
calling thread to sleep and lost Giant protection. User-visible
result was the missed wakeup.

New implementation uses one sx lock per futex. The sx protects
the futex structures and allows to sleep while copyin or copyout
are performed.

Unlike linux, we return EINVAL when FUTEX_CMP_REQUEUE operation
is requested and either caller specified futexes are equial or
second futex already exists. This is acceptable since the situation
can only occur from the application error, and glibc falls back to
old FUTEX_WAKE operation when FUTEX_CMP_REQUEUE returns an error.

Approved by:	kib (mentor)
MFC after:	1 month
2009-05-01 15:36:02 +00:00
Dmitry Chagin
cd899aad76 Fix KBI breakage by r190520 which affects older linux.ko binaries:
1) Move the new field (brand_note) to the end of the Brandinfo structure.
2) Add a new flag BI_BRAND_NOTE that indicates that the brand_note pointer
   is valid.
3) Use the brand_note field if the flag BI_BRAND_NOTE is set and as old
   modules won't have the flag set, so the new field brand_note would be
   ignored.

Suggested by:	jhb
Reviewed by:	jhb
Approved by:	kib (mentor)
MFC after:	6 days
2009-04-05 09:27:19 +00:00
Dmitry Chagin
32c01de21c Implement new way of branding ELF binaries by looking to a
".note.ABI-tag" section.

The search order of a brand is changed, now first of all the
".note.ABI-tag" is looked through.

Move code which fetch osreldate for ELF binary to check_note() handler.

PR:		118473
Approved by:	kib (mentor)
2009-03-13 16:40:51 +00:00
John Baldwin
2ee8325f42 A better fix for handling different FPU initial control words for different
ABIs:
- Store the FPU initial control word in the pcb for each thread.
- When first using the FPU, load the initial control word after restoring
  the clean state if it is not the standard control word.
- Provide a correct control word for Linux/i386 binaries under
  FreeBSD/amd64.
- Adjust the control word returned for fpugetregs()/npxgetregs() when a
  thread hasn't used the FPU yet to reflect the real initial control
  word for the current ABI.
- The Linux/i386 ABI for FreeBSD/i386 now properly sets the right control
  word instead of trashing whatever the current state of the FPU is.

Reviewed by:	bde
2009-03-05 19:42:11 +00:00
Dmitry Chagin
4d7c2e8a48 Add AT_PLATFORM, AT_HWCAP and AT_CLKTCK auxiliary vector entries which
are used by glibc. This silents the message "2.4+ kernel w/o ELF notes?"
from some programs at start, among them are top and pkill.

Do the assignment of the vector entries in elf_linux_fixup()
as it is done in glibc.

Fix some minor style issues.

Submitted by:	Marcin Cieslak <saper at SYSTEM PL>
Approved by:	kib (mentor)
MFC after:	1 week
2009-03-04 12:14:33 +00:00
Konstantin Belousov
99b7f1a10b Adapt linux emulation to use cv for vfork wait.
Submitted by:	Takahiro Kurosawa <takahiro.kurosawa gmail com>
PR:	kern/131506
2009-02-18 16:11:39 +00:00
David E. O'Brien
e6493bbebf Change some movl's to mov's. Newer GAS no longer accept 'movl' instructions
for moving between a segment register and a 32-bit memory location.

Looked at by:	jhb
2009-01-31 11:37:21 +00:00
Warner Losh
0e7faf3934 Remove obsolete AT_DEBUG stuff. It never should have been committed
in the first place, let alone migrated to linux emulation.

Reviewed by:	peter, rdivacky
2008-12-17 06:11:42 +00:00
Konstantin Belousov
74f5d68011 Make linux_sendmsg() and linux_recvmsg() work on linux32/amd64.
Change types used in the linux' struct msghdr and struct cmsghdr
definitions to the properly-sized architecture-specific types.
Move ancillary data handler from linux_sendit() to linux_sendmsg().

Submitted by:	dchagin
2008-11-29 17:14:06 +00:00
Konstantin Belousov
b4cf0e62f4 Add sv_flags field to struct sysentvec with intention to provide description
of the ABI of the currently executing image. Change some places to test
the flags instead of explicit comparing with address of known sysentvec
structures to determine ABI features.

Discussed with:	dchagin, imp, jhb, peter
2008-11-22 12:36:15 +00:00
Konstantin Belousov
62162dfc94 In the robust futexes list head, futex_offset shall be signed,
and glibc actually supplies negative offsets. Change l_ulong to l_long.

Submitted by:	dchagin
2008-11-16 15:45:41 +00:00
Ed Schouten
ab0d10f68e Several cleanups related to pipe(2).
- Use `fildes[2]' instead of `*fildes' to make more clear that pipe(2)
  fills an array with two descriptors.

- Remove EFAULT from the manual page. Because of the current calling
  convention, pipe(2) raises a segmentation fault when an invalid
  address is passed.

- Introduce kern_pipe() to make it easier for binary emulations to
  implement pipe(2).

- Make Linux binary emulation use kern_pipe(), which means we don't have
  to recover td_retval after calling the FreeBSD system call.

Approved by:	rdivacky
Discussed on:	arch
2008-11-11 14:55:59 +00:00
Ed Schouten
ebb45b0620 Regenerate system call tables for r184789. 2008-11-09 10:48:06 +00:00
Ed Schouten
a1b5a8955e Mark uname(), getdomainname() and setdomainname() with COMPAT_FREEBSD4.
Looking at our source code history, it seems the uname(),
getdomainname() and setdomainname() system calls got deprecated
somewhere after FreeBSD 1.1, but they have never been phased out
properly. Because we don't have a COMPAT_FREEBSD1, just use
COMPAT_FREEBSD4.

Also fix the Linuxolator to build without the setdomainname() routine by
just making it call userland_sysctl on kern.domainname. Also replace the
setdomainname()'s implementation to use this approach, because we're
duplicating code with sysctl_domainname().

I wasn't able to keep these three routines working in our
COMPAT_FREEBSD32, because that would require yet another keyword for
syscalls.master (COMPAT4+NOPROTO). Because this routine is probably
unused already, this won't be a problem in practice. If it turns out to
be a problem, we'll just restore this functionality.

Reviewed by:	rdivacky, kib
2008-11-09 10:45:13 +00:00
Konstantin Belousov
aa8b201112 Correctly fill siginfo for the signals delivered by linux tkill/tgkill.
It is required for async cancellation to work.

Fix PROC_LOCK leak in linux_tgkill when signal delivery attempt is made
to not linux process.

Do not call em_find(p, ...) with p unlocked.

Move common code for linux_tkill() and linux_tgkill() into
linux_do_tkill().

Change linux siginfo_t definition to match actual linux one. Extend
uid fields to 4 bytes from 2. The extension does not change structure
layout and is binary compatible with previous definition, because i386
is little endian, and each uid field has 2 byte padding after it.

Reported by:	Nicolas Joly <njoly pasteur fr>
Submitted by:	dchangin
MFC after:	1 month
2008-10-19 10:02:26 +00:00
Konstantin Belousov
175c6c319b Make robust futexes work on linux32/amd64. Use PTRIN to read
user-mode pointers. Change types used in the structures definitions to
properly-sized architecture-specific types.

Submitted by:	dchagin
MFC after:	1 week
2008-10-14 07:59:23 +00:00
Konstantin Belousov
a8d403e102 Change the static struct sysentvec and struct Elf_Brandinfo initializers
to the C99 style. At least, it is easier to read sysent definitions
that way, and search for the actual instances of sigcode etc.

Explicitely initialize sysentvec.sv_maxssiz that was missed in most
sysvecs.

No objection from:	jhb
MFC after:	1 month
2008-09-24 10:14:37 +00:00
Konstantin Belousov
9dee707cf0 Segment registers are stored in the uc_mcontext member of the struct
l_ucontext. To restore the registers content, trampoline needs to
dereference uc_mcontext instead of taking some undefined values from
l_ucontext.

Submitted by:	Dmitry Chagin <dchagin@>
MFC after:	1 week
2008-09-07 16:39:21 +00:00
Roman Divacky
7c0cc5f941 Regen.
Approved by:	kib (mentor)
2008-05-13 20:02:26 +00:00
Roman Divacky
4732e446fb Implement robust futexes. Most of the code is modelled after
what Linux does. This is because robust futexes are mostly
userspace thing which we cannot alter. Two syscalls maintain
pointer to userspace list and when process exits a routine
walks this list waking up processes sleeping on futexes
from that list.

Reviewed by:	kib (mentor)
MFC after:	1 month
2008-05-13 20:01:27 +00:00
Roman Divacky
a6d043e30d Implement linux_truncate64() syscall.
Tested by:	Aline de Freitas <aline@riseup.net>
Approved by:	kib (mentor)
2008-04-23 15:56:33 +00:00
Jung-uk Kim
01c3b1b200 Regenerate. 2008-04-16 19:27:36 +00:00
Jung-uk Kim
26833f3f9a Add stubs for syscalls introduced in Linux 2.6.17 kernel.
Some GNU libc version started using them before 2.6.17 was officially out.

MFC after:	3 days
2008-04-16 19:25:39 +00:00
Konstantin Belousov
50ad4fc65c Regenerate 2008-04-08 09:51:19 +00:00
Konstantin Belousov
48b05c3f82 Implement the linux syscalls
openat, mkdirat, mknodat, fchownat, futimesat, fstatat, unlinkat,
    renameat, linkat, symlinkat, readlinkat, fchmodat, faccessat.

Submitted by:	rdivacky
Sponsored by:	Google Summer of Code 2007
Tested by:	pho
2008-04-08 09:45:49 +00:00
Konstantin Belousov
57b4252e45 Add the support for the AT_FDCWD and fd-relative name lookups to the
namei(9).

Based on the submission by rdivacky,
	sponsored by Google Summer of Code 2007
Reviewed by:	rwatson, rdivacky
Tested by:	pho
2008-03-31 12:01:21 +00:00
Roman Divacky
d8653dd986 Regen. 2008-03-16 16:29:37 +00:00
Roman Divacky
5dfb688191 Implement sched_setaffinity and get_setaffinity using
real cpu affinity setting primitives.

Reviewed by:	jeff
Approved by:	kib (mentor)
2008-03-16 16:27:44 +00:00
Konstantin Belousov
22eca0bf45 Since version 4.3, gcc changed its behaviour concerning the i386/amd64
ABI and the direction flag, that is it now assumes that the direction
flag is cleared at the entry of a function and it doesn't clear once
more if needed. This new behaviour conforms to the i386/amd64 ABI.

Modify the signal handler frame setup code to clear the DF {e,r}flags
bit on the amd64/i386 for the signal handlers.

jhb@ noted that it might break old apps if they assumed DF == 1 would be
preserved in the signal handlers, but that such apps should be rare and
that older versions of gcc would not generate such apps.

Submitted by:	Aurelien Jarno <aurelien aurel32 net>
PR:	121422
Reviewed by:	jhb
MFC after:	2 weeks
2008-03-13 10:54:38 +00:00
Jeff Roberson
6617724c5f Remove kernel support for M:N threading.
While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential.  Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries and static binaries will
be broken.
2008-03-12 10:12:01 +00:00
Jung-uk Kim
865df544c6 Fix Linux mmap with MAP_GROWSDOWN flag.
Reported by:	Andriy Gapon (avg at icyb dot net dot ua)
Tested by:	Andriy Gapon (avg at icyb dot net dot ua)
Pointyhat:	me
MFC after:	3 days
2008-02-11 19:35:03 +00:00
Attilio Rao
22db15c06f VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.

KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
2008-01-13 14:44:15 +00:00
Attilio Rao
cb05b60a89 vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by:	Diego Sardina <siarodx at gmail dot com>,
		Andrea Di Pasquale <whyx dot it at gmail dot com>
2008-01-10 01:10:58 +00:00
Konstantin Belousov
6259969d36 Implement read_default_ldt in linux_modify_ldt(). It copies out zeroed
descriptor, like real Linux does.

Tested by: Yuriy Tsibizov <yuriy.tsibizov at gmail com>
Submitted by:	rdivacky
MFC after:	1 week
2007-11-26 11:06:19 +00:00
Konstantin Belousov
89b57fcf01 Fix for the panic("vm_thread_new: kstack allocation failed") and
silent NULL pointer dereference in the i386 and sparc64 pmap_pinit()
when the kmem_alloc_nofault() failed to allocate address space. Both
functions now return error instead of panicing or dereferencing NULL.

As consequence, vmspace_exec() and vmspace_unshare() returns the errno
int. struct vmspace arg was added to vm_forkproc() to avoid dealing
with failed allocation when most of the fork1() job is already done.

The kernel stack for the thread is now set up in the thread_alloc(),
that itself may return NULL. Also, allocation of the first process
thread is performed in the fork1() to properly deal with stack
allocation failure. proc_linkup() is separated into proc_linkup()
called from fork1(), and proc_linkup0(), that is used to set up the
kernel process (was known as swapper).

In collaboration with:	Peter Holm
Reviewed by:	jhb
2007-11-05 11:36:16 +00:00
Konstantin Belousov
96a2b63525 Fill in cr2 in the signal context from ksi->ksi_addr.
Together with the sys/i386/i386/trap.c rev. 1.306 it fixes the PR.

Submitted by:	rdivacky
Suggested by:	jhb
Sponsored by:	Google Summer of Code 2007
PR:		kern/77710
Approved by:	re (kensmith)
2007-09-20 13:46:26 +00:00
David Malone
e4a76144d0 regen.
Approved by: re (kensmith)
2007-09-18 19:51:49 +00:00