Commit Graph

669 Commits

Author SHA1 Message Date
dchagin
f248585449 Sort include files in the alphabetical order.
Approved by:	kib (mentor)
MFC after:	2 weeks
2009-03-16 05:39:37 +00:00
dchagin
e488f4df7a Ignore FUTEX_FD op, as it is done by linux.
Approved by:	kib (mentor)
MFC after:	2 weeks
2009-03-15 19:38:34 +00:00
dchagin
09af73f25f Include linux_futex.h before linux_emul.h
Approved by:	kib (mentor)
MFC after:	6 days
2009-03-15 19:16:12 +00:00
jhb
e1b708897e A better fix for handling different FPU initial control words for different
ABIs:
- Store the FPU initial control word in the pcb for each thread.
- When first using the FPU, load the initial control word after restoring
  the clean state if it is not the standard control word.
- Provide a correct control word for Linux/i386 binaries under
  FreeBSD/amd64.
- Adjust the control word returned for fpugetregs()/npxgetregs() when a
  thread hasn't used the FPU yet to reflect the real initial control
  word for the current ABI.
- The Linux/i386 ABI for FreeBSD/i386 now properly sets the right control
  word instead of trashing whatever the current state of the FPU is.

Reviewed by:	bde
2009-03-05 19:42:11 +00:00
dchagin
45cda70b8f Add AT_PLATFORM, AT_HWCAP and AT_CLKTCK auxiliary vector entries which
are used by glibc. This silents the message "2.4+ kernel w/o ELF notes?"
from some programs at start, among them are top and pkill.

Do the assignment of the vector entries in elf_linux_fixup()
as it is done in glibc.

Fix some minor style issues.

Submitted by:	Marcin Cieslak <saper at SYSTEM PL>
Approved by:	kib (mentor)
MFC after:	1 week
2009-03-04 12:14:33 +00:00
bz
df2be82cec For all files including net/vnet.h directly include opt_route.h and
net/route.h.

Remove the hidden include of opt_route.h and net/route.h from net/vnet.h.

We need to make sure that both opt_route.h and net/route.h are included
before net/vnet.h because of the way MRT figures out the number of FIBs
from the kernel option. If we do not, we end up with the default number
of 1 when including net/vnet.h and array sizes are wrong.

This does not change the list of files which depend on opt_route.h
but we can identify them now more easily.
2009-02-27 14:12:05 +00:00
ed
72727e8d9f Don't make Linux stat() open character devices to resolve its name.
The existing code calls kern_open() to resolve the vnode of a pathname
right after a stat(). This is not correct, because it causes random
character devices to be opened in /dev. This means ls'ing a tape
streamer will cause it to rewind, for example. Changes I have made:

- Add kern_statat_vnhook() to allow binary emulators to `post-process'
  struct stat, using the proper vnode.

- Remove unneeded printf's from stat() and statfs().

- Make the Linuxolator use kern_statat_vnhook(), replacing
  translate_path_major_minor_at().

- Let translate_fd_major_minor() use vp->v_rdev instead of
  vp->v_un.vu_cdev.

Result:

	crw-rw-rw- 1 root root   0, 14 Feb 20 13:54 /dev/ptmx
	crw--w---- 1 root adm  136,  0 Feb 20 14:03 /dev/pts/0
	crw--w---- 1 root adm  136,  1 Feb 20 14:02 /dev/pts/1
	crw--w---- 1 ed   tty  136,  2 Feb 20 14:03 /dev/pts/2

Before this commit, ptmx also had a major number of 136, because it
silently allocated and deallocated a pseudo-terminal. Device nodes that
cannot be opened now have proper major/minor-numbers.

Reviewed by:	kib, netchild, rdivacky (thanks!)
2009-02-20 13:05:29 +00:00
jhb
26e338d6fc Use shared vnode locks when invoking VOP_READDIR().
MFC after:	1 month
2009-02-13 18:18:14 +00:00
netchild
810bd8f924 Fix an edge-case of the linux readdir: We need the size of a linux dirent
structure, not the size of a pointer to it.

PR:		131099
Submitted by:	Andreas Kies <andikies@gmail.com>
MFC after:	2 weeks
2009-02-13 11:55:19 +00:00
ed
a964306db9 Last step of splitting up minor and unit numbers: remove minor().
Inside the kernel, the minor() function was responsible for obtaining
the device minor number of a character device. Because we made device
numbers dynamically allocated and independent of the unit number passed
to make_dev() a long time ago, it was actually a misnomer. If you really
want to obtain the device number, you should use dev2udev().

We already converted all the drivers to use dev2unit() to obtain the
device unit number, which is still used by a lot of drivers. I've
noticed not a single driver passes NULL to dev2unit(). Even if they
would, its behaviour would make little sense. This is why I've removed
the NULL check.

Ths commit removes minor(), minor2unit() and unit2minor() from the
kernel. Because there was a naming collision with uminor(), we can
rename umajor() and uminor() back to major() and minor(). This means
that the makedev(3) manual page also applies to kernel space code now.

I suspect umajor() and uminor() isn't used that often in external code,
but to make it easier for other parties to port their code, I've
increased __FreeBSD_version to 800062.
2009-01-28 17:57:16 +00:00
ed
f3a9a195cb Push down Giant inside sysctl. Also add some more assertions to the code.
In the existing code we didn't really enforce that callers hold Giant
before calling userland_sysctl(), even though there is no guarantee it
is safe. Fix this by just placing Giant locks around the call to the oid
handler. This also means we only pick up Giant for a very short period
of time. Maybe we should add MPSAFE flags to sysctl or phase it out all
together.

I've also added SYSCTL_LOCK_ASSERT(). We have to make sure sysctl_root()
and name2oid() are called with the sysctl lock held.

Reviewed by:	Jille Timmermans <jille quis cx>
2008-12-29 12:58:45 +00:00
bz
604d89458a Rather than using hidden includes (with cicular dependencies),
directly include only the header files needed. This reduces the
unneeded spamming of various headers into lots of files.

For now, this leaves us with very few modules including vnet.h
and thus needing to depend on opt_route.h.

Reviewed by:	brooks, gnn, des, zec, imp
Sponsored by:	The FreeBSD Foundation
2008-12-02 21:37:28 +00:00
kib
8ffb383318 Make linux_sendmsg() and linux_recvmsg() work on linux32/amd64.
Change types used in the linux' struct msghdr and struct cmsghdr
definitions to the properly-sized architecture-specific types.
Move ancillary data handler from linux_sendit() to linux_sendmsg().

Submitted by:	dchagin
2008-11-29 17:14:06 +00:00
rdivacky
b213864d66 Document that all the other commands are either
identical to the FreeBSD ones or rejected by
kern_msgctl().

Found with:	Coverity Prevent(tm)
CID:	3456
Approved by:	kib (mentor)
2008-11-26 16:38:43 +00:00
kib
f5d16a4d66 In the robust futexes list head, futex_offset shall be signed,
and glibc actually supplies negative offsets. Change l_ulong to l_long.

Submitted by:	dchagin
2008-11-16 15:45:41 +00:00
ed
9d3703b842 Mark uname(), getdomainname() and setdomainname() with COMPAT_FREEBSD4.
Looking at our source code history, it seems the uname(),
getdomainname() and setdomainname() system calls got deprecated
somewhere after FreeBSD 1.1, but they have never been phased out
properly. Because we don't have a COMPAT_FREEBSD1, just use
COMPAT_FREEBSD4.

Also fix the Linuxolator to build without the setdomainname() routine by
just making it call userland_sysctl on kern.domainname. Also replace the
setdomainname()'s implementation to use this approach, because we're
duplicating code with sysctl_domainname().

I wasn't able to keep these three routines working in our
COMPAT_FREEBSD32, because that would require yet another keyword for
syscalls.master (COMPAT4+NOPROTO). Because this routine is probably
unused already, this won't be a problem in practice. If it turns out to
be a problem, we'll just restore this functionality.

Reviewed by:	rdivacky, kib
2008-11-09 10:45:13 +00:00
kib
288874a97d The code in linux_proc_exit() contains a race when multiple linux based
processes exits at the same time.  The linux_emuldata structure is freed
but p->p_emuldata is left as a dangling pointer to the just freed memory.

The check for W_EXIT in the loop scanning the child processes isn't safe
since the state of the child process can change right afterwards. Lock
the process and check the W_EXIT before delivering signal.

Submitted by:	tegge
Reviewed by:	davidxu
MFC after:	1 week
2008-10-31 10:38:30 +00:00
trasz
0ad8692247 Introduce accmode_t. This is required for NFSv4 ACLs - it will be neccessary
to add more V* constants, and the variables changed by this patch were often
being assigned to mode_t variables, which is 16 bit.

Approved by:	rwatson (mentor)
2008-10-28 13:44:11 +00:00
des
66f807ed8b Retire the MALLOC and FREE macros. They are an abomination unto style(9).
MFC after:	3 months
2008-10-23 15:53:51 +00:00
kib
29ccf7d166 Correctly fill siginfo for the signals delivered by linux tkill/tgkill.
It is required for async cancellation to work.

Fix PROC_LOCK leak in linux_tgkill when signal delivery attempt is made
to not linux process.

Do not call em_find(p, ...) with p unlocked.

Move common code for linux_tkill() and linux_tgkill() into
linux_do_tkill().

Change linux siginfo_t definition to match actual linux one. Extend
uid fields to 4 bytes from 2. The extension does not change structure
layout and is binary compatible with previous definition, because i386
is little endian, and each uid field has 2 byte padding after it.

Reported by:	Nicolas Joly <njoly pasteur fr>
Submitted by:	dchangin
MFC after:	1 month
2008-10-19 10:02:26 +00:00
kib
faae1c0f2f Make robust futexes work on linux32/amd64. Use PTRIN to read
user-mode pointers. Change types used in the structures definitions to
properly-sized architecture-specific types.

Submitted by:	dchagin
MFC after:	1 week
2008-10-14 07:59:23 +00:00
kib
d7ec3f21ab Current linux_fooaffinity() emulation fails, as the FreeBSD affinity
syscalls expect the bitmap size in the range from 32 to 128. Old glibc
always assumed size 1024, while newer glibc searches for approriate
size, starting from 1024 and going up.

For now, use FreeBSD size of cpuset_t for bitmap size parameter and
return EINVAL if length of user space bitmap less than our size of
cpuset_t.

Submitted by:	dchagin
MFC after:	1 week
	[This requires MFC of the actual linux affinity syscalls]
2008-10-04 19:23:30 +00:00
zec
8797d4caec Step 1.5 of importing the network stack virtualization infrastructure
from the vimage project, as per plan established at devsummit 08/08:
http://wiki.freebsd.org/Image/Notes200808DevSummit

Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator
macros, and CURVNET_SET() context setting macros, all currently
resolving to NOPs.

Prepare for virtualization of selected SYSCTL objects by introducing a
family of SYSCTL_V_*() macros, currently resolving to their global
counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().

Move selected #defines from sys/sys/vimage.h to newly introduced header
files specific to virtualized subsystems (sys/net/vnet.h,
sys/netinet/vinet.h etc.).

All the changes are verified to have zero functional impact at this
point in time by doing MD5 comparision between pre- and post-change
object files(*).

(*) netipsec/keysock.c did not validate depending on compile time options.

Implemented by:	julian, bz, brooks, zec
Reviewed by:	julian, bz, brooks, kris, rwatson, ...
Approved by:	julian (mentor)
Obtained from:	//depot/projects/vimage-commit2/...
X-MFC after:	never
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
2008-10-02 15:37:58 +00:00
trasz
d1f0654ba4 Fix usage of mac_vnode_check_open() in linuxulator - last argument
should be VREAD, not FREAD.

Approved by:	rwatson (mentor)
2008-09-22 18:59:24 +00:00
rdivacky
74e5140a73 The ERESTART to EINTR conversion is already done in
kern_select so there is no need to repeat it in
linux_select().

Submitted by:	Dmitry Chagin <dchagin@>
MFC after:	1 week
Approved by:	kib (mentor)
2008-09-11 15:28:28 +00:00
rdivacky
817c519713 Getdents requires padding with 2 bytes instead of 1 byte
as with getdents64. The last byte is used for storing
the d_type, add this to plain getdents case where it was
missing before. Also change the code to use strlcpy instead
of plain strcpy. This changes fix the getdents crash we
had reports about (hl2 server etc.)

PR:		kern/117010
MFC after:	1 week
Submitted by:	Dmitry Chagin (dchagin@)
Tested by:	MITA Yoshio <mita ee.t.u-tokyo.ac jp>
Approved by:	kib (mentor)
2008-09-09 16:00:17 +00:00
kib
626be4984b Remove superfluous copyin() of args, structures are already in kernel space.
Submitted by:	dchagin
MFC after:	1 week
2008-09-09 13:01:14 +00:00
attilio
dbf35e279f Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread
was always curthread and totally unuseful.

Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
2008-08-28 15:23:18 +00:00
julian
18137ef251 All opt_x.h includes go at the top of other includes. 2008-08-25 04:55:29 +00:00
ed
cc3116a938 Integrate the new MPSAFE TTY layer to the FreeBSD operating system.
The last half year I've been working on a replacement TTY layer for the
FreeBSD kernel. The new TTY layer was designed to improve the following:

- Improved driver model:

  The old TTY layer has a driver model that is not abstract enough to
  make it friendly to use. A good example is the output path, where the
  device drivers directly access the output buffers. This means that an
  in-kernel PPP implementation must always convert network buffers into
  TTY buffers.

  If a PPP implementation would be built on top of the new TTY layer
  (still needs a hooks layer, though), it would allow the PPP
  implementation to directly hand the data to the TTY driver.

- Improved hotplugging:

  With the old TTY layer, it isn't entirely safe to destroy TTY's from
  the system. This implementation has a two-step destructing design,
  where the driver first abandons the TTY. After all threads have left
  the TTY, the TTY layer calls a routine in the driver, which can be
  used to free resources (unit numbers, etc).

  The pts(4) driver also implements this feature, which means
  posix_openpt() will now return PTY's that are created on the fly.

- Improved performance:

  One of the major improvements is the per-TTY mutex, which is expected
  to improve scalability when compared to the old Giant locking.
  Another change is the unbuffered copying to userspace, which is both
  used on TTY device nodes and PTY masters.

Upgrading should be quite straightforward. Unlike previous versions,
existing kernel configuration files do not need to be changed, except
when they reference device drivers that are listed in UPDATING.

Obtained from:		//depot/projects/mpsafetty/...
Approved by:		philip (ex-mentor)
Discussed:		on the lists, at BSDCan, at the DevSummit
Sponsored by:		Snow B.V., the Netherlands
dcons(4) fixed by:	kan
2008-08-20 08:31:58 +00:00
bz
1021d43b56 Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from:	//depot/projects/vimage-commit2/...
Reviewed by:	brooks, des, ed, mav, julian,
		jamie, kris, rwatson, zec, ...
		(various people I forgot, different versions)
		md5 (with a bit of help)
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
X-MFC after:	never
V_Commit_Message_Reviewed_By:	more people than the patch
2008-08-17 23:27:27 +00:00
ed
6b178cd86d Add TIOCPKT and TIOCSPTLCK to the Linuxolator.
We're very lucky, because the flags used by our TIOCPKT implementation
are the same as flags used by Linux. We can safely enable TIOCPKT,
assuming EXTPROC is not used.

TIOCSPTLCK is used by unlockpt(). Because we don't need unlockpt() in
our implementation, make this ioctl a no-op.

Approved by:	philip (mentor, implicit), rdivacky
Obtained from:	P4 (//depot/projects/mpsafetty/...)
2008-07-23 17:47:44 +00:00
rdivacky
32f23bcf55 Fix linux_alarm, the linux behaviour is to limit the
secs to INT_MAX when the passed in parameter is bigger
than INT_MAX.

Submitted by:	Dmitry Chagin <chagin.dmitry gmail com>
Approved by:	kib (mentor)
2008-07-23 17:19:02 +00:00
rwatson
051819b847 Introduce a new lock, hostname_mtx, and use it to synchronize access
to global hostname and domainname variables.  Where necessary, copy
to or from a stack-local buffer before performing copyin() or
copyout().  A few uses, such as in cd9660 and daemon_saver, remain
under-synchronized and will require further updates.

Correct a bug in which a failed copyin() of domainname would leave
domainname potentially corrupted.

MFC after:	3 weeks
2008-07-05 13:10:10 +00:00
rdivacky
7fba368b69 d_ino member of linux_dirent structure should be unsigned long.
Submitted by:	Chagin Dmitry <chagin.dmitry@gmail.com>
Approved by:	kib (mentor)
2008-06-08 11:09:25 +00:00
rdivacky
25a34fb524 Switch to emulating Linux 2.6 on default.
Approved by:	kib (mentor)
2008-06-03 17:50:13 +00:00
ed
ff609f1187 Push down the major/minor conversion for pts/%u to improve consistency.
In the mpsafetty branch, Linux sshd seems to work properly inside a
jail. Some small modifications had to be made to the Linux compatibility
layer.

The Linux PTY routines always expect the device major number to be 136
or higher. Our code always set the major/minor number pair to 136:0.
This makes routines like ttyname() and ptsname() fail, because we'll end
up having ambiguous device numbers.

The conversion was not performed on all *stat() routines, which meant in
some cases the numbers didn't get transformed. By pushing the conversion
into linux_driver_get_major_minor(), the transformation will take place
on all calls.

Approved by:	philip (mentor), rdivacky
2008-06-02 08:40:06 +00:00
rdivacky
13cbd9c97e Implement robust futexes. Most of the code is modelled after
what Linux does. This is because robust futexes are mostly
userspace thing which we cannot alter. Two syscalls maintain
pointer to userspace list and when process exits a routine
walks this list waking up processes sleeping on futexes
from that list.

Reviewed by:	kib (mentor)
MFC after:	1 month
2008-05-13 20:01:27 +00:00
rdivacky
dd1e82ea4d Implement linux_truncate64() syscall.
Tested by:	Aline de Freitas <aline@riseup.net>
Approved by:	kib (mentor)
2008-04-23 15:56:33 +00:00
rdivacky
69ec9a439c Remove using magic value of -1 to distinguish between linux_open()
and linux_openat(). Instead just pass AT_FDCWD into linux_common_open()
for the linux_open() case. This prevents passing -1 as a dirfd to
openat() from succeeding which is wrong.

Suggested by:	rwatson, kib
Approved by:	kib (mentor)
2008-04-09 16:42:50 +00:00
kib
eb77b477b4 Implement the linux syscalls
openat, mkdirat, mknodat, fchownat, futimesat, fstatat, unlinkat,
    renameat, linkat, symlinkat, readlinkat, fchmodat, faccessat.

Submitted by:	rdivacky
Sponsored by:	Google Summer of Code 2007
Tested by:	pho
2008-04-08 09:45:49 +00:00
kib
eff8c6d35e Add the support for the AT_FDCWD and fd-relative name lookups to the
namei(9).

Based on the submission by rdivacky,
	sponsored by Google Summer of Code 2007
Reviewed by:	rwatson, rdivacky
Tested by:	pho
2008-03-31 12:01:21 +00:00
dfr
79d2dfdaa6 Add the new kernel-mode NFS Lock Manager. To use it instead of the
user-mode lock manager, build a kernel with the NFSLOCKD option and
add '-k' to 'rpc_lockd_flags' in rc.conf.

Highlights include:

* Thread-safe kernel RPC client - many threads can use the same RPC
  client handle safely with replies being de-multiplexed at the socket
  upcall (typically driven directly by the NIC interrupt) and handed
  off to whichever thread matches the reply. For UDP sockets, many RPC
  clients can share the same socket. This allows the use of a single
  privileged UDP port number to talk to an arbitrary number of remote
  hosts.

* Single-threaded kernel RPC server. Adding support for multi-threaded
  server would be relatively straightforward and would follow
  approximately the Solaris KPI. A single thread should be sufficient
  for the NLM since it should rarely block in normal operation.

* Kernel mode NLM server supporting cancel requests and granted
  callbacks. I've tested the NLM server reasonably extensively - it
  passes both my own tests and the NFS Connectathon locking tests
  running on Solaris, Mac OS X and Ubuntu Linux.

* Userland NLM client supported. While the NLM server doesn't have
  support for the local NFS client's locking needs, it does have to
  field async replies and granted callbacks from remote NLMs that the
  local client has contacted. We relay these replies to the userland
  rpc.lockd over a local domain RPC socket.

* Robust deadlock detection for the local lock manager. In particular
  it will detect deadlocks caused by a lock request that covers more
  than one blocking request. As required by the NLM protocol, all
  deadlock detection happens synchronously - a user is guaranteed that
  if a lock request isn't rejected immediately, the lock will
  eventually be granted. The old system allowed for a 'deferred
  deadlock' condition where a blocked lock request could wake up and
  find that some other deadlock-causing lock owner had beaten them to
  the lock.

* Since both local and remote locks are managed by the same kernel
  locking code, local and remote processes can safely use file locks
  for mutual exclusion. Local processes have no fairness advantage
  compared to remote processes when contending to lock a region that
  has just been unlocked - the local lock manager enforces a strict
  first-come first-served model for both local and remote lockers.

Sponsored by:	Isilon Systems
PR:		95247 107555 115524 116679
MFC after:	2 weeks
2008-03-26 15:23:12 +00:00
ru
e9ab62a9ff Fix build.
Reported by:	ache, tinderbox
2008-03-25 13:20:52 +00:00
rdivacky
4a8a8b1c08 o Add stub support for some new futex operations,
so the annoying message is not printed.

	o	Don't warn about FUTEX_FD not being implemented
		and return ENOSYS instead of 0 (eg. success).

	o	Clear FUTEX_PRIVATE_FLAG as we actually implement
		only private futexes so there is no reason to
		return ENOSYS when app asks for a private futex.
		We don't reject shared futexes because they worked
		just fine with our implementation so far.

Approved by:	kib (mentor)
Tested by:	bsam
MFC after:	1 week
2008-03-20 17:03:55 +00:00
rdivacky
b13a84dcb7 Implement sched_setaffinity and get_setaffinity using
real cpu affinity setting primitives.

Reviewed by:	jeff
Approved by:	kib (mentor)
2008-03-16 16:27:44 +00:00
kib
86936eba80 Return ENOSYS instead of 0 for the unknown futex operations.
Submitted by: rdivacky
Reported and tested by: Gary Stanley <gary velocity-servers net>
2008-03-02 14:00:50 +00:00
kib
7ad2fb2ee1 Sanitize arguments to linux_mremap().
Check that only MREMAP_FIXED and MREMAP_MAYMOVE flags are specified.
Check for the page alignment of the addr argument.

Submitted by:	rdivacky
MFC after:	1 week
2008-02-22 11:47:56 +00:00
attilio
71b7824213 VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.

KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
2008-01-13 14:44:15 +00:00
attilio
18d0a0dd51 vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by:	Diego Sardina <siarodx at gmail dot com>,
		Andrea Di Pasquale <whyx dot it at gmail dot com>
2008-01-10 01:10:58 +00:00