Commit Graph

1593 Commits

Author SHA1 Message Date
Marko Zec
29b02909eb Introduce a new virtualization container, provisionally named vprocg, to hold
virtualized instances of hostname and domainname, as well as a new top-level
virtualization struct vimage, which holds pointers to struct vnet and struct
vprocg.  Struct vprocg is likely to become replaced in the near future with
a new jail management API import.

As a consequence of this change, change struct ucred to point to a struct
vimage, instead of directly pointing to a vnet.

Merge vnet / vimage / ucred refcounting infrastructure from p4 / vimage
branch.

Permit kldload / kldunload operations to be executed only from the default
vimage context.

This change should have no functional impact on nooptions VIMAGE kernel
builds.

Reviewed by:	bz
Approved by:	julian (mentor)
2009-05-08 14:11:06 +00:00
Jamie Gritton
e03d223bd4 Give vfs_getopt the type it's expecting.
Write 100 times: "32 bits is so twentieth century."

Noticed by:	dchagin
2009-05-07 19:46:29 +00:00
Jamie Gritton
7ae27ff49f Move the per-prison Linux MIB from a private one-off pointer to the new
OSD-based jail extensions.  This allows the Linux MIB to accessed via
jail_set and jail_get, and serves as a demonstration of adding jail support
to a module.

Reviewed by:	dchagin, kib
Approved by:	bz (mentor)
2009-05-07 18:36:47 +00:00
Dmitry Chagin
ca8c3e7bba Add KTR(9) tracing for futex emulation.
Approved by:	kib (mentor)
MFC after:	1 month
2009-05-07 16:14:31 +00:00
Dmitry Chagin
c65b9bfa3c Linux exports HZ value to user space via AT_CLKTCK auxiliary vector entry,
which is available for Glibc as sysconf(_SC_CLK_TCK). If AT_CLKTCK entry is
not exported, Glibc uses 100.

linux_times() shall use the value that is exported to user space.

Pointyhat to:	dchagin

PR:		kern/134251
Approved by:	kib (mentor)
MFC after:	2 weeks
2009-05-07 14:24:50 +00:00
Dmitry Chagin
4d706dcc08 Change linux struct tms definition to match actual linux one.
Approved by:	kib (mentor)
MFC after:	2 weeks
2009-05-07 12:55:58 +00:00
Dmitry Chagin
4ec3ea90eb Add preliminary KTR(9) support to the linux emulation layer.
Approved by:	kib (mentor)
MFC after:	1 month
2009-05-07 10:01:05 +00:00
Dmitry Chagin
13f20d7e86 To avoid excessive code duplication move MI definitions to the MI
header file. As it is defined in Linux.

Approved by:	kib (mentor)
MFC after:	1 month
2009-05-07 09:39:20 +00:00
Dmitry Chagin
d9b063cc9d Return EAFNOSUPPORT instead of EINVAL in case when the incorrect or
unsupported domain argument is specified.

Approved by:	kib (mentor)
2009-05-07 09:34:02 +00:00
Dmitry Chagin
1a52a4abf7 Rework r191742.
Use the protocol family constants for the domain argument validation.

Return EAFNOSUPPORT in case when the incorrect domain argument
is specified.

Return EPROTONOSUPPORT instead of passing values that are not 0
to the BSD layer.

Suggested by:   rwatson

Approved by:	kib (mentor)
MFC after:	1 month
2009-05-07 03:23:22 +00:00
Jamie Gritton
84a8cad0f6 Mark Linux MIB sysctls MPSAFE.
Reviewed by:	dchagin, kib
Approved by:	bz (mentor)
2009-05-04 19:06:05 +00:00
Dmitry Chagin
40092d93b4 Linux socketpair() call expects explicit specified protocol for
AF_LOCAL domain unlike FreeBSD which expects 0 in this case.

Approved by:	kib (mentor)
MFC after:	1 month
2009-05-02 10:51:40 +00:00
Dmitry Chagin
d789bfd562 Move extern variable definitions to the header file.
Approved by:	kib (mentor)
MFC after:	1 month
2009-05-02 10:06:49 +00:00
Dmitry Chagin
79262bf1f0 Reimplement futexes.
Old implemention used Giant to protect the kernel data structures,
but at the same time called malloc(M_WAITOK), that could cause the
calling thread to sleep and lost Giant protection. User-visible
result was the missed wakeup.

New implementation uses one sx lock per futex. The sx protects
the futex structures and allows to sleep while copyin or copyout
are performed.

Unlike linux, we return EINVAL when FUTEX_CMP_REQUEUE operation
is requested and either caller specified futexes are equial or
second futex already exists. This is acceptable since the situation
can only occur from the application error, and glibc falls back to
old FUTEX_WAKE operation when FUTEX_CMP_REQUEUE returns an error.

Approved by:	kib (mentor)
MFC after:	1 month
2009-05-01 15:36:02 +00:00
Jamie Gritton
fe2f3c651f Regen for new jail system calls in r191673.
Approved by:	bz (mentor)
2009-04-29 21:50:13 +00:00
Jamie Gritton
b38ff370e4 Introduce the extensible jail framework, using the same "name=value"
interface as nmount(2).  Three new system calls are added:
* jail_set, to create jails and change the parameters of existing jails.
  This replaces jail(2).
* jail_get, to read the parameters of existing jails.  This replaces the
  security.jail.list sysctl.
* jail_remove to kill off a jail's processes and remove the jail.
Most jail parameters may now be changed after creation, and jails may be
set to exist without any attached processes.  The current jail(2) system
call still exists, though it is now a stub to jail_set(2).

Approved by:	bz (mentor)
2009-04-29 21:14:15 +00:00
Marko Zec
093f25f8c8 In preparation for turning on options VIMAGE in next commits,
rearrange / replace / adjust several INIT_VNET_* initializer
macros, all of which currently resolve to whitespace.

Reviewed by:	bz (an older version of the patch)
Approved by:	julian (mentor)
2009-04-26 22:06:42 +00:00
Dmitry Chagin
b1121623d2 Remove support for FUTEX_REQUEUE operation.
Glibc does not use this operation since 2.3.3 version (Jun 2004),
as it is racy and replaced by FUTEX_CMP_REQUEUE operation.
Glibc versions prior to 2.3.3 fall back to FUTEX_WAKE when
FUTEX_REQUEUE returned EINVAL.

Any application directly using FUTEX_REQUEUE without return
value checking are definitely broken.

Limit quantity of messages per process about unsupported
operation.

Approved by:	kib (mentor)
MFC after:	1 month
2009-04-19 13:48:42 +00:00
Andrew Thompson
4eae601ebd MFp4 //depot/projects/usb@159909
- make usb2_power_mask_t 16-bit
- remove "usb2_config_sub" structure from "usb2_config". To compensate for this
  "usb2_config" has a new field called "usb_mode" which select for which mode
  the current xfer entry is active. Options are: a) Device mode only b) Host
  mode only (default-by-zero) c) Both modes.  This change was scripted using
  the following sed script: "s/\.mh\././g".
- the standard packet size table in "usb_transfer.c" is now a function, hence
  the code for the function uses less memory than the table itself.

Submitted by:	Hans Petter Selasky
2009-04-05 18:20:38 +00:00
Dmitry Chagin
cd899aad76 Fix KBI breakage by r190520 which affects older linux.ko binaries:
1) Move the new field (brand_note) to the end of the Brandinfo structure.
2) Add a new flag BI_BRAND_NOTE that indicates that the brand_note pointer
   is valid.
3) Use the brand_note field if the flag BI_BRAND_NOTE is set and as old
   modules won't have the flag set, so the new field brand_note would be
   ignored.

Suggested by:	jhb
Reviewed by:	jhb
Approved by:	kib (mentor)
MFC after:	6 days
2009-04-05 09:27:19 +00:00
Konstantin Belousov
9da6804056 Regen 2009-04-01 13:12:40 +00:00
Konstantin Belousov
088b38ac84 Rename implementation function for freebsd32 sysarch(2) to allow for
the arguments translations. Provide ABI-compatible definition of the
struct i386_ldt_args for freebsd32 compat layer.

In collaboration with:	pho
Reviewed by:	jhb
2009-04-01 13:11:50 +00:00
Konstantin Belousov
0cdf4ffabc Add all segment registers for the amd64 CPU to struct reg and mcontext.
To keep these structures ABI-compatible, half the size of r_trapno,
r_err, mc_trapno, mc_flags.

Add fsbase and gsbase to mcontext on both amd64 and i386.
Add flags to amd64 mcontext to indicate that it contains valid segments
or bases.

In collaboration with:	pho
Discussed with:	peter
Reviewed by:	jhb
2009-04-01 12:44:17 +00:00
Ed Schouten
7eac36ed3c Emulate the FIODGNAME ioctl in our 32-bit emulator.
It's quite strange that nobody reported this issue before. It turns out
functions like ttyname(), ptsname() and fdevname() don't work in
compat32. This means it't not even possible to run applications like
script(1) inside a 32-bit FreeBSD jail.

Fix this by converting 32-bit fiodgname_arg structures to their 64-bit
equivalent.

Reported by:	kris
Tested by:	kris
2009-03-29 20:09:51 +00:00
Jamie Gritton
8571af59e5 Whitespace/spelling fixes in advance of upcoming functional changes.
Approved by:	bz (mentor)
2009-03-27 13:13:59 +00:00
Doug Ambrisko
d2b2128a28 Add stuff to support upcoming BMC/IPMI flashing of newer Dell machine
via the Linux tool.
     -  Add Linux shim to ipmi(4)
     -  Create a partitions file to linprocfs to make Linux fdisk see
        disks.  This file is dynamic so we can see disks come and go.
     -  Convert msdosfs to vfat in mtab since Linux uses that for
        msdosfs.
     -  In the Linux mount path convert vfat passed in to msdosfs
        so Linux mount works on FreeBSD.  Note that tasting works
        so that if da0 is a msdos file system
                /compat/linux/bin/mount /dev/da0 /mnt
        works.
     -  fix a 64it bug for l_off_t.
Grabing sh, mount, fdisk, df from Linux, creating a symlink of mtab to
/compat/linux/etc/mtab and then some careful unpacking of the Linux bmc
update tool and hacking makes it work on newer Dell boxes.  Note, probably
if you can't figure out how to do this, then you probably shouldn't be
doing it :-)
2009-03-26 17:14:22 +00:00
Weongyo Jeong
577b9fa3f8 Some NDIS USB drivers try to call URB funcs like URB_FUNCTION_VENDOR_xxx
or URB_FUNCTION_CLASS_xxx with HAL preemption lock that means it's
non-sleepable during USB requests though usb2_do_request() requires a
sleep so it needs to send queries to the default pipe without those
interfaces to avoid sleep.
2009-03-18 02:38:35 +00:00
Weongyo Jeong
08e06b60c1 If the caller sets irp_usriostat or irp_usrevent it try to process it
whatever the IRP flag is because some drivers (eg. RTL8187L NDIS driver)
call IoCompleteRequest() without setting flags.  It will prevent waiting
a event forever at attach.
2009-03-18 01:57:54 +00:00
Konstantin Belousov
3ff063577b Supply AT_EXECPATH auxinfo entry to the interpreter, both for native and
compat32 binaries.

Tested by:	pho
Reviewed by:	kan
2009-03-17 12:53:28 +00:00
Weongyo Jeong
8d3fb5e2bc grab NDIS USB lock instead of HAL preemption. This change should be
happened in the previous.
2009-03-17 05:57:43 +00:00
Weongyo Jeong
d3947a869c use usb2_desc_foreach() to iterate the USB config descriptor instread of
accessing structures directly to check some invalid descriptors.

Pointed by:	hps
2009-03-16 11:19:07 +00:00
Dmitry Chagin
731aded86d Sort include files in the alphabetical order.
Approved by:	kib (mentor)
MFC after:	2 weeks
2009-03-16 05:39:37 +00:00
Dmitry Chagin
b41a7787e1 Ignore FUTEX_FD op, as it is done by linux.
Approved by:	kib (mentor)
MFC after:	2 weeks
2009-03-15 19:38:34 +00:00
Dmitry Chagin
3b8cbbded3 Include linux_futex.h before linux_emul.h
Approved by:	kib (mentor)
MFC after:	6 days
2009-03-15 19:16:12 +00:00
Dmitry Chagin
32c01de21c Implement new way of branding ELF binaries by looking to a
".note.ABI-tag" section.

The search order of a brand is changed, now first of all the
".note.ABI-tag" is looked through.

Move code which fetch osreldate for ELF binary to check_note() handler.

PR:		118473
Approved by:	kib (mentor)
2009-03-13 16:40:51 +00:00
Weongyo Jeong
2c964f43b6 o change a lock model based on HAL preemption lock to a normal mtx.
Based on the HAL preemption lock there is a problem on SMP machines
  and causes a panic.
o When a device detached the current tactic to detach NDIS USB driver is
  to call SURPRISE_REMOVED event.  So it don't need to call
  ndis_halt_nic() again.  This fixes some page faults when some drivers
  work abnormal.
o it assumes now that URB_FUNCTION_BULK_OR_INTERRUPT_TRANSFER is in
  DISPATCH_LEVEL (non-sleepable) and as further work
  URB_FUNCTION_VENDOR_XXX and URB_FUNCTION_CLASS_XXX should be.

Reviewed by:	Hans Petter Selasky <hselasky_at_freebsd.org>
Tested by:	Paul B. Mahol <onemda_at_gmail.com>
2009-03-12 02:51:55 +00:00
Weongyo Jeong
6affafd098 o port NDIS USB support from USB1 to the new usb(USB2).
o implement URB_FUNCTION_ABORT_PIPE handling.
o remove unused code related with canceling the timer list for USB
  drivers.
o whitespace cleanup and style(9)

Obtained from:	hps's original patch
2009-03-07 07:26:22 +00:00
John Baldwin
2ee8325f42 A better fix for handling different FPU initial control words for different
ABIs:
- Store the FPU initial control word in the pcb for each thread.
- When first using the FPU, load the initial control word after restoring
  the clean state if it is not the standard control word.
- Provide a correct control word for Linux/i386 binaries under
  FreeBSD/amd64.
- Adjust the control word returned for fpugetregs()/npxgetregs() when a
  thread hasn't used the FPU yet to reflect the real initial control
  word for the current ABI.
- The Linux/i386 ABI for FreeBSD/i386 now properly sets the right control
  word instead of trashing whatever the current state of the FPU is.

Reviewed by:	bde
2009-03-05 19:42:11 +00:00
Dmitry Chagin
4d7c2e8a48 Add AT_PLATFORM, AT_HWCAP and AT_CLKTCK auxiliary vector entries which
are used by glibc. This silents the message "2.4+ kernel w/o ELF notes?"
from some programs at start, among them are top and pkill.

Do the assignment of the vector entries in elf_linux_fixup()
as it is done in glibc.

Fix some minor style issues.

Submitted by:	Marcin Cieslak <saper at SYSTEM PL>
Approved by:	kib (mentor)
MFC after:	1 week
2009-03-04 12:14:33 +00:00
Jamie Gritton
f86bce5ed0 Extend the "vfsopt" mount options for more general use. Make struct
vfsopt and the vfs_buildopts function public, and add some new fields
to struct vfsopt (pos and seen), and new functions vfs_getopt_pos and
vfs_opterror.

Further extend the interface to allow reading options from the kernel
in addition to sending them to the kernel, with vfs_setopt and related
functions.

While this allows the "name=value" option interface to be used for more
than just FS mounts (planned use is for jails), it retains the current
"vfsopt" name and <sys/mount.h> requirement.

Approved by:	bz (mentor)
2009-03-02 23:26:30 +00:00
Bjoern A. Zeeb
33553d6e99 For all files including net/vnet.h directly include opt_route.h and
net/route.h.

Remove the hidden include of opt_route.h and net/route.h from net/vnet.h.

We need to make sure that both opt_route.h and net/route.h are included
before net/vnet.h because of the way MRT figures out the number of FIBs
from the kernel option. If we do not, we end up with the default number
of 1 when including net/vnet.h and array sizes are wrong.

This does not change the list of files which depend on opt_route.h
but we can identify them now more easily.
2009-02-27 14:12:05 +00:00
Roman Divacky
af83f5d77c Change the functions to ANSI in those cases where it breaks promotion
to int rule. See ISO C Standard: SS6.7.5.3:15.

Approved by:	kib (mentor)
Reviewed by:	warner
Tested by:	silence on -current
2009-02-24 18:09:31 +00:00
Andrew Thompson
3975e3a1ea Move usb to a graveyard location under sys/legacy/dev, it is intended that the
new USB2 stack will fully replace this for 8.0.

Remove kernel modules, a subsequent commit will update conf/files. Unhook
usbdevs from the build.
2009-02-23 18:16:17 +00:00
Ed Schouten
0eee862a54 Don't make Linux stat() open character devices to resolve its name.
The existing code calls kern_open() to resolve the vnode of a pathname
right after a stat(). This is not correct, because it causes random
character devices to be opened in /dev. This means ls'ing a tape
streamer will cause it to rewind, for example. Changes I have made:

- Add kern_statat_vnhook() to allow binary emulators to `post-process'
  struct stat, using the proper vnode.

- Remove unneeded printf's from stat() and statfs().

- Make the Linuxolator use kern_statat_vnhook(), replacing
  translate_path_major_minor_at().

- Let translate_fd_major_minor() use vp->v_rdev instead of
  vp->v_un.vu_cdev.

Result:

	crw-rw-rw- 1 root root   0, 14 Feb 20 13:54 /dev/ptmx
	crw--w---- 1 root adm  136,  0 Feb 20 14:03 /dev/pts/0
	crw--w---- 1 root adm  136,  1 Feb 20 14:02 /dev/pts/1
	crw--w---- 1 ed   tty  136,  2 Feb 20 14:03 /dev/pts/2

Before this commit, ptmx also had a major number of 136, because it
silently allocated and deallocated a pseudo-terminal. Device nodes that
cannot be opened now have proper major/minor-numbers.

Reviewed by:	kib, netchild, rdivacky (thanks!)
2009-02-20 13:05:29 +00:00
John Baldwin
ea77ff0a15 Use shared vnode locks when invoking VOP_READDIR().
MFC after:	1 month
2009-02-13 18:18:14 +00:00
John Baldwin
3d744ccdd2 Fix a bug in the previous change to the mtab handler: use the path returned
by vn_fullpath() when vn_fullpath() succeeds instead of when it fails.

Submitted by:	Artem Belevich  fbsdlist of src.cx
MFC after:	3 days
2009-02-13 15:32:03 +00:00
Alexander Leidinger
0134f12f7a Fix an edge-case of the linux readdir: We need the size of a linux dirent
structure, not the size of a pointer to it.

PR:		131099
Submitted by:	Andreas Kies <andikies@gmail.com>
MFC after:	2 weeks
2009-02-13 11:55:19 +00:00
David E. O'Brien
e6493bbebf Change some movl's to mov's. Newer GAS no longer accept 'movl' instructions
for moving between a segment register and a 32-bit memory location.

Looked at by:	jhb
2009-01-31 11:37:21 +00:00
Ed Schouten
a4611ab612 Last step of splitting up minor and unit numbers: remove minor().
Inside the kernel, the minor() function was responsible for obtaining
the device minor number of a character device. Because we made device
numbers dynamically allocated and independent of the unit number passed
to make_dev() a long time ago, it was actually a misnomer. If you really
want to obtain the device number, you should use dev2udev().

We already converted all the drivers to use dev2unit() to obtain the
device unit number, which is still used by a lot of drivers. I've
noticed not a single driver passes NULL to dev2unit(). Even if they
would, its behaviour would make little sense. This is why I've removed
the NULL check.

Ths commit removes minor(), minor2unit() and unit2minor() from the
kernel. Because there was a naming collision with uminor(), we can
rename umajor() and uminor() back to major() and minor(). This means
that the makedev(3) manual page also applies to kernel space code now.

I suspect umajor() and uminor() isn't used that often in external code,
but to make it easier for other parties to port their code, I've
increased __FreeBSD_version to 800062.
2009-01-28 17:57:16 +00:00
Jung-uk Kim
34fe89473f Replace couple of strcmp(cpu_vendor, "foo") with cpu_vendor_id for i386
and hide i386-specific code under #ifdef.
2009-01-22 17:06:33 +00:00