490 Commits

Author SHA1 Message Date
jeff
9e2412596d MFC Rev 1.21
VFS SMP fixes, stack api, softupdates fixes.

Sponsored by:	Isilon Systems, Inc.
Approved by:	re (scottl)
2006-03-13 03:04:04 +00:00
delphij
19aa2b762e MFC revision 1.131
date: 2005/12/13 15:32:52;  author: delphij;  state: Exp;  lines: +5 -5
In Linux, kernel parameters passed to ioctl are by value, while in FreeBSD
they are passed by reference.  Handle the difference within the
linux_ioctl_termio on the LINUX_TCFLSH path.

Submitted by:   Jaroslav Drzik <jaro_AT_coop-voz_dot_sk>
Reminded by:	glebius
2006-01-11 15:40:00 +00:00
glebius
df3c4d7b56 MFC 1.62:
Add \n to log() message.
2006-01-10 10:13:43 +00:00
jhb
c240b8f56c MFC: Remove linux_mib_destroy() since MTX_SYSINIT's gaining of a SYSUNINIT
that called mtx_destroy() made it obsolete.
2005-12-22 21:25:20 +00:00
glebius
5b6819c0d6 MFC 1.26:
Suppress logging about unimplemented syscalls to one time per process. This
  prevents hard flood of the system console.

  Reviewed by:	bde
2005-12-19 17:06:51 +00:00
rodrigc
05573dfb34 MFC 1.29, 1.30:
Rewrite linux_ifconf() to be more like ifconf() in net/if.c
  so that we do not call uiomove() while IFNET_RLOCK() is held.
  This eliminates the witness warning:

  Calling uiomove() with the following non-sleepable locks held:
  exclusive sleep mutex ifnet r = 0 (0xc096dd60) locked @
  /usr/src/sys/modules/linux/../../compat/linux/linux_ioctl.c:2170

Approved by:	re (scottl)
2005-09-02 03:52:28 +00:00
rwatson
e13b2df854 Merge linux_ioctl.c:1.128 svr4_sockio.c:1.17 altq_cbq.c:1.3 if_oltr.c:1.38
if_pflog.c:1.14 if_pfsync.c:1.21 if_an.c:1.70 if_ar.c:1.72 if_arl.c:1.11
amrr.c:1.10 onoe.c:1.10 if_ath.c:1.101 awi.c:1.41 if_bfe.c:1.27
if_bge.c:1.93 if_cm_isa.c:1.7 smc90cx6.c:1.16 if_cnw.c:1.20 if_cp.c:1.25
if_cs.c:1.42 if_ct.c:1.26 if_cx.c:1.46 if_ed.c:1.256 if_em.c:1.68
if_en_pci.c:1.37 midway.c:1.66 if_ep.c:1.143 if_ex.c:1.58 if_fatm.c:1.20
if_fe.c:1.93 if_fwe.c:1.38 if_fwip.c:1.8 if_fxp.c:1.244 if_gem.c:1.33
if_hatm.c:1.25 if_hatm_intr.c:1.20 if_hatm_ioctl.c:1.13 if_hatm_rx.c:1.10
if_hatm_tx.c:1.14 if_hme.c:1.39 if_ie.c:1.104 if_ndis.c:1.101
if_ic.c:1.24 if_ipw.c:1.10 if_iwi.c:1.10 if_ixgb.c:1.13 if_lge.c:1.41
if_lnc.c:1.113 if_my.c:1.31 if_nge.c:1.77 if_nve.c:1.10 if_owi.c:1.12
if_patm.c:1.9 if_patm_intr.c:1.6 if_patm_ioctl.c:1.10 if_patm_tx.c:1.10
pdq_ifsubr.c:1.28 if_plip.c:1.38 if_ral.c:1.12 if_ral_pci.c:1.2
if_ray.c:1.81 if_rayvar.h:1.22 if_re.c:1.49 if_sbni.c:1.21 if_sbsh.c:1.14
if_sn.c:1.48 dp83932.c:1.21 if_snc_pccard.c:1.9 if_sr.c:1.70 if_tx.c:1.91
if_txp.c:1.33 if_aue.c:1.92 if_axe.c:1.32 if_cdce.c:1.8 if_cue.c:1.59
if_kue.c:1.66 if_rue.c:1.23 if_udav.c:1.16 if_ural.c:1.12 if_vge.c:1.16
if_vx.c:1.58 if_wi.c:1.185 if_wi_pci.c:1.26 if_wl.c:1.68 if_xe.c:1.60
if_xe_pccard.c:1.30 if_el.c:1.68 i4b_ipr.c:1.35 i4b_isppp.c:1.31
kern_poll.c:1.20 bridge.c:1.94 bridgestp.c:1.4 if_arcsubr.c:1.27
if_atm.h:1.24 if_atmsubr.c:1.40 if_bridge.c:1.16 if_ef.c:1.35
if_ethersubr.c:1.196 if_faith.c:1.37 if_fddisubr.c:1.100 if_fwsubr.c:1.14
if_gif.c:1.54 if_gre.c:1.34 if_iso88025subr.c:1.70 if_loop.c:1.107
if_ppp.c:1.106 if_spppsubr.c:1.121 if_tap.c:1.57 if_tun.c:1.154
if_vlan.c:1.80 ppp_tty.c:1.67 ieee80211_ioctl.c:1.32 atm_if.c:1.31
ng_eiface.c:1.33 ng_ether.c:1.50 ng_fec.c:1.19 ng_iface.c:1.44
ng_sppp.c:1.9 ip_carp.c:1.30 ip_fastfwd.c:1.30 in6.c:1.53 nd6_nbr.c:1.31
natm.c:1.40 if_dc.c:1.162 if_de.c:1.168 if_pcn.c:1.72 if_rl.c:1.154
if_sf.c:1.84 if_sis.c:1.135 if_sk.c:1.108 if_ste.c:1.86 if_ti.c:1.109
if_tl.c:1.101 if_vr.c:1.106 if_wb.c:1.81 if_xl.c:1.194 from HEAD to
RELENG_6:

  Propagate rename of IFF_OACTIVE and IFF_RUNNING to IFF_DRV_OACTIVE and
  IFF_DRV_RUNNING, as well as the move from ifnet.if_flags to
  ifnet.if_drv_flags.  Device drivers are now responsible for
  synchronizing access to these flags, as they are in if_drv_flags.  This
  helps prevent races between the network stack and device driver in
  maintaining the interface flags field.

  Many __FreeBSD__ and __FreeBSD_version checks maintained and continued;
  some less so.

  Reviewed by:    pjd, bz

Approved by:	re (scottl)
2005-08-25 05:01:24 +00:00
jhb
d7eebc79f5 Add Giant around linux_getcwd_common() in linux_getcwd().
Approved by:	re (scottl)
2005-07-09 12:34:49 +00:00
jhb
8816876fa9 Add missing locking to linux_connect() so that it can be marked MP safe:
- Conditionally grab Giant around the EISCONN hack at the end based on
  debug.mpsafenet.
- Protect access to so_emuldata via SOCK_LOCK.

Reviewed by:	rwatson
Approved by:	re (scottl)
2005-07-09 12:26:22 +00:00
jhb
d7828dd231 Fix the computation of uptime for linux_sysinfo(). Before it was returning
the uptime in seconds mod 60 which wasn't very useful.

Approved by:	re (scottl)
2005-07-07 19:17:55 +00:00
pjd
a99a8a69bd Actually only protect mount-point if security.jail.enforce_statfs is set to 2.
If we don't return statistics about requested file systems, system tools
may not work correctly or at all.

Approved by:	re (scottl)
2005-06-23 22:13:29 +00:00
pjd
47f442bcb9 Rename sysctl security.jail.getfsstatroot_only to security.jail.enforce_statfs
and extend its functionality:

value	policy
0	show all mount-points without any restrictions
1	show only mount-points below jail's chroot and show only part of the
	mount-point's path (if jail's chroot directory is /jails/foo and
	mount-point is /jails/foo/usr/home only /usr/home will be shown)
2	show only mount-point where jail's chroot directory is placed.

Default value is 2.

Discussed with:	rwatson
2005-06-09 18:49:19 +00:00
sobomax
307c6bb149 Properly convert FreeBSD priority values into Linux values in the
getpriority(2) syscall.

PR:		kern/81951
Submitted by:	Andriy Gapon <avg@icyb.net.ua>
2005-06-08 20:41:28 +00:00
pjd
311d5e1182 Remove (now) unused argument 'td' from bsd_to_linux_statfs(). 2005-05-27 19:25:39 +00:00
pjd
abf2289872 The code is under '#ifdef not_that_way', but anyway:
- Add missing prison_check_mount() check.
2005-05-22 22:30:31 +00:00
pjd
a6e0e217b2 If we need to hide fsid, kern_statfs()/kern_fstatfs() will do it for us,
so do not duplicate the code in cvtstatfs().
Note, that we now need to clear fsid in freebsd4_getfsstat().

This moves all security related checks from functions like cvtstatfs()
and will allow to add more security related stuff (like statfs(2), etc.
protection for jails) a bit easier.
2005-05-22 21:52:30 +00:00
jeff
f869be5c72 - Pass the ISOPEN flag to namei so filesystems will know we're about to
open them or otherwise access the data.
2005-04-27 09:05:19 +00:00
jeff
afab3762a0 - Change all filesystems and vfs_cache to relock the dvp once the child is
locked in the ISDOTDOT case.  Se vfs_lookup.c r1.79 for details.

Sponsored by:	Isilon Systems, Inc.
2005-04-13 10:59:09 +00:00
mdodd
91ee5f450f Implement SOUND_MIXER_INFO ioctl in compat layer. 2005-04-13 04:33:06 +00:00
mdodd
6f940cf20f Add support for O_NOFOLLOW and O_DIRECT to Linux fcntl() F_GETFL/F_SETFL. 2005-04-13 04:31:43 +00:00
jhb
a3c6b782c3 - Change the vm_mmap() function to accept an objtype_t parameter specifying
the type of object represented by the handle argument.
- Allow vm_mmap() to map device memory via cdev objects in addition to
  vnodes and anonymous memory.  Note that mmaping a cdev directly does not
  currently perform any MAC checks like mapping a vnode does.
- Unbreak the DRM getbufs ioctl by having it call vm_mmap() directly on the
  cdev the ioctl is acting on rather than trying to find a suitable vnode
  to map from.

Reviewed by:	alc, arch@
2005-04-01 20:00:11 +00:00
jeff
bcbda3d771 - Initial cn_lkflags to LK_EXCLUSIVE.
Sponsored by:	Isilon Systems, Inc.
2005-03-29 10:16:12 +00:00
brooks
f16c448930 Use the CTASSERT() macro instead of rolling my own, non-portable one
using #error.

Suggested by:	jhb
2005-03-24 19:26:50 +00:00
brooks
b25337dcb4 Compile errors are way more useful then panics later.
Replace a KASSERT of LINUX_IFNAMSIZ == IFNAMSIZ with a preprocessor
check and #error message.  This will prevent nasty suprises if users
change IFNAMSIZ without updating the linux code appropriatly.
2005-03-24 17:51:15 +00:00
das
fbf7a9b2ee Reject packets larger than IP_MAXPACKET in linux_sendto() for sockets
with the IP_HDRINCL option set.  Without this change, a Linux process
with access to a raw socket could cause a kernel panic.  Raw sockets
must be created by root, and are generally not consigned to untrusted
applications; hence, the security implications of this bug are
minimal.  I believe this only affects 6-CURRENT on or after 2005-01-30.

Found by:	Coverity Prevent analysis tool
Security:	Local DOS
2005-03-23 08:28:00 +00:00
phk
9cea99e06b Neuter the duplicated disk-device magic code for now. Somebody with
serious linux-clue is necessary to fix this properly.
2005-03-15 11:58:40 +00:00
sobomax
b795e2430a Add kernel-only flag MSG_NOSIGNAL to be used in emulation layers to surpress
SIGPIPE signal for the duration of the sento-family syscalls. Use it to
replace previously added hack in Linux layer based on temporarily setting
SO_NOSIGPIPE flag.

Suggested by:	alfred
2005-03-08 16:11:41 +00:00
sobomax
a5d845fec6 Handle MSG_NOSIGNAL flag in linux_send() by setting SO_NOSIGPIPE on socket
for the duration of the send() call. Such approach may be less than ideal
in threading environment, when several threads share the same socket and it
might happen that several of them are calling linux_send() at the same time
with and without SO_NOSIGPIPE set.

However, such race condition is very unlikely in practice, therefore this
change provides practical improvement compared to the previous behaviour.

PR:		kern/76426
Submitted by:	Steven Hartland <killing@multiplay.co.uk>
MFC after:	3 days
2005-03-07 07:26:42 +00:00
sobomax
f706f4bce8 Handle unimplemented syscall by instantly returning ENOSYS instead of sending
signal first and only then returning ENOSYS to match what real linux does.

PR:		kern/74302
Submitted by:	Travis Poppe <tlp@LiquidX.org>
2005-03-07 00:18:06 +00:00
jhb
407a285a6f Remove linux_emul_find() and the CHECKALT*() macros as they are no longer
used.
2005-03-01 17:57:45 +00:00
phk
5bbf7f6810 Neuter linux_ustat() until somebody finds time to try to fix it.
The fundamental problem is that we get only the lower 8 bits of the
minor device number so there is no guarantee that we can actually
find the disk device in question at all.

This was probably a bigger issue pre-GEOM where the upper bits
signaled which slice were in use.

The secondary problem is how we get from (partial) dev_t to vnode.

The correct implementation will involve traversing the mount list
looking for a perfect match or a possible match (for truncated
minor).
2005-02-22 13:39:46 +00:00
njl
25c48f7867 Unbreak the kernel build. Pointy hat to: sobomax. 2005-02-13 19:50:57 +00:00
sobomax
52ae2ac0b9 Backout previous change (disabling of security checks for signals delivered
in emulation layers), since it appears to be too broad.

Requested by:   rwatson
2005-02-13 17:37:20 +00:00
sobomax
1d558007d0 Split out kill(2) syscall service routine into user-level and kernel part, the
former is callable from user space and the latter from the kernel one. Make
kernel version take additional argument which tells if the respective call
should check for additional restrictions for sending signals to suid/sugid
applications or not.

Make all emulation layers using non-checked version, since signal numbers in
emulation layers can have different meaning that in native mode and such
protection can cause misbehaviour.

As a result remove LIBTHR from the signals allowed to be delivered to a
suid/sugid application.

Requested (sorta) by:	rwatson
MFC after:	2 weeks
2005-02-13 16:42:08 +00:00
sobomax
22b03e0f5d Semctl with IPC_STAT command should return zero in case of success.
PR:		73778
Submitted by:	Andriy Gapon <avg@icyb.net.ua>
MFC after:	2 weeks
2005-02-11 13:46:55 +00:00
jhb
3c3db95194 - Use kern_{l,f,}stat() and kern_{f,}statfs() functions rather than
duplicating the contents of the same functions inline.
- Consolidate common code to convert a BSD statfs struct to a Linux struct
  into a static worker function.
2005-02-07 18:47:28 +00:00
jhb
6fab308776 Make linux_emul_convpath() a simple wrapper for kern_alternate_path(). 2005-02-07 18:46:05 +00:00
jhb
71c05d27c0 - Tweak kern_msgctl() to return a copy of the requested message queue id
structure in the struct pointed to by the 3rd argument for IPC_STAT and
  get rid of the 4th argument.  The old way returned a pointer into the
  kernel array that the calling function would then access afterwards
  without holding the appropriate locks and doing non-lock-safe things like
  copyout() with the data anyways.  This change removes that unsafeness and
  resulting race conditions as well as simplifying the interface.
- Implement kern_foo wrappers for stat(), lstat(), fstat(), statfs(),
  fstatfs(), and fhstatfs().  Use these wrappers to cut out a lot of
  code duplication for freebsd4 and netbsd compatability system calls.
- Add a new lookup function kern_alternate_path() that looks up a filename
  under an alternate prefix and determines which filename should be used.
  This is basically a more general version of linux_emul_convpath() that
  can be shared by all the ABIs thus allowing for further reduction of
  code duplication.
2005-02-07 18:44:55 +00:00
jhb
6e2f7d4c8e Use kern_setitimer() to implement linux_alarm() instead of fondling the
real interval timer directly.
2005-02-07 18:36:21 +00:00
sobomax
69aa6843ef Boot away another stackgap (one of the lest ones in linuxlator/i386) by
providing special version of CDIOCREADSUBCHANNEL ioctl(), which assumes that
result has to be placed into kernel space not user space. In the long run
more generic solution has to be designed WRT emulating various ioctl()s
that operate on userspace buffers, but right now there is only one such
ioctl() is emulated, so that it makes little sense.

MFC after:	2 weeks
2005-01-30 08:12:37 +00:00
sobomax
68d0bd2186 Extend kern_sendit() to take another enum uio_seg argument, which specifies
where the buffer to send lies and use it to eliminate yet another stackgap
in linuxlator.

MFC after:	2 weeks
2005-01-30 07:20:36 +00:00
sobomax
896df27c1a Split out kernel side of msgctl(2) into two parts: the first that pops data
from the userland and pushes results back and the second which does
actual processing. Use the latter to eliminate stackgap in the linux wrapper
of that syscall.

MFC after:      2 weeks
2005-01-26 00:46:36 +00:00
sobomax
35611d3699 Split out kernel side of {get,set}itimer(2) into two parts: the first that
pops data from the userland and pushes results back and the second which does
actual processing. Use the latter to eliminate stackgap in the linux wrappers
of those syscalls.

MFC after:	2 weeks
2005-01-25 21:28:28 +00:00
obrien
98e2482a94 Match the LINUX32's style with existing style
Submitted by:	Jung-uk Kim <jkim@niksun.com>

Use positive, not negative logic.
2005-01-14 04:44:56 +00:00
obrien
98c3a8a894 Fix Linux compat 'uname -m' on AMD64.
Submitted by:	Jung-uk Kim <jkim@niksun.com>
		(patch reworked by me)
2005-01-14 03:45:26 +00:00
imp
362fcfc1e2 Start each of the license/copyright comments with /*- 2005-01-05 22:34:37 +00:00
phk
b0e48f2258 Do not blindly pass linux filesystem specific mount data across. 2004-12-03 18:14:22 +00:00
phk
e2512dff3e Ignore MNT_NODEV option, it is implicit in choice of filesystem. 2004-11-26 07:39:20 +00:00
dwmalone
d52e344f9f Rename thread args to be called "td" rather than "p" to be
consistent with other bits of this file. There should be no
functional change.

Submitted by:	Andrea Campi (many moons ago)
MFC after:	2 month
2004-10-10 18:34:30 +00:00
jhb
ce2d3f89af Rework how we store process times in the kernel such that we always store
the raw values including for child process statistics and only compute the
system and user timevals on demand.

- Fix the various kern_wait() syscall wrappers to only pass in a rusage
  pointer if they are going to use the result.
- Add a kern_getrusage() function for the ABI syscalls to use so that they
  don't have to play stackgap games to call getrusage().
- Fix the svr4_sys_times() syscall to just call calcru() to calculate the
  times it needs rather than calling getrusage() twice with associated
  stackgap, etc.
- Add a new rusage_ext structure to store raw time stats such as tick counts
  for user, system, and interrupt time as well as a bintime of the total
  runtime.  A new p_rux field in struct proc replaces the same inline fields
  from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime).  A new p_crux
  field in struct proc contains the "raw" child time usage statistics.
  ruadd() has been changed to handle adding the associated rusage_ext
  structures as well as the values in rusage.  Effectively, the values in
  rusage_ext replace the ru_utime and ru_stime values in struct rusage.  These
  two fields in struct rusage are no longer used in the kernel.
- calcru() has been split into a static worker function calcru1() that
  calculates appropriate timevals for user and system time as well as updating
  the rux_[isu]u fields of a passed in rusage_ext structure.  calcru() uses a
  copy of the process' p_rux structure to compute the timevals after updating
  the runtime appropriately if any of the threads in that process are
  currently executing.  It also now only locks sched_lock internally while
  doing the rux_runtime fixup.  calcru() now only requires the caller to
  hold the proc lock and calcru1() only requires the proc lock internally.
  calcru() also no longer allows callers to ask for an interrupt timeval
  since none of them actually did.
- calcru() now correctly handles threads executing on other CPUs.
- A new calccru() function computes the child system and user timevals by
  calling calcru1() on p_crux.  Note that this means that any code that wants
  child times must now call this function rather than reading from p_cru
  directly.  This function also requires the proc lock.
- This finishes the locking for rusage and friends so some of the Giant locks
  in exit1() and kern_wait() are now gone.
- The locking in ttyinfo() has been tweaked so that a shared lock of the
  proctree lock is used to protect the process group rather than the process
  group lock.  By holding this lock until the end of the function we now
  ensure that the process/thread that we pick to dump info about will no
  longer vanish while we are trying to output its info to the console.

Submitted by:	bde (mostly)
MFC after:	1 month
2004-10-05 18:51:11 +00:00