1802 Commits

Author SHA1 Message Date
John Baldwin
06ad42b2f7 Close some races between procfs/ptrace and exit(2):
- Reorder the events in exit(2) slightly so that we trigger the S_EXIT
  stop event earlier.  After we have signalled that, we set P_WEXIT and
  then wait for any processes with a hold on the vmspace via PHOLD to
  release it.  PHOLD now KASSERT()'s that P_WEXIT is clear when it is
  invoked, and PRELE now does a wakeup if P_WEXIT is set and p_lock drops
  to zero.
- Change proc_rwmem() to require that the process being read from has its
  vmspace held via PHOLD by the caller and get rid of all the junk to
  screw around with the vmspace reference count as we no longer need it.
- In ptrace() and pseudofs(), treat a process with P_WEXIT set as if it
  doesn't exist.
- Only do one PHOLD in kern_ptrace() now, and do it earlier so it covers
  FIX_SSTEP() (since on alpha at least this can end up calling proc_rwmem()
  to clear an earlier single-step simulated via a breakpoint).  We only
  do one to avoid races.  Also, by making the EINVAL error for unknown
  requests be part of the default: case in the switch, the various
  switch cases can now just break out to return which removes a _lot_ of
  duplicated PRELE and proc unlocks, etc.  Also, it fixes at least one bug
  where a LWP ptrace command could return EINVAL with the proc lock still
  held.
- Changed the locking for ptrace_single_step(), ptrace_set_pc(), and
  ptrace_clear_single_step() to always be called with the proc lock
  held (it was a mixed bag previously).  Alpha and arm have to drop
  the lock while they mess around with breakpoints, but other archs
  avoid extra lock release/acquires in ptrace().  I did have to fix a
  couple of other consumers in kern_kse and a few other places to
  hold the proc lock and PHOLD.

Tested by:	ps (1 mostly, but some bits of 2-4 as well)
MFC after:	1 week
2006-02-22 18:57:50 +00:00
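
The hold/release protocol described in the entry above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct model_proc`, the `model_*` functions, and the numeric value of `P_WEXIT` are hypothetical stand-ins, the model is single-threaded, and `assert()` takes the place of KASSERT(). It is not the kernel code from the commit.

```c
#include <assert.h>
#include <stdbool.h>

#define P_WEXIT 0x1   /* flag value is an assumption made for this model */

/* Hypothetical, single-threaded stand-in for struct proc. */
struct model_proc {
    int p_flag;   /* process flags, including P_WEXIT */
    int p_lock;   /* number of outstanding holds */
};

/* Model of PHOLD: taking a hold on an exiting process is a bug. */
static void
model_phold(struct model_proc *p)
{
    assert((p->p_flag & P_WEXIT) == 0);   /* stands in for KASSERT() */
    p->p_lock++;
}

/*
 * Model of PRELE: drop one hold.  Returns true when the caller would have
 * to wake up the exiting thread, i.e. P_WEXIT is set and the hold count
 * has dropped to zero.
 */
static bool
model_prele(struct model_proc *p)
{
    assert(p->p_lock > 0);
    p->p_lock--;
    return ((p->p_flag & P_WEXIT) != 0 && p->p_lock == 0);
}

/*
 * Model of the exit side: set P_WEXIT, then report whether exit would
 * have to sleep until the remaining holds are released.
 */
static bool
model_exit_must_wait(struct model_proc *p)
{
    p->p_flag |= P_WEXIT;
    return (p->p_lock > 0);
}
```

In this model, two holders followed by an exit make `model_exit_must_wait()` return true; the first `model_prele()` returns false and the second returns true, which is the point at which the wakeup would be issued.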
John Baldwin
f8e3eeb519 Change pfs_visible() to optionally return a pointer to the process
associated with the passed in pfs_node.  If it does return a pointer, it
keeps the process locked.  This allows a lot of places that were calling
pfind() again right after pfs_visible() to not have to do that and avoids
races since we don't drop the proc lock just to turn around and lock it
again.  This will become more important with future changes to fix races
between procfs/ptrace and exit(2).  Also, removed a duplicate pfs_visible()
call in pfs_getextattr().

Reviewed by:	des
MFC after:	1 week
2006-02-22 17:24:54 +00:00
John Baldwin
7a61c1a3cb Hold the proc lock while calling proc_sstep() since the function asserts
it and remove a PRELE() that didn't have a matching PHOLD().  The calling
code already has a PHOLD anyway.

MFC after:	1 week
2006-02-22 17:20:37 +00:00
Jeff Roberson
f50b03bfd6 - We must hold a reference to a vnode before calling vgone() otherwise
it may not be removed from the freelist.

MFC After:	1 week
Found by:	kris
2006-02-22 09:05:40 +00:00
Jeff Roberson
f5cacb3964 - spell VOP_LOCK(vp, LK_RELEASE... as VOP_UNLOCK(vp,... so that asserts in
vop_lock_post do not trigger.
 - Rearrange null_inactive to call null_hashrem earlier so there is no chance
   of finding the null node on the hash list after the locks have been
   switched.
 - We should never have a NULL lowervp in null_reclaim() so there is
   no need to handle this situation.  panic instead.

MFC After:	1 week
2006-02-22 06:17:31 +00:00
Jeff Roberson
9c12e63100 - Assert that the lowervp is locked in null_hashget().
- Simplify the logic dealing with recycled vnodes in null_hashget() and
   null_hashins().  Since we hold the lower node locked in both cases
   the null node can not be undergoing recycling unless reclaim somehow
   called null_nodeget().  The logic that was in place was not safe and
   was essentially dead code.

MFC After:	1 week
2006-02-22 06:15:12 +00:00
Jeff Roberson
578abc8e54 - Deadfs should not use the std GETWRITEMOUNT routine. Add one that always
returns NULL.

MFC After:	1 week
2006-02-22 06:11:59 +00:00
John Baldwin
ccabcacb30 Correctly set MNTK_MPSAFE flag from the lower vnode's mount rather than
always turning it on along with any flags set in the lower mount.

Tested by:	kris
Reviewed by:	jeff
MFC after:	3 days
2006-02-10 18:06:49 +00:00
Jeff Roberson
fbf586bd40 - No need to WANTPARENT when we're just going to vrele it in a deadlock
prone way later.

Reported by:	kkenn
MFC After:	3 days
2006-02-07 11:31:32 +00:00
Will Andrews
937a238777 Make UDF endian-safe.
Submitted by:	Pedro Martelletto <pedro@ambientworks.net> (via scottl)
Tested on:	sparc64
2006-02-03 15:25:52 +00:00
Jeff Roberson
89b0e10910 - Reorder calls to vrele() after calls to vput() when the vnode passed to
vrele() is a directory.  vrele() may lock the passed vnode, which in these cases would
   give an invalid lock order of child -> parent.  These situations are
   deadlock prone although do not typically deadlock because the vrele
   is typically not releasing the last reference to the vnode.  Users of
   vrele must consider it as a call to vn_lock() and order it appropriately.

MFC After: 	1 week
Sponsored by:	Isilon Systems, Inc.
Tested by:	kkenn
2006-02-01 00:25:26 +00:00
Jeff Roberson
3b77d80cdd - Remove a stale comment. This function was rewritten to be SMP safe some
time ago.

Sponsored by:	Isilon Systems, Inc.
2006-01-30 08:24:14 +00:00
Tom Rhodes
9fc31f8a5f Update incorrect comments here; there should not be a call to panic()
over fs corruption.

Discussed with:	alfred, phk
2006-01-23 17:45:57 +00:00
Max Khon
710a9accfe Do not assume that `char direntry::deExtension[3]' starts right after
`char direntry::deName[8]' and access deExtension[] explicitly.

Found by:	Coverity Prevent(tm)
CID:		350, 351, 352
2006-01-22 21:09:38 +00:00
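
The layout assumption removed by the entry above can be shown with a short sketch. `struct model_direntry` is a hypothetical, cut-down entry containing only the two name fields, and `model_copy_83()` is an illustrative helper, not a function from msdosfs. The idea is to address each array by name instead of reading 11 bytes starting at `deName`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cut-down directory entry; only the name fields matter here. */
struct model_direntry {
    char deName[8];        /* base name, space padded */
    char deExtension[3];   /* extension, space padded */
};

/*
 * Copy the 8.3 name into an 11-byte buffer by addressing each field
 * explicitly, rather than reading 11 bytes from deName and relying on
 * deExtension being laid out immediately after it.
 */
static void
model_copy_83(const struct model_direntry *dep, char out[11])
{
    assert(dep != NULL && out != NULL);
    memcpy(out, dep->deName, sizeof(dep->deName));
    memcpy(out + sizeof(dep->deName), dep->deExtension,
        sizeof(dep->deExtension));
}
```

Each memcpy() stays within the bounds of the array it names, which is the property a static analyzer can check.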
Robert Watson
0bdfeca765 Convert last four functions in coda_vnops.c to ANSI C function
declarations.  I knew I would get to fix something in Coda
eventually.

MFC after:	1 week
2006-01-21 19:51:47 +00:00
Alfred Perlstein
92e73f5711 I ran into an nfs client panic a couple of times in a row over the
last few days.  I tracked it down to the fact that nfs_reclaim()
is setting vp->v_data to NULL _before_ calling vnode_destroy_object().
After silence from the mailing list I checked further and discovered
that ufs_reclaim() is unique among FreeBSD filesystems for calling
vnode_destroy_object() early, long before tossing v_data or much
of anything else, for that matter.  The rest, including NFS, appear
to be identical, as if they were just clones of one original routine.

The enclosed patch fixes all file systems in essentially the same
way, by moving the call to vnode_destroy_object() to early in the
routine (before the call to vfs_hash_remove(), if any).  I have
only tested NFS, but I've now run for over eighteen hours with the
patch where I wouldn't get past four or five without it.

Submitted by: Frank Mayhar
Requested by: Mohan Srinivasan
MFC After: 1 week
2006-01-17 17:29:03 +00:00
Tor Egge
82be0a5a24 Add marker vnodes to ensure that all vnodes associated with the mount point are
iterated over when using MNT_VNODE_FOREACH.

Reviewed by:	truckman
2006-01-09 20:42:19 +00:00
Maxim Konovalov
036cd12a8d o Fix typo in the define: s/MRAK_INT_GEN/MARK_INT_GEN/. The typo
was harmless because the define is not used in coda_vfsops.c.

Submitted by:	Hugo Meiland
2006-01-09 18:07:06 +00:00
Maxim Konovalov
98a95f61fa o Typo in the debug message: s/skiped/skipped/.
PR:		kern/91346
Submitted by:	Gavin Atkinson
2006-01-05 13:39:23 +00:00
Robert Watson
8f0d99d790 When returning EIO from DEVFSIO_RADD ioctl, drop the exclusive rule
lock.  Otherwise the system comes to a rather sudden and grinding
halt.

MFC after:	1 week
2006-01-03 09:49:10 +00:00
Tom Rhodes
09c00166e4 Make tv_sec a time_t on all platforms but alpha. Brings us more in line with
POSIX.  This also makes the struct correct if we ever implement an i386-time64
architecture.  Not that we need to.

Reviewed by:	imp, brooks
Approved by:	njl (acpica), des (no objections, touches procfs)
Tested with:	make universe
2005-12-24 22:22:17 +00:00
Dag-Erling Smørgrav
0430a5e289 Eradicate caddr_t from the VFS API.
2005-12-14 00:49:52 +00:00
Tai-hwa Liang
8bfc230455 Recent nmount(2) adoption in mount_smbfs(8) did not flag the "long" option
since mount_smbfs(8) assumed long name mounting by default unless "-n long"
was explicitly specified.

Rather than supplying a "long" option in mount_smbfs(8), this commit brings
back the original behaviour by associating SMBFS_MOUNT_NO_LONG with the
"nolong" option.  This should fix the broken long file names on smbfs people
observed recently.

Reported by:	Vladimir Grebenschikov <vova at fbsd dot ru>
Reviewed by:	phk
Tested by:	Slawa Olhovchenkov <slw at zxy dot spb dot ru>
2005-12-05 19:05:06 +00:00
Ruslan Ermilov
342ed5d948 Fix -Wundef warnings found when compiling i386 LINT, GENERIC and
custom kernels.
2005-12-05 11:58:35 +00:00
Ruslan Ermilov
3238c6bd33 Fix -Wundef from compiling the amd64 LINT.
2005-12-04 10:06:06 +00:00
Ruslan Ermilov
f4e9888107 Fix -Wundef.
2005-12-04 02:12:43 +00:00
Boris Popov
cc518d3b67 Fix interaction with Windows 2000/XP based servers:
If the complete reply to the TRANS2_FIND_FIRST2 request fits exactly
into one response packet, then the next call to TRANS2_FIND_NEXT2 will return
zero entries and the server will close the current transaction.  To avoid
subsequent errors we should not perform a FIND_CLOSE2 request.

PR:		kern/78953
Submitted by:	Jim Carroll
2005-11-22 07:13:00 +00:00
Craig Rodrigues
d75b2048db Properly parse the nowin95 mount option.
Tested by:	Rainer Hurling <rhurlin at gwdg dot de>
2005-11-19 16:38:39 +00:00
Craig Rodrigues
4ab125739b Add "shortnames" and "longnames" mount options which are
synonyms for "shortname" and "longname" mount options.  The old
(before nmount()) mount_msdosfs program accepted "shortnames" and "longnames",
but the kernel nmount() checked for "shortname" and "longname".
So, make the kernel accept "shortnames", "longnames", "shortname", "longname"
for forwards and backwards compatibility.

Discovered by:	Rainer Hurling <rhurlin at gwdg dot de>
2005-11-18 22:34:31 +00:00
Craig Rodrigues
43fa5bf534 - Add errmsg to the list of smbfs mount options.
- Use vfs_mount_error() to propagate smbfs mount errors back to userspace.

Reviewed by:	bp (smbfs maintainer)
2005-11-16 02:26:25 +00:00
Doug White
16e35dcc39 This is a workaround for a complicated issue involving VFS cookies and devfs.
The PR and patch have the details. The ultimate fix requires architectural
changes and clarifications to the VFS API, but this will prevent the system
from panicking when someone does "ls /dev" while running in a shell under the
linuxulator.

This issue affects HEAD and RELENG_6 only.

PR:		88249
Submitted by:	"Devon H. O'Dell" <dodell@ixsystems.com>
MFC after:	3 days
2005-11-09 22:03:50 +00:00
Robert Watson
5bb84bc84b Normalize a significant number of kernel malloc type names:
- Prefer '_' to ' ', as it results in more easily parsed results in
  memory monitoring tools such as vmstat.

- Remove punctuation that is incompatible with using memory type names
  as file names, such as '/' characters.

- Disambiguate some collisions by adding subsystem prefixes to some
  memory types.

- Generally prefer lower case to upper case.

- If the same type is defined in multiple architecture directories,
  attempt to use the same name in additional cases.

Not all instances were caught in this change, so more work is required to
finish this conversion.  Similar changes are required for UMA zone names.
2005-10-31 15:41:29 +00:00
Poul-Henning Kamp
3b72f38b5e Use correct criteria for determining which directory entries we can
purge right away and which we merely can hide.

Beaten into my skull by:	kris
2005-10-18 20:21:25 +00:00
Dag-Erling Smørgrav
a92fef8afc Implement the full range of ISO9660 number conversion routines in iso.h.
MFC after:	2 weeks
2005-10-18 13:35:08 +00:00
Craig Rodrigues
c583f369a7 Unconditionally mount a CD9660 filesystem as read-only, instead of
returning EROFS if we forget to mount it as read-only.
2005-10-17 03:29:53 +00:00
Craig Rodrigues
b137e1c8ba Use the actual sector size of the media instead of hard-coding it to 2048.
This eliminates KASSERTs in GEOM if we accidentally mount an audio CD
as a cd9660 filesystem.
2005-10-17 03:27:35 +00:00
Craig Rodrigues
073833a420 Unconditionally mount a UDF filesystem as read-only, instead of
returning an EROFS if we forget to mount it as read-only.
2005-10-17 03:07:36 +00:00
Florent Thoumie
86391603da - Fix typo.
Approved by:	ssouhlal
MFC after:	1 week
2005-10-17 00:04:35 +00:00
Don Lewis
8bcc0d3f95 Update nwfs_lookup() to match the current cache_lookup() API.
cache_lookup() has returned a ref'ed and locked vnode since
vfs_cache.c:1.96, dated Tue Mar 29 12:59:06 2005 UTC.  This change
is similar to the change made to smbfs_lookup() in smbfs_vnops.c:1.58.

Tested by:	"Antony Mawer" ant AT mawer.org
MFC after:	2 weeks
2005-10-16 21:54:35 +00:00
Kris Kennaway
3554cddbfa Reflect mpsafety of the underlying filesystem in the nullfs image.
I benchmarked this by simultaneously extracting 4 large tarballs (basically
world images) on a 4-processor AMD64 system, in a malloc-backed md.

With this patch, system time was reduced by 43%, and wall clock time by 33%.

Submitted by:	jeff
MFC after: 	1 week
2005-10-16 21:45:25 +00:00
Don Lewis
d31c91fbcf Apply the same fix to a potential race in the ISDOTDOT code in
cd9660_lookup() that was used to fix an actual race in ufs_lookup.c:1.78.
This is not currently a hazard, but the bug would be activated by
marking cd9660 as MPSAFE.

Requested by:	bde
2005-10-16 21:41:54 +00:00
Yaroslav Tykhiy
10d645b7e5 In preparation for making the modules actually use opt_*.h files
provided in the kernel build directory, fix modules that were
failing to build this way due to not quite correct kernel option
usage.  In particular:

ng_mppc.c uses two complementary options, both of which are listed
in sys/conf/files.  Ideally, there should be a separate option for
including ng_mppc.c in kernel build, but now only
NETGRAPH_MPPC_ENCRYPTION is usable anyway, the other one requires
proprietary files.

nwfs and smbfs were trying to ensure they were built with proper
network components, but the check was rather questionable.

Discussed with:	ru
2005-10-14 23:17:45 +00:00
David Xu
9104847f21 1. Change prototype of trapsignal and sendsig to use ksiginfo_t *.  Most
changes in MD code are trivial.  Before this change, trapsignal and
   sendsig used discrete parameters; now they use member fields of the
   ksiginfo_t structure. For sendsig, this change allows us to pass
   a POSIX realtime signal value to user code.

2. Remove cpu_thread_siginfo, it is no longer needed because we now always
   generate ksiginfo_t data and feed it to libpthread.

3. Add p_sigqueue to proc structure to hold shared signals which were
   blocked by all threads in the proc.

4. Add td_sigqueue to thread structure to hold all signals delivered to
   thread.

5. i386 and amd64 now return POSIX standard si_code, other arches will
   be fixed.

6. In this sigqueue implementation, the pending signal set is kept as before,
   and an extra siginfo list holds additional siginfo_t data for signals.
   Kernel code that uses psignal() still behaves as before; it won't fail
   even under memory pressure.  The only exception is when deleting a signal:
   we should call sigqueue_delete to remove the signal from the sigqueue
   rather than SIGDELSET. Currently there is no kernel code that will deliver
   a signal with additional data, so the kernel should be as stable as before.
   A ksiginfo can carry more information, for example to allow a signal to
   be delivered but throw away the siginfo data if there is not enough memory.
   SIGKILL and SIGSTOP have a fast path in sigqueue_add, because they can
   not be caught or masked.
   The sigqueue() syscall allows user code to queue a signal to a target
   process; if a resource is unavailable, EAGAIN will be returned as
   the specification says.
   Just before a thread exits, signal queue memory will be freed by
   sigqueue_flush.
   Currently, all signals are allowed to be queued, not only realtime signals.

Earlier patch reviewed by: jhb, deischen
Tested on: i386, amd64
2005-10-14 12:43:47 +00:00
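
The queueing rules in item 6 above can be sketched as a small userspace model. Everything here is an assumption made for illustration: `struct model_sigqueue`, the `model_*` functions, the fixed two-slot pool that stands in for memory pressure, and the `from_user` flag that distinguishes a sigqueue()-style caller from a psignal()-style caller. It is not the kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_INFO 2   /* tiny pool so exhaustion is easy to trigger */

struct model_sigqueue {
    uint64_t pending;                    /* bit (sig - 1) set when pending */
    int      info_sig[MODEL_MAX_INFO];   /* signal of each queued siginfo */
    int      info_val[MODEL_MAX_INFO];   /* value carried with that signal */
    int      info_count;
};

static bool
model_sig_pending(const struct model_sigqueue *sq, int sig)
{
    return (((sq->pending >> (sig - 1)) & 1u) != 0);
}

/*
 * Add a signal.  from_user == true models a sigqueue()-style request that
 * fails with EAGAIN when no slot is free; from_user == false models a
 * psignal()-style delivery that always succeeds and drops the extra data.
 */
static int
model_sigqueue_add(struct model_sigqueue *sq, int sig, int value,
    bool from_user)
{
    assert(sig >= 1 && sig <= 64);
    if (sig == SIGKILL || sig == SIGSTOP) {
        /* Fast path: these cannot be caught or masked, keep no siginfo. */
        sq->pending |= (uint64_t)1 << (sig - 1);
        return (0);
    }
    if (sq->info_count == MODEL_MAX_INFO) {
        if (from_user)
            return (EAGAIN);
        /* Kernel-style delivery: mark pending, discard the siginfo. */
        sq->pending |= (uint64_t)1 << (sig - 1);
        return (0);
    }
    sq->info_sig[sq->info_count] = sig;
    sq->info_val[sq->info_count] = value;
    sq->info_count++;
    sq->pending |= (uint64_t)1 << (sig - 1);
    return (0);
}

/* Delete a signal: clear the pending bit and drop its queued entries. */
static void
model_sigqueue_delete(struct model_sigqueue *sq, int sig)
{
    int i, kept = 0;

    for (i = 0; i < sq->info_count; i++) {
        if (sq->info_sig[i] != sig) {
            sq->info_sig[kept] = sq->info_sig[i];
            sq->info_val[kept] = sq->info_val[i];
            kept++;
        }
    }
    sq->info_count = kept;
    sq->pending &= ~((uint64_t)1 << (sig - 1));
}
```

The delete helper shows why clearing only the pending bit would not be enough in such a design: the queued siginfo entries for that signal would be left behind.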
Craig Rodrigues
a3d7f575c0 - Do not hardcode the bsize to a sectorsize of 2048, even though
the UDF specification specifies a logical sectorsize of 2048.
  Instead, get it from GEOM.
- When reading the UDF Anchor Volume Descriptor, use the logical
  sectorsize of 2048 when calculating the offset to read from, but
  use the actual sectorsize to determine how much to read.

- works with reading a DVD disk and a DVD disk image file via mdconfig
- correctly returns EINVAL if we try to mount_udf an audio CD, instead
  of panicking inside GEOM when INVARIANTS is set
2005-10-09 04:45:33 +00:00
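
The offset and length arithmetic described in the entry above can be sketched as follows. The helper names and the `MODEL_` macro are hypothetical, and the media sector size is passed in as a plain parameter rather than obtained from GEOM. The sketch only shows the split between the logical sector size (used for positions) and the media sector size (used for the read length).

```c
#include <assert.h>
#include <stdint.h>

/* Logical sector size named in the commit message. */
#define MODEL_UDF_LOGICAL_SECSIZE 2048u

/* Byte offset of a logical sector, always computed in 2048-byte units. */
static uint64_t
model_anchor_offset(uint32_t logical_sector)
{
    return ((uint64_t)logical_sector * MODEL_UDF_LOGICAL_SECSIZE);
}

/* The same position expressed as a block number in media-sized sectors. */
static uint64_t
model_anchor_media_block(uint32_t logical_sector, uint32_t media_secsize)
{
    assert(media_secsize != 0);
    return (model_anchor_offset(logical_sector) / media_secsize);
}

/* Amount to read: one sector of whatever size the media reports. */
static uint32_t
model_anchor_read_len(uint32_t media_secsize)
{
    assert(media_secsize != 0);
    return (media_secsize);
}
```

As an example value only, logical sector 256 maps to byte offset 524288, which is media block 1024 on a device with 512-byte sectors and media block 256 on one with 2048-byte sectors.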
Pawel Jakub Dawidek
8597a1c5b2 We don't need 'imp' here.
2005-10-07 10:30:47 +00:00
Robert Watson
2affdbee3e Second attempt at a work-around for fifo-related socket panics during
make -j with high levels of parallelism: acquire Giant in fifo I/O
routines.

Discussed with:	ups
MFC after:	3 days
2005-10-01 20:15:41 +00:00
Poul-Henning Kamp
73a2c3a32e The NWFS code in RELENG_6 is broken due to a typo in
sys/fs/nwfs/nwfs_vfsops.c, introduced with the conversion to
nmount with revision 1.38. This causes mount_nwfs to fail with
the error message:

  mount_nwfs: mount error: /mnt/netware: syserr = No such file or directory

This is caused by a typo on line 178, which specifies "nwfw_args"
rather than "nwfs_args".

Submitted by:	Antony Mawer <gnats@mawer.org>
Fat fingers:	phk
PR:		86757
MFC:		3 days
2005-09-30 18:21:05 +00:00
Peter Edwards
20c5ba3685 Remove checks for BOOTSIG[23] from FAT32 bootblocks.
There seems to be very little documentary evidence outside this
implementation to suggest that these checks are necessary, and more
than one camera-formatted flash disk fails the check, but mounts
successfully on most other systems.

Reviewed By: bde@
2005-09-29 14:09:46 +00:00
Robert Watson
a0e81bce69 Back out fifo_vnops.c:1.127, which introduced an sx lock around I/O on
a fifo.  While this did indeed close the race, confirming suspicions
about the nature of the problem, it causes difficulties with blocking
I/O on fifos.

Discussed with:		ups
Also spotted by:	Peter Holm <peter at holm dot cc>
2005-09-27 16:45:22 +00:00
Robert Watson
454c3d13be Assert v_fifoinfo is non-NULL in fifo_close() in order to catch
non-conforming cases sooner.

MFC after:	3 days
Reported by:	Peter Holm <peter at holm dot cc>
2005-09-26 08:17:03 +00:00