Commit Graph

10691 Commits

Author SHA1 Message Date
rwatson
2d03779951 Various cleanups for soreceive_dgram():
- Update or remove comments that were left over from the original
  soreceive_generic() implementation.  Quite a few were misleading in the
  context of the new code.
- Since soreceive_dgram() has a simpler structure, replace several gotos
  with a while loop making the invariants more clear.
- In the blocking while loop, don't try to handle cases incompatible with
  the loop invariant (since m is always NULL, don't check for and handle
  non-NULL).
- Don't drop and re-acquire the socket buffer lock unnecessarily after
  sbwait() returns, which may help reduce lock contention (etc).
- Assume PR_ATOMIC since we assert it at the top of the function.

MFC after:	3 days
2008-10-01 13:26:52 +00:00
jhb
3eb652b1c7 Update the function name in several assertions in soreceive_dgram().
Approved by:	rwatson
MFC after:	3 days
2008-09-30 18:44:26 +00:00
kib
1fb31bd167 If the panic thread is preempted after setting panicstr but before
setting TDF_INPANIC then it will never be rescheduled again. Wrap
setting the panic condition with the critical section.

Noted and reviewed by:	tegge
MFC after:	1 week
2008-09-27 15:45:54 +00:00
ed
e40b7c4704 Move uminor() and umajor() to the same place as userspace minor() and major().
The uminor() and umajor() functions have the same use in kernel space as
the minor() and major() functions in userspace. If we ever get rid of
the minor() function in kernel space, we could decide to just expose
minor() and major() to kernel space, making uminor() and umajor()
redundant.

There are two reasons why we want to have uminor() and umajor() in
<sys/types.h>:

- Having them close together prevents them from diverting. Even though
  it's unlikely the definitions will change, it's a good habit to have
  them at the same place.

- They don't really belong in kern_conf.c. kern_conf.c has been
  liberated from dealing with device major and minor number handling.

The device_ids(9) manpage now lists the wrong #include's, because it
should only list <sys/types.h> now. I'm leaving it as it is now, because
I wonder if we should document them anyway. We're probably better off
documenting minor(3) and major(3).
2008-09-27 13:19:09 +00:00
ed
4efdef565f Replace all calls to minor() with dev2unit().
After I removed all the unit2minor()/minor2unit() calls from the kernel
yesterday, I realised calling minor() everywhere is quite confusing.
Character devices now only have the ability to store a unit number, not
a minor number. Remove the confusion by using dev2unit() everywhere.

This commit could also be considered as a bug fix. A lot of drivers call
minor(), while they should actually be calling dev2unit(). In -CURRENT
this isn't a problem, but it turns out we never had any problem reports
related to that issue in the past. I suspect not many people connect
more than 256 pieces of the same hardware.

Reviewed by:	kib
2008-09-27 08:51:18 +00:00
ed
2e07c6d916 Don't forget to initialize `int error' in ttydev_open().
I've had some reports in the past that opening an already opened TTY
through, for example, /dev/tty can fail with random error codes. Looking
at ttydev_open(), I can see there is a way `error' is returned without
initialising it. Even though I haven't had any confirmation this fixes
the bug, I'll fix it anyway.

Reported by:	Andrzej Tobola <ato iem pw edu pl>
2008-09-26 18:17:04 +00:00
ed
d421caabf9 Rename the minor' argument of make_dev(9) to unit'.
To prevent any further confusion about device minor and unit numbers,
we'd better just refer to device unit numbers. Many people still think
the numbers we show inside devfs have any relation to the numbers passed
to make_dev(9), which is not the case.

Discussed with:	kib
2008-09-26 14:31:24 +00:00
ed
4212d51a7d Remove unit2minor() use from kernel code.
When I changed kern_conf.c three months ago I made device unit numbers
equal to (unneeded) device minor numbers. We used to require
bitshifting, because there were eight bits in the middle that were
reserved for a device major number. Not very long after I turned
dev2unit(), minor(), unit2minor() and minor2unit() into macro's.
The unit2minor() and minor2unit() macro's were no-ops.

We'd better not remove these four macro's from the kernel, because there
is a lot of (external) code that may still depend on them. For now it's
harmless to remove all invocations of unit2minor() and minor2unit().

Reviewed by:	kib
2008-09-26 14:19:52 +00:00
jhb
6ccb676bf2 Regen. 2008-09-25 20:08:36 +00:00
jhb
00776aeb58 Tidy up a few things with syscall generation:
- Instead of using a syscall slot (370) just to get a function prototype
  for lkmressys(), add an explicit function prototype to <sys/sysent.h>.
  This also removes unused special case checks for 'lkmressys' from
  makesyscalls.sh.
- Instead of having magic logic in makesyscalls.sh to only generate a
  function prototype the first time 'lkmnosys' is seen, make 'NODEF'
  always not generate a function prototype and include an explicit
  prototype for 'lkmnosys' in <sys/sysent.h>.
- As a result of the fix in (2), update the LKM syscall entries in
  the freebsd32 syscall table to use 'lkmnosys' rather than 'nosys'.
- Use NOPROTO for the __syscall() entry (198) in the native ABI.  This
  avoids the need for magic logic in makesyscalls.h to only generate
  a function prototype the first time 'nosys' is encountered.
2008-09-25 20:07:42 +00:00
jhb
9c5408c4f9 - Don't do a WITNESS_SAVE() on the interlock if it is Giant in the condition
variable wait routines.  DROP_GIANT() already manages that state in the
  Giant interlock case.
- Assert that Giant is held when it is passed as a sleep interlock.
2008-09-25 13:42:19 +00:00
jhb
1161006cb6 Part 1 of making shared lookups more resilient with respect to forced
unmounts.  When we upgrade a vnode lock from shared to exclusive during
a name cache lookup, fail the lookup with EBADF if the vnode is invalidated
while we are waiting for the exclusive lock.

Also, for correctness (though I'm not sure it can occur in practice),
downgrade an exclusively locked vnode if it should be share locked.

Tested by:	pho
2008-09-24 18:51:33 +00:00
jhb
8d01b3e526 Update description of witness_watch. 2008-09-24 18:47:24 +00:00
ed
7993f2b835 Fix a crash when calling tty_rel_free() while draining during closure.
Yesterday I got two reports of potential crashes, related to TTY
deallocation during device closure. When a thread is in TF_OPENCLOSE,
draining its output upon closure, we should not allow calls to
tty_rel_free() to happen at the same time. This could cause the TTY to
be torn down twice.

PR:		kern/127561
Reported by:	KOIE Hidetaka <koie suri co jp>
Discussed with:	thompsa
2008-09-24 11:16:09 +00:00
kib
c500808674 Change the static struct sysentvec and struct Elf_Brandinfo initializers
to the C99 style. At least, it is easier to read sysent definitions
that way, and search for the actual instances of sigcode etc.

Explicitely initialize sysentvec.sv_maxssiz that was missed in most
sysvecs.

No objection from:	jhb
MFC after:	1 month
2008-09-24 10:14:37 +00:00
ed
94258be421 Track state to determine if the associated TTY device node has been used.
It turns out our old TTY layer (and other implementations) block when
you read() on a PTY master device of which the slave device node has not
been opened yet. Our new implementation just returned 0. This caused
applications like telnetd to die in a very subtle way (when child
processes would open the TTY later than the first call to select()).

Introduce a new flag called PTS_FINISHED, which indicates whether we
should block or bail out of a read() or write() occurs.

Reported by:	Claude Buisson <clbuisson orange fr>
2008-09-23 17:12:25 +00:00
obrien
219d6d1626 style(9) 2008-09-23 14:25:56 +00:00
obrien
8cb3aed24c Reverse if() logic to improve readability.
Reviewed by:	ru
2008-09-23 14:25:38 +00:00
ed
1475e942ed Introduce a hooks layer for the MPSAFE TTY layer.
One of the features that prevented us from fixing some of the TTY
consumers to work once again, was an interface that allowed consumers to
do the following:

- `Sniff' incoming data, which is used by the snp(4) driver.

- Take direct control of the input and output paths of a TTY, which is
  used by ng_tty(4), ppp(4), sl(4), etc.

There's no practical advantage in committing a hooks layer without
having any consumers. In P4 there is a preliminary port of snp(4) and
thompsa@ is busy porting ng_tty(4) to this interface. I already want to
have it in the tree, because this may stimulate others to work on the
remaining modules.

Discussed with:	thompsa
Obtained from:	//depot/projects/mpsafetty/...
2008-09-22 19:25:14 +00:00
ed
d49d04e133 Fix style(9) issue in TTY header files: document function argument names.
According to style(9), function argument names should only be omitted
for prototypes that are exported to userspace. This means we should
document the function arguments in the TTY header files, because they
are only used in userspace.

While there, change the type of the buffer argument of
ttydisc_rint_bypass() to `const void *' instead of `char *'.

Requested by:	attilio
Obtained from:	//depot/projects/mpsafetty/...
2008-09-22 18:44:09 +00:00
jkoshy
9d661b5bf6 Support sparsely numbered CPUs.
Requested by:	obrien, alfred (long ago)
2008-09-22 10:37:02 +00:00
ed
acdad30e01 Make fstat() on a pseudo-terminal master return sane timestamps.
Because pseudo-terminal master file descriptors no longer have a vnode
underneath, we have to fill in fstat() values ourselves. Make our
implementation somewhat sane by returning the timestamps of the TTY
device node that corresponds with our file descriptor.

Obtained from:	//depot/projects/mpsafettty/...
2008-09-21 19:24:15 +00:00
ed
3c13ffd7b5 Now that the number of clist consumers have dropped massively, trim down
the code to prevent useless waste of space.

- Remove support for quote bits. There is not a single driver that needs
  these bits anymore. This means putc() now accepts a char instead of an
  int.

- Remove the unneeded catq() and nextc() routines. They were only used
  by the old TTY layer.

- Convert the clist code to use ANSI C prototypes.
2008-09-21 18:12:18 +00:00
kib
a127656dea fdescfs, devfs, mqueuefs, nfs, portalfs, pseudofs, tmpfs and xfs
initialize the vattr structure in VOP_GETATTR() with VATTR_NULL(),
vattr_null() or by zeroing it. Remove these to allow preinitialization
of fields work in vn_stat(). This is needed to get birthtime initialized
correctly.

Submitted by:   Jaakko Heinonen <jh saunalahti fi>
Discussed on:   freebsd-fs
MFC after:	1 month
2008-09-20 19:50:52 +00:00
kib
ef4f1dc9c7 Initialize va_rdev to NODEV instead of 0 or VNOVAL in VOP_GETATTR().
NODEV is more appropriate when va_rdev doesn't have a meaningful value.

Submitted by:   Jaakko Heinonen <jh saunalahti fi>
Suggested by:   bde
Discussed on:   freebsd-fs
MFC after:	1 month
2008-09-20 19:49:15 +00:00
kib
570463af80 Initialize va_rdev to NODEV and va_fsid to VNOVAL before the
VOP_GETATTR() call in vn_stat(). Thus if a file system doesn't
initialize those fields in VOP_GETATTR() they will have a sane default
value.

Submitted by:   Jaakko Heinonen <jh saunalahti fi>
Discussed on:   freebsd-fs
MFC after:	1 month
2008-09-20 19:48:24 +00:00
kib
81d455e702 Initialize va_flags and va_filerev properly in VOP_GETATTR(). Don't
initialize va_vaflags and va_spare because they are not part of the
VOP_GETATTR() API. Also don't initialize birthtime to ctime or zero.

Submitted by:   Jaakko Heinonen <jh saunalahti fi>
Reviewed by:	bde
Discussed on:   freebsd-fs
MFC after:	1 month
2008-09-20 19:46:45 +00:00
kib
c6232cabbe Initialize birthtime fields in vn_stat() to prevent stat(2) from
returning uninitialized birthtime. Most file systems don't initialize
birthtime properly in their VOP_GETTATTR().

Submitted by:   Jaakko Heinonen <jh saunalahti fi>
Reviewed by:	bde
Discussed on:   freebsd-fs
MFC after:	1 month
2008-09-20 19:43:22 +00:00
obrien
0c0da6bba7 Add freebsd32 compat shim for nmount(2).
(and quiet some compiler warnings for vfs_donmount)
2008-09-19 15:17:32 +00:00
jhb
a0cba7a231 Various style fixes. 7 space indent is just odd. 2008-09-18 20:10:11 +00:00
jhb
2ba0911fd8 Sort includes. 2008-09-18 20:04:22 +00:00
attilio
23ff3dbeb8 Remove the suser(9) interface from the kernel. It has been replaced from
years by the priv_check(9) interface and just very few places are left.
Note that compatibility stub with older FreeBSD version
(all above the 8 limit though) are left in order to reduce diffs against
old versions. It is responsibility of the maintainers for any module, if
they think it is the case, to axe out such cases.

This patch breaks KPI so __FreeBSD_version will be bumped into a later
commit.

This patch needs to be credited 50-50 with rwatson@ as he found time to
explain me how the priv_check() works in detail and to review patches.

Tested by:      Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
Reviewed by:    rwatson
2008-09-17 15:49:44 +00:00
ed
0f8f4f624b Fix minor TTY API inconsistency.
Unlike tty_rel_gone() and tty_rel_sess(), the tty_rel_pgrp() routine
does not unlock the TTY. I once had the idea to make the code call
tty_rel_pgrp() and tty_rel_sess(), picking up the TTY lock once. This
turned out a little harder than I expected, so this is how it works now.

It's a lot easier if we just let tty_rel_pgrp() unlock the TTY, because
the other routines do this anyway.
2008-09-16 14:57:23 +00:00
kib
f6863a9ef7 When attempt is made to suspend a filesystem that is already syspended,
wait until the current suspension is lifted instead of silently returning
success immediately. The consequences of calling vfs_write() resume when
not owning the suspension are not well-defined at best.

Add the vfs_susp_clean() mount method to be called from
vfs_write_resume(). Set it to process_deferred_inactive() for ffs, and
stop calling it manually.

Add the thread flag TDP_IGNSUSP that allows to bypass the suspension
point in the vn_start_write. It is intended for use by VFS in the
situations where the suspender want to do some i/o requiring calls to
vn_start_write(), and this i/o cannot be done later.

Reviewed by:	tegge
In collaboration with:	pho
MFC after:	 1 month
2008-09-16 11:51:06 +00:00
kib
0488506405 Add the ffs structures introspection functions for ddb.
Show the b_dep value for the buffer in the show buffer command.
Add a comand to dump the dirty/clean buffer list for vnode.

Reviewed by:	tegge
Tested and used by:   pho
MFC after:   1 month
2008-09-16 11:19:38 +00:00
kib
039c5da1b2 Garbage-collect vn_write_suspend_wait().
Suggested and reviewed by:	tegge
Tested by:	pho
MFC after:	1 month
2008-09-16 11:09:26 +00:00
sam
05a7094fc1 Make ddb command registration dynamic so modules can extend
the command set (only so long as the module is present):
o add db_command_register and db_command_unregister to add and remove
  commands, respectively
o replace linker sets with SYSINIT's (and SYSUINIT's) that register
  commands
o expose 3 list heads: db_cmd_table, db_show_table, and db_show_all_table
  for registering top-level commands, show operands, and show all operands,
  respectively

While here also:
o sort command lists
o add DB_ALIAS, DB_SHOW_ALIAS, and DB_SHOW_ALL_ALIAS to add aliases
  for existing commands
o add "show all trace" as an alias for "show alltrace"
o add "show all locks" as an alias for "show alllocks"

Submitted by:	Guillaume Ballet <gballet@gmail.com> (original version)
Reviewed by:	jhb
MFC after:	1 month
2008-09-15 22:45:14 +00:00
jhb
a55e334c2b Expose a new public routine intr_event_execute_handlers() which executes
all the non-filter handlers attached to an interrupt event.  This can be
used by device drivers which multiplex their interrupt onto the interrupt
handlers for child devices.
2008-09-15 22:19:44 +00:00
attilio
00ea27d0c3 - For any lock list we hold the head in order to reduce allocation from
the free list and in this way avoid contention on the w_mtx.
  In order to make the code simple, we rely on the rule that when the head
  has not a child it also doesn't have other subsequent entries.
  Actually this assertion is broken because we can free all the head
  children and quit witness_unlock() with the head still allocated, with no
  children and subsequent entries present.
  Fix this by shifting the head if other entries are present and still
  freeing the object, but leaving always an head.
- Fix witness_thread_has_locks() in order to report, correctly, if the
  lock list linked to a specific thread has children or not based on the
  above explained rule.
- Fix a printout into DDB's "show alllocks" command in order to show,
  correctly, the process name that is really what we want.
- Fix style(9) for a comment.

Tested by:	Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
Reported by:	Marko Kiiskila <marko dot kiiskila at nokia dot com>
Sponsored by:	Nokia
2008-09-12 21:44:01 +00:00
csjp
1fa65beb80 Make sure the TTY has not disappeared out from under us before calling
ttydevsw_outwakeup().  This should fix panics which occur after remote
login sessions timeout during moderate TTY activity.  An example of
where this might occur is where a pending write to the terminal is
occurring while sshd(8) is shutting down the TTY after a TCP timeout.

Submitted by:	ed
2008-09-10 20:12:10 +00:00
jhb
af0471aaec Teach WITNESS about the interlocks used with lockmgr. This removes a bunch
of spurious witness warnings since lockmgr grew witness support.  Before
this, every time you passed an interlock to a lockmgr lock WITNESS treated
it as a LOR.

Reviewed by:	attilio
2008-09-10 19:13:30 +00:00
jhb
fd768740df Various whitespace fixes. 2008-09-10 17:59:21 +00:00
trasz
9303940ffe Remove VSVTX, VSGID and VSUID. This should be a no-op,
as VSVTX == S_ISVTX, VSGID == S_ISGID and VSUID == S_ISUID.

Approved by:	rwatson (mentor)
2008-09-10 13:16:41 +00:00
jhb
7851995759 - Reduce scope of #ifdef's in uma_zcreate() call in init_turnstile0().
- Set UMA_ZONE_NOFREE so that the per-turnstile spin locks are type stable
  to avoid a race where one thread might dereference a lock in a free'd
  turnstile that was previously used by another thread.

Theorized by:	tegge (2)
MFC after:	1 week
2008-09-08 21:40:15 +00:00
jhb
f147e876b7 Close a race in sleepq_broadcast() where the sleepq could be reused after
it had been assigned to the last sleeping thread.  That thread might have
started running on another CPU and have reused that sleep queue.  Fix it
by just walking the thread queue using TAILQ_FOREACH_SAFE() rather than
a while loop.

PR:		amd64/124200
Discovered by:	tegge
Tested by:	benjsc
MFC after:	1 week
2008-09-08 19:44:57 +00:00
bz
cb1cd5ee09 Catch a possible NULL pointer deref in case the offsets got mangled
somehow.
As a consequence we may now get an unexpected result(*).
Catch that error cases with a well defined panic giving appropriate
pointers to ease debugging.

(*) While the concensus was that the case should never happen unless
    there was a bug, noone was definitively sure.

Discussed with:		kmacy (about 8 months back)
Reviewed by:		silby (as part of a larger patch in March)
MFC after:		2 months
2008-09-07 13:09:04 +00:00
ed
e4b90a03d0 Make TIOCCONS use priv_check() instead of checking /dev/console permissions.
As discussed with Robert on IRC, checking the permissions on
/dev/console to see if we can call TIOCCONS could be unreliable. When we
run a chroot() without a devfs instance mounted inside, it won't
actually check the permissions on the device node inside the devfs
instance.

Using the already existing PRIV_TTY_CONSOLE for this seems like a better
idea.

Approved by:	rwatson
2008-09-06 14:43:32 +00:00
ed
bf17d6c233 Fix a small typo in a comment in calcru1().
The word "happene" should read "happened".

Submitted by:	Jille Timmermans <jille quis cx>
2008-09-05 15:55:06 +00:00
davidxu
40259861b6 Fix LOR between vnode lock and internal mqueue locks. 2008-09-05 07:32:57 +00:00
thompsa
6beefd0e39 Remove the alignment of the align parameter. This is up to the caller to pass
in and it breaks tap(4) on strict alignment machines as m_uiotombuf is called
with ETHER_ALIGN.

Found by:	Jared Go
Reviewed by:	emax
MFC after:	3 days
2008-09-05 04:05:31 +00:00