and ffs_lock. This cannot catch situations where holdcnt is incremented
not by curthread, but I think it is useful.
Reviewed by: tegge, attilio
Tested by: pho
MFC after: 2 weeks
MNTK_UNMOUNT before, and mnt_mtx is used as interlock. vfs_busy() always
tries to obtain a shared lock on mnt_lock, the other user is unmount who
tries to drain it, setting MNTK_UNMOUNT before.
Reviewed by: tegge, attilio
Tested by: pho
MFC after: 2 weeks
realtimer_expire() to not rearm the timer, otherwise there is a chance
that a callout will be left there and be tiggered in future unexpectly.
Bug reported by: tegge@
not the string formatted at the time of CTRX() call. Stack_ktr(9) uses
an on-stack buffer for the symbol name, that is supplied as an argument
to ktr. As result, stack_ktr() traces show garbage or cause page faults.
Fix stack_ktr() by using pointer to module symbol table that is supposed
to have a longer lifetime.
Tested by: pho
MFC after: 1 week
credentials from inp_cred which is also available after the
socket is gone.
Switch cr_canseesocket consumers to cr_canseeinpcb.
This removes an extra acquisition of the socket lock.
Reviewed by: rwatson
MFC after: 3 months (set timer; decide then)
PCPU_PTR() curthread can migrate on another CPU and get incorrect
results.
- Fix a similar race into witness_warn().
- Fix the interlock's checks bypassing by correctly using the appropriate
children even when the lock_list chunk to be explored is not the first
one.
- Allow witness_warn() to work with spinlocks too.
Bugs found by: tegge
Submitted by: jhb, tegge
Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
- Change the ddb(4) commands to be more useful (by thompsa@):
- `show ttys' is now called `show all ttys'. This command will now
also display the address where the TTY data structure resides.
- Add `show tty <addr>', which dumps the TTY in a readable form.
- Place an upper bound on the TTY buffer sizes. Some drivers do not want
to care about baud rates. Protect these drivers by preventing the TTY
buffers from getting enormous. Right now we'll just clamp it to 64K,
which is pretty high, taking into account that these buffers are only
used by the built-in discipline.
- Only call ttydev_leave() when needed. Back in April/May the TTY
reference counting mechanism was a little different, which required us
to call ttydev_leave() each time we finished a cdev operation.
Nowadays we only need to call ttydev_leave() when we really mark it as
being closed.
- Improve return codes of read() and write() on TTY device nodes.
- Make sure we really wake up all blocked threads when the driver calls
tty_rel_gone(). There were some possible code paths where we didn't
properly wake up any readers/writers.
- Add extra assertions to prevent sleeping on a TTY that has been
abandoned by the driver.
- Use ttydev_cdevsw as a more reliable method to figure out whether a
device node is a real TTY device node.
Obtained from: //depot/projects/mpsafetty/...
Reviewed by: thompsa
this eliminates some problems of locking, e.g, a thread lock is needed
but can not be used at that time. Only the process lock is needed now
for new field.
In particular following functions KPI results modified:
- bufobj_invalbuf()
- bufsync()
and BO_SYNC() "virtual method" of the buffer objects set.
Main consumers of bufobj functions are affected by this change too and,
in particular, functions which changed their KPI are:
- vinvalbuf()
- g_vfs_close()
Due to the KPI breakage, __FreeBSD_version will be bumped in a later
commit.
As a side note, please consider just temporary the 'curthread' argument
passing to VOP_SYNC() (in bufsync()) as it will be axed out ASAP
Reviewed by: kib
Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
designed drivers would never hit, but was exposed in diving into
another problem...
When expanding the devclass array, free the old memory after updating
the pointer to the new memory. For the following single race case,
this helps:
allocate new memory
copy to new memory
free old memory
<interrupt> read pointer to freed memory
update pointer to new memory
Now we do
allocate new memory
copy to new memory
update pointer to new memory
free old memory
Which closes this problem, but doesn't even begin to address the
multicpu races, which all should be covered by Giant at the moment,
but likely aren't completely.
Note: reviewers were ok with this fix, but suggested the use case
wasn't one we wanted to encourage.
Reviewed by: jhb, scottl.
have_interp to TRUE. This allows the code in image activator to try
/libexec/ld-elf.so.1 as interpreter when newinterp is not found to
execute.
Reviewed by: peter
MFC after: 2 weeks (together with r175105)
descriptor pointer in unp_freerights: we can no longer recurse into
unp_gc due to unp_gc being invoked in a deferred way, but it's still
a good idea.
MFC after: 3 days
no data is ready, return 0 rather than blocking or returning EAGAIN.
This is consistent with the behavior of soreceive_generic (soreceive)
in earlier versions of FreeBSD, and restores this behavior for UDP.
Discussed with: jhb, sam
MFC after: 3 days
improperly invoking sosend(), soreceive(), and sopoll() instead of
attach either specialized or _generic() versions of those functions
to their pru_sosend, pru_soreceive, and pru_sopoll protosw methods.
MFC after: 3 days
booting from an MFS root (e.g. from an install CD) firmware_mountroot
can be called twice with the second call happening before the task
callback occurs; this results in the task structure contents being
corrupted because it was declared static.
Submitted by: marius (original version)
- Staticize and locally prototype functions uipc_ctloutput(), unp_dispose(),
unp_init(), and unp_externalize(), none of which have been required
outside of uipc_usrreq.c since uipc_proto.c was removed.
- Remove stale prototype for uipc_usrreq(), which has not existed in the
code since 1997
- Forward declare and staticize uipc_usrreqs structure in uipc_usrreq.c and
not un.h.
- Comment on why uipc_connect2() is still non-static -- it is used directly
by fifofs.
- Remove stale comments, tidy up whitespace.
MFC after: 3 days (where applicable)
from the vimage project, as per plan established at devsummit 08/08:
http://wiki.freebsd.org/Image/Notes200808DevSummit
Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator
macros, and CURVNET_SET() context setting macros, all currently
resolving to NOPs.
Prepare for virtualization of selected SYSCTL objects by introducing a
family of SYSCTL_V_*() macros, currently resolving to their global
counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().
Move selected #defines from sys/sys/vimage.h to newly introduced header
files specific to virtualized subsystems (sys/net/vnet.h,
sys/netinet/vinet.h etc.).
All the changes are verified to have zero functional impact at this
point in time by doing MD5 comparision between pre- and post-change
object files(*).
(*) netipsec/keysock.c did not validate depending on compile time options.
Implemented by: julian, bz, brooks, zec
Reviewed by: julian, bz, brooks, kris, rwatson, ...
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation
to store the socket address stored in the first mbuf in a packet chain.
This reduces contention on the lock and CPU system time in certain UDP
workloads.
Tested by: ps
Reviewed by: rwatson
MFC after: 1 week
- Update or remove comments that were left over from the original
soreceive_generic() implementation. Quite a few were misleading in the
context of the new code.
- Since soreceive_dgram() has a simpler structure, replace several gotos
with a while loop making the invariants more clear.
- In the blocking while loop, don't try to handle cases incompatible with
the loop invariant (since m is always NULL, don't check for and handle
non-NULL).
- Don't drop and re-acquire the socket buffer lock unnecessarily after
sbwait() returns, which may help reduce lock contention (etc).
- Assume PR_ATOMIC since we assert it at the top of the function.
MFC after: 3 days
setting TDF_INPANIC then it will never be rescheduled again. Wrap
setting the panic condition with the critical section.
Noted and reviewed by: tegge
MFC after: 1 week
The uminor() and umajor() functions have the same use in kernel space as
the minor() and major() functions in userspace. If we ever get rid of
the minor() function in kernel space, we could decide to just expose
minor() and major() to kernel space, making uminor() and umajor()
redundant.
There are two reasons why we want to have uminor() and umajor() in
<sys/types.h>:
- Having them close together prevents them from diverting. Even though
it's unlikely the definitions will change, it's a good habit to have
them at the same place.
- They don't really belong in kern_conf.c. kern_conf.c has been
liberated from dealing with device major and minor number handling.
The device_ids(9) manpage now lists the wrong #include's, because it
should only list <sys/types.h> now. I'm leaving it as it is now, because
I wonder if we should document them anyway. We're probably better off
documenting minor(3) and major(3).
After I removed all the unit2minor()/minor2unit() calls from the kernel
yesterday, I realised calling minor() everywhere is quite confusing.
Character devices now only have the ability to store a unit number, not
a minor number. Remove the confusion by using dev2unit() everywhere.
This commit could also be considered as a bug fix. A lot of drivers call
minor(), while they should actually be calling dev2unit(). In -CURRENT
this isn't a problem, but it turns out we never had any problem reports
related to that issue in the past. I suspect not many people connect
more than 256 pieces of the same hardware.
Reviewed by: kib
I've had some reports in the past that opening an already opened TTY
through, for example, /dev/tty can fail with random error codes. Looking
at ttydev_open(), I can see there is a way `error' is returned without
initialising it. Even though I haven't had any confirmation this fixes
the bug, I'll fix it anyway.
Reported by: Andrzej Tobola <ato iem pw edu pl>
To prevent any further confusion about device minor and unit numbers,
we'd better just refer to device unit numbers. Many people still think
the numbers we show inside devfs have any relation to the numbers passed
to make_dev(9), which is not the case.
Discussed with: kib
When I changed kern_conf.c three months ago I made device unit numbers
equal to (unneeded) device minor numbers. We used to require
bitshifting, because there were eight bits in the middle that were
reserved for a device major number. Not very long after I turned
dev2unit(), minor(), unit2minor() and minor2unit() into macro's.
The unit2minor() and minor2unit() macro's were no-ops.
We'd better not remove these four macro's from the kernel, because there
is a lot of (external) code that may still depend on them. For now it's
harmless to remove all invocations of unit2minor() and minor2unit().
Reviewed by: kib
- Instead of using a syscall slot (370) just to get a function prototype
for lkmressys(), add an explicit function prototype to <sys/sysent.h>.
This also removes unused special case checks for 'lkmressys' from
makesyscalls.sh.
- Instead of having magic logic in makesyscalls.sh to only generate a
function prototype the first time 'lkmnosys' is seen, make 'NODEF'
always not generate a function prototype and include an explicit
prototype for 'lkmnosys' in <sys/sysent.h>.
- As a result of the fix in (2), update the LKM syscall entries in
the freebsd32 syscall table to use 'lkmnosys' rather than 'nosys'.
- Use NOPROTO for the __syscall() entry (198) in the native ABI. This
avoids the need for magic logic in makesyscalls.h to only generate
a function prototype the first time 'nosys' is encountered.
variable wait routines. DROP_GIANT() already manages that state in the
Giant interlock case.
- Assert that Giant is held when it is passed as a sleep interlock.
unmounts. When we upgrade a vnode lock from shared to exclusive during
a name cache lookup, fail the lookup with EBADF if the vnode is invalidated
while we are waiting for the exclusive lock.
Also, for correctness (though I'm not sure it can occur in practice),
downgrade an exclusively locked vnode if it should be share locked.
Tested by: pho
Yesterday I got two reports of potential crashes, related to TTY
deallocation during device closure. When a thread is in TF_OPENCLOSE,
draining its output upon closure, we should not allow calls to
tty_rel_free() to happen at the same time. This could cause the TTY to
be torn down twice.
PR: kern/127561
Reported by: KOIE Hidetaka <koie suri co jp>
Discussed with: thompsa
to the C99 style. At least, it is easier to read sysent definitions
that way, and search for the actual instances of sigcode etc.
Explicitely initialize sysentvec.sv_maxssiz that was missed in most
sysvecs.
No objection from: jhb
MFC after: 1 month
It turns out our old TTY layer (and other implementations) block when
you read() on a PTY master device of which the slave device node has not
been opened yet. Our new implementation just returned 0. This caused
applications like telnetd to die in a very subtle way (when child
processes would open the TTY later than the first call to select()).
Introduce a new flag called PTS_FINISHED, which indicates whether we
should block or bail out of a read() or write() occurs.
Reported by: Claude Buisson <clbuisson orange fr>
One of the features that prevented us from fixing some of the TTY
consumers to work once again, was an interface that allowed consumers to
do the following:
- `Sniff' incoming data, which is used by the snp(4) driver.
- Take direct control of the input and output paths of a TTY, which is
used by ng_tty(4), ppp(4), sl(4), etc.
There's no practical advantage in committing a hooks layer without
having any consumers. In P4 there is a preliminary port of snp(4) and
thompsa@ is busy porting ng_tty(4) to this interface. I already want to
have it in the tree, because this may stimulate others to work on the
remaining modules.
Discussed with: thompsa
Obtained from: //depot/projects/mpsafetty/...
According to style(9), function argument names should only be omitted
for prototypes that are exported to userspace. This means we should
document the function arguments in the TTY header files, because they
are only used in userspace.
While there, change the type of the buffer argument of
ttydisc_rint_bypass() to `const void *' instead of `char *'.
Requested by: attilio
Obtained from: //depot/projects/mpsafetty/...
Because pseudo-terminal master file descriptors no longer have a vnode
underneath, we have to fill in fstat() values ourselves. Make our
implementation somewhat sane by returning the timestamps of the TTY
device node that corresponds with our file descriptor.
Obtained from: //depot/projects/mpsafettty/...
the code to prevent useless waste of space.
- Remove support for quote bits. There is not a single driver that needs
these bits anymore. This means putc() now accepts a char instead of an
int.
- Remove the unneeded catq() and nextc() routines. They were only used
by the old TTY layer.
- Convert the clist code to use ANSI C prototypes.
initialize the vattr structure in VOP_GETATTR() with VATTR_NULL(),
vattr_null() or by zeroing it. Remove these to allow preinitialization
of fields work in vn_stat(). This is needed to get birthtime initialized
correctly.
Submitted by: Jaakko Heinonen <jh saunalahti fi>
Discussed on: freebsd-fs
MFC after: 1 month
NODEV is more appropriate when va_rdev doesn't have a meaningful value.
Submitted by: Jaakko Heinonen <jh saunalahti fi>
Suggested by: bde
Discussed on: freebsd-fs
MFC after: 1 month
VOP_GETATTR() call in vn_stat(). Thus if a file system doesn't
initialize those fields in VOP_GETATTR() they will have a sane default
value.
Submitted by: Jaakko Heinonen <jh saunalahti fi>
Discussed on: freebsd-fs
MFC after: 1 month
initialize va_vaflags and va_spare because they are not part of the
VOP_GETATTR() API. Also don't initialize birthtime to ctime or zero.
Submitted by: Jaakko Heinonen <jh saunalahti fi>
Reviewed by: bde
Discussed on: freebsd-fs
MFC after: 1 month
returning uninitialized birthtime. Most file systems don't initialize
birthtime properly in their VOP_GETTATTR().
Submitted by: Jaakko Heinonen <jh saunalahti fi>
Reviewed by: bde
Discussed on: freebsd-fs
MFC after: 1 month
years by the priv_check(9) interface and just very few places are left.
Note that compatibility stub with older FreeBSD version
(all above the 8 limit though) are left in order to reduce diffs against
old versions. It is responsibility of the maintainers for any module, if
they think it is the case, to axe out such cases.
This patch breaks KPI so __FreeBSD_version will be bumped into a later
commit.
This patch needs to be credited 50-50 with rwatson@ as he found time to
explain me how the priv_check() works in detail and to review patches.
Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
Reviewed by: rwatson
Unlike tty_rel_gone() and tty_rel_sess(), the tty_rel_pgrp() routine
does not unlock the TTY. I once had the idea to make the code call
tty_rel_pgrp() and tty_rel_sess(), picking up the TTY lock once. This
turned out a little harder than I expected, so this is how it works now.
It's a lot easier if we just let tty_rel_pgrp() unlock the TTY, because
the other routines do this anyway.
wait until the current suspension is lifted instead of silently returning
success immediately. The consequences of calling vfs_write() resume when
not owning the suspension are not well-defined at best.
Add the vfs_susp_clean() mount method to be called from
vfs_write_resume(). Set it to process_deferred_inactive() for ffs, and
stop calling it manually.
Add the thread flag TDP_IGNSUSP that allows to bypass the suspension
point in the vn_start_write. It is intended for use by VFS in the
situations where the suspender want to do some i/o requiring calls to
vn_start_write(), and this i/o cannot be done later.
Reviewed by: tegge
In collaboration with: pho
MFC after: 1 month
Show the b_dep value for the buffer in the show buffer command.
Add a comand to dump the dirty/clean buffer list for vnode.
Reviewed by: tegge
Tested and used by: pho
MFC after: 1 month
the command set (only so long as the module is present):
o add db_command_register and db_command_unregister to add and remove
commands, respectively
o replace linker sets with SYSINIT's (and SYSUINIT's) that register
commands
o expose 3 list heads: db_cmd_table, db_show_table, and db_show_all_table
for registering top-level commands, show operands, and show all operands,
respectively
While here also:
o sort command lists
o add DB_ALIAS, DB_SHOW_ALIAS, and DB_SHOW_ALL_ALIAS to add aliases
for existing commands
o add "show all trace" as an alias for "show alltrace"
o add "show all locks" as an alias for "show alllocks"
Submitted by: Guillaume Ballet <gballet@gmail.com> (original version)
Reviewed by: jhb
MFC after: 1 month
all the non-filter handlers attached to an interrupt event. This can be
used by device drivers which multiplex their interrupt onto the interrupt
handlers for child devices.
the free list and in this way avoid contention on the w_mtx.
In order to make the code simple, we rely on the rule that when the head
has not a child it also doesn't have other subsequent entries.
Actually this assertion is broken because we can free all the head
children and quit witness_unlock() with the head still allocated, with no
children and subsequent entries present.
Fix this by shifting the head if other entries are present and still
freeing the object, but leaving always an head.
- Fix witness_thread_has_locks() in order to report, correctly, if the
lock list linked to a specific thread has children or not based on the
above explained rule.
- Fix a printout into DDB's "show alllocks" command in order to show,
correctly, the process name that is really what we want.
- Fix style(9) for a comment.
Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
Reported by: Marko Kiiskila <marko dot kiiskila at nokia dot com>
Sponsored by: Nokia
ttydevsw_outwakeup(). This should fix panics which occur after remote
login sessions timeout during moderate TTY activity. An example of
where this might occur is where a pending write to the terminal is
occurring while sshd(8) is shutting down the TTY after a TCP timeout.
Submitted by: ed
of spurious witness warnings since lockmgr grew witness support. Before
this, every time you passed an interlock to a lockmgr lock WITNESS treated
it as a LOR.
Reviewed by: attilio
- Set UMA_ZONE_NOFREE so that the per-turnstile spin locks are type stable
to avoid a race where one thread might dereference a lock in a free'd
turnstile that was previously used by another thread.
Theorized by: tegge (2)
MFC after: 1 week
it had been assigned to the last sleeping thread. That thread might have
started running on another CPU and have reused that sleep queue. Fix it
by just walking the thread queue using TAILQ_FOREACH_SAFE() rather than
a while loop.
PR: amd64/124200
Discovered by: tegge
Tested by: benjsc
MFC after: 1 week
somehow.
As a consequence we may now get an unexpected result(*).
Catch that error cases with a well defined panic giving appropriate
pointers to ease debugging.
(*) While the concensus was that the case should never happen unless
there was a bug, noone was definitively sure.
Discussed with: kmacy (about 8 months back)
Reviewed by: silby (as part of a larger patch in March)
MFC after: 2 months
As discussed with Robert on IRC, checking the permissions on
/dev/console to see if we can call TIOCCONS could be unreliable. When we
run a chroot() without a devfs instance mounted inside, it won't
actually check the permissions on the device node inside the devfs
instance.
Using the already existing PRIV_TTY_CONSOLE for this seems like a better
idea.
Approved by: rwatson
As reported by several users on the mailing lists, applications like
screen(1) fail to properly handle ^S and ^Q characters. This was because
MPSAFE TTY didn't implement packet mode (TIOCPKT) yet. Add basic packet
mode support to make these applications work again.
Obtained from: //depot/projects/mpsafetty/...
When I migrated tty_compat.c to MPSAFE TTY, I just hooked it up to the
build and fixed it until it compiled and somewhat worked. It turns out
this was not the smartest thing, because the old TTY layer also had a
field called t_flags, which contained a set of sgtty flags.
This means our current COMPAT_43TTY code overwrites the TTY flags,
causing all strange problems to occur. Fix this code to use a new struct
member called t_compatflags. This commit may cause kern/127054 to be
fixed, but this still has to be tested/confirmed by the originator. It
has to be fixed anyway.
PR: kern/127054
The ttydisc_getc() routine obtains a read length from ttyoutq_read().
For no valid reason, the current code stores this value in an int, and
returns a size_t. There is no need to perform this useless conversion.
Obtained from: //depot/projects/mpsafetty/...
lock tracking and checks, doing just the former ones.
- Fix a bug where sysctl utility was printing crazy values when setting a
new value for debug.witness.watch [0]
[0] Reported by: yongari
- In the current design, when a TTY decreases its baud rate, it tries to
shrink the queues. This may not always be possible, because it will
not free any blocks that are still filled with data.
Change the TTY queues to store a `quota' value as well, which means it
will not free any blocks when changing the baud rate, but when placing
blocks back into the queue. When the amount of blocks exceeds the
quota, they get freed.
It also fixes some edge cases, where TIOCSETA during read()/
write()-calls could actually make the queue a tiny bit bigger than in
normal cases.
- Don't leak blocks of memory when calling TIOCSETA when the device
driver abandons the TTY while allocating memory.
- Create ttyoutq_init() and ttyinq_init() to initialize the queues,
instead of initializing them by hand. The new TTY snoop driver also
creates an outq, so it's good to have a proper interface to do this.
Obtained from: //depot/projects/mpsafetty/...
1 means that witness is up and running.
0 means that witness is disabled but that it can be established later
again in effective way.
-1 means that witness is disabled permanently
- Fix a bug causing kernel to panic on witness disabling through
witness_watch. lock lists queues were still full of entries and this was
causing throubles with debugging stubs (like witness_thread_exit()).
Reported by: kris, yongari
Sponsored by: Nokia
- Implement IMAXBEL. It turned out the IMAXBEL termios switch was marked
as supported, while it had not been implemented.
- Don't go into the high watermark when in canonical mode, no data has
been canonicalized and the input buffer is full. This caused the
terminal to lock up. This prevented users from pressing
backspace/^U/etc in such cases.
This could easily be simulated by pasting a very big amount of data in
a shell with sh(1) in canonical mode.
Obtained from: //depot/projects/mpsafetty/...
A couple of months ago I was quite impressed, because when I was writing
code, I discovered that uiomove() would not allow any locks to be held,
while ureadc() did, mainly because ureadc() is implemented using the
same building blocks as uiomove().
Let's see if this triggers any aditional witness warnings on our source
tree.
Reviewed by: atillio