it is initialized; use path instead.
This change fixes a panic when using atapicam in conjunction with CAMDEBUG,
which has been described under kern/103602.
Thanks to Josh Carroll <josh.carroll@gmail.com> for providing the traces
that allowed identifying this problem.
PR: kern/103602
MFC after: 1 week
- Fix missing initialization in kern_rwlock.c causing bogus times to be collected
- Move updates to the lock hash to after the lock is released for spin mutexes,
sleep mutexes, and sx locks
- Add new kernel build option LOCK_PROFILE_FAST - only update lock profiling
statistics when an acquisition is contended. This reduces the overhead of
LOCK_PROFILING to increasing system time by 20%-25% which on
"make -j8 kernel-toolchain" on a dual woodcrest is unmeasurable in terms
of wall-clock time. Contrast this to enabling lock profiling without
LOCK_PROFILE_FAST and I see a 5x-6x slowdown in wall-clock time.
arrangement that has no intrinsic internal knowledge of whether devices
it is given are truly multipath devices. As such, this is a simplistic
approach, but still a useful one.
The basic approach is to (at present- this will change soon) use camcontrol
to find likely identical devices and and label the trailing sector of the
first one. This label contains both a full UUID and a name. The name is
what is presented in /dev/multipath, but the UUID is used as a true
distinguishor at g_taste time, thus making sure we don't have chaos
on a shared SAN where everyone names their data multipath as "Fred".
The first of N identical devices (and N *may* be 1!) becomes the active
path until a BIO request is failed with EIO or ENXIO. When this occurs,
the active disk is ripped away and the next in a list is picked to
(retry and) continue with.
During g_taste events new disks that meet the match criteria for existing
multipath geoms get added to the tail end of the list.
Thus, this active/passive setup actually does work for devices which
go away and come back, as do (now) mpt(4) and isp(4) SAN based disks.
There is still a lot to do to improve this- like about 5 of the 12
recommendations I've received about it, but it's been functional enough
for a while that it deserves a broader test base.
Reviewed by: pjd
Sponsored by: IronPort Systems
MFC: 2 months
Linux does not check file descriptor when MAP_ANONYMOUS is set.
This should fix recent LTP test regressions.
Reported by: Scot Hetzel (swhetzel at gmail dot com)
netchild
case where it asynchronously exits burst mode on its own. Handle different
values of hz in sleep loop. Provide more debugging options to tune EC
behavior. These tunables/sysctls may be temporary and are not for user
access if the EC is working properly. Burst mode is now on by default for
testing and the poll interval has been increased from 100 to 500 us and
total timeout from 100 to 500 ms.
Hopefully this should be the first step of addressing reports of timeout
errors during battery or thermal access, especially on HP/Compaq laptops.
It is reasonably stable and should not cause a loss of functionality or
performance on systems that were previously working. Testing shows an
increase of responsiveness by ~75% on one system.
PR: kern/98171
potential issues where the peer does not close, potentially leaving
thousands of connections in FIN_WAIT_2. This is controlled by a new sysctl
fast_finwait2_recycle, which is disabled by default.
Reviewed by: gnn, silby.
- BIOCGDIRECTION and BIOCSDIRECTION get or set the setting determining
whether incoming, outgoing, or all packets on the interface should be
returned by BPF. Set to BPF_D_IN to see only incoming packets on the
interface. Set to BPF_D_INOUT to see packets originating locally and
remotely on the interface. Set to BPF_D_OUT to see only outgoing
packets on the interface. This setting is initialized to BPF_D_INOUT
by default. BIOCGSEESENT and BIOCSSEESENT are obsoleted by these but
kept for backward compatibility.
- BIOCFEEDBACK sets packet feedback mode. This allows injected packets
to be fed back as input to the interface when output via the interface is
successful. When BPF_D_INOUT direction is set, injected outgoing packet
is not returned by BPF to avoid duplication. This flag is initialized to
zero by default.
Note that libpcap has been modified to support BPF_D_OUT direction for
pcap_setdirection(3) and PCAP_D_OUT direction is functional now.
Reviewed by: rwatson
concurrency:
- Add per-unpcb mutexes protecting unpcb connection state, fields, etc.
- Replace global UNP mutex with a global UNP rwlock, which will protect the
UNIX domain socket connection topology, v_socket, and be acquired
exclusively before acquiring more than per-unpcb at a time in order to
avoid lock order issues.
In performance measurements involving MySQL, this change has little or no
overhead on UP (+/- 1%), but leads to a significant (5%-30%) improvement in
multi-processor measurements using the sysbench and supersmack benchmarks.
Much testing by: kris
Approved by: re (kensmith)
determine if it holds an exclusive rwlock reference or not. This is
non-ideal, but recursion scenarios in the network stack currently
require it.
Approved by: jhb
call which can easily lock up a system otherwise; instead,
return ENOBUFS as documented in a manpage, thus reverting
us to the FreeBSD 4.x behavior.
Reviewed by: rwatson
MFC after: 2 weeks
- only collect timestamps when a lock is contested - this reduces the overhead
of collecting profiles from 20x to 5x
- remove unused function from subr_lock.c
- generalize cnt_hold and cnt_lock statistics to be kept for all locks
- NOTE: rwlock profiling generates invalid statistics (and most likely always has)
someone familiar with that should review
attribute. Also define some macros to manipulate one of these
structures. Explain their use in the extattr.9 manual page.
The next step will be to make a sweep through the kernel replacing
the old pointer manipulation code. To get an idea of how they would
be used, the ffs_findextattr() function in ufs/ffs/ffs_vnops.c is
currently written as follows:
/*
* Vnode operating to retrieve a named extended attribute.
*
* Locate a particular EA (nspace:name) in the area (ptr:length), and return
* the length of the EA, and possibly the pointer to the entry and to the data.
*/
static int
ffs_findextattr(u_char *ptr, u_int length, int nspace, const char *name,
u_char **eap, u_char **eac)
{
u_char *p, *pe, *pn, *p0;
int eapad1, eapad2, ealength, ealen, nlen;
uint32_t ul;
pe = ptr + length;
nlen = strlen(name);
for (p = ptr; p < pe; p = pn) {
p0 = p;
bcopy(p, &ul, sizeof(ul));
pn = p + ul;
/* make sure this entry is complete */
if (pn > pe)
break;
p += sizeof(uint32_t);
if (*p != nspace)
continue;
p++;
eapad2 = *p++;
if (*p != nlen)
continue;
p++;
if (bcmp(p, name, nlen))
continue;
ealength = sizeof(uint32_t) + 3 + nlen;
eapad1 = 8 - (ealength % 8);
if (eapad1 == 8)
eapad1 = 0;
ealength += eapad1;
ealen = ul - ealength - eapad2;
p += nlen + eapad1;
if (eap != NULL)
*eap = p0;
if (eac != NULL)
*eac = p;
return (ealen);
}
return(-1);
}
After applying the structure and macros, it would look like this:
/*
* Vnode operating to retrieve a named extended attribute.
*
* Locate a particular EA (nspace:name) in the area (ptr:length), and return
* the length of the EA, and possibly the pointer to the entry and to the data.
*/
static int
ffs_findextattr(u_char *ptr, u_int length, int nspace, const char *name,
u_char **eapp, u_char **eac)
{
struct extattr *eap, *eaend;
eaend = (struct extattr *)(ptr + length);
for (eap = (struct extattr *)ptr; eap < eaend; eap = EXTATTR_NEXT(eap)){
/* make sure this entry is complete */
if (EXTATTR_NEXT(eap) > eaend)
break;
if (eap->ea_namespace != nspace ||
eap->ea_namelength != length ||
bcmp(eap->ea_name, name, length))
continue;
if (eapp != NULL)
*eapp = eap;
if (eac != NULL)
*eac = EXTATTR_CONTENT(eap);
return (EXTATTR_CONTENT_SIZE(eap));
}
return(-1);
}
Not only is it considerably shorter, but it hopefully more readable :-)
PRIO_USER case, possibly also other places that deferences
p_ucred.
In the past, we insert a new process into the allproc list right
after PID allocation, and release the allproc_lock sx. Because
most content in new proc's structure is not yet initialized,
this could lead to undefined result if we do not handle PRS_NEW
with care.
The problem with PRS_NEW state is that it does not provide fine
grained information about how much initialization is done for a
new process. By defination, after PRIO_USER setpriority(), all
processes that belongs to given user should have their nice value
set to the specified value. Therefore, if p_{start,end}copy
section was done for a PRS_NEW process, we can not safely ignore
it because p_nice is in this area. On the other hand, we should
be careful on PRS_NEW processes because we do not allow non-root
users to lower their nice values, and without a successful copy
of the copy section, we can get stale values that is inherted
from the uninitialized area of the process structure.
This commit tries to close the race condition by grabbing proc
mutex *before* we release allproc_lock xlock, and do copy as
well as zero immediately after the allproc_lock xunlock. This
guarantees that the new process would have its p_copy and p_zero
sections, as well as user credential informaion initialized. In
getpriority() case, instead of grabbing PROC_LOCK for a PRS_NEW
process, we just skip the process in question, because it does
not affect the final result of the call, as the p_nice value
would be copied from its parent, and we will see it during
allproc traverse.
Other potential solutions are still under evaluation.
Discussed with: davidxu, jhb, rwatson
PR: kern/108071
MFC after: 2 weeks
semi-automatic style(9)
The futex stuff already differs a lot (only a small part does not differ)
from NetBSD, so we are already way off and can't apply changes from NetBSD
automatically. As we need to merge everything by hand already, we can even
make the files comply to our world order.
(external) microphone pin tend to screw it. Internal microphone (found
on several laptops) still need high VRef.
Tested by: Pietro Cerutti <pietro.cerutti@gmail.com>
lenix <irc.freenode.net>
immediately flag any page that is allocated to a OBJT_PHYS object as
unmanaged in vm_page_alloc() rather than waiting for a later call to
vm_page_unmanage(). This allows for the elimination of some uses of
the page queues lock.
Change the type of the kernel and kmem objects from OBJT_DEFAULT to
OBJT_PHYS. This allows us to take advantage of the above change to
simplify the allocation of unmanaged pages in kmem_alloc() and
kmem_malloc().
Remove vm_page_unmanage(). It is no longer used.
It is built in the same module as IPv4 multicast forwarding, i.e. ip_mroute.ko,
if and only if IPv6 support is enabled for loadable modules.
Export IPv6 forwarding structs to userland netstat(1) via sysctl(9).