using the o32 ABI. This mostly follows nwhitehorn's lead in implementing
COMPAT_FREEBSD32 on powerpc64.
o) Add a new type to the freebsd32 compat layer, time32_t, which is time_t in the
32-bit ABI being used. Since the MIPS port is relatively-new, even the 32-bit
ABIs use a 64-bit time_t.
o) Because time{spec,val}32 has the same size and layout as time{spec,val} on MIPS
with 32-bit compatibility, then, disable some code which assumes otherwise
wrongly when built for MIPS. A more general macro to check in this case would
seem like a good idea eventually. If someone adds support for using n32
userland with n64 kernels on MIPS, then they will have to add a variety of
flags related to each piece of the ABI that can vary. That's probably the
right time to generalize further.
o) Add MIPS to the list of architectures which use PAD64_REQUIRED in the
freebsd32 compat code. Probably this should be generalized at some point.
Reviewed by: gonzo
and not asynchronously. This fixes problems related to USB system
suspend and resume. It is assumed that we are always allowed to sleep
from the device_suspend() method.
MFC after: 1 week
Submitted by: jkim
significantly. Upon investigation this was caused by name cache
misses for lookups of "..". For name cache entries for non-".."
directories, the cache entry serves double duty. It maps both the
named directory plus ".." for the parent of the directory. As such,
two ctime values (one for each of the directory and its parent) need
to be saved in the name cache entry.
This patch adds an entry for ctime of the parent directory to the
name cache. It also adds an additional uma zone for large entries
with this time value, in order to minimize memory wastage.
As well, it fixes a couple of cases where the mtime of the parent
directory was being saved instead of ctime for positive name cache
entries. With this patch, Lookup RPC counts return to values similar
to pre-r230394 kernels.
Reported by: bde
Discussed with: kib
Reviewed by: jhb
MFC after: 2 weeks
cards that should be handled by the mfi(4) driver.
The root of the problem is that the mpt(4) driver was masking off the
bottom bit of the PCI device ID when deciding which cards to attach to.
It appears that a number of the mpt(4) Fibre Channel cards had a LAN
variant whose PCI device ID was just one bit off from the FC card's device
ID. The FC cards were even and the LAN cards were odd.
The problem was that this pattern wasn't carried over on the SAS and
parallel SCSI mpt(4) cards. Luckily the SAS and parallel SCSI PCI device
IDs were either even numbers, or they would get masked to a supported
adjacent PCI device ID, and everything worked well.
Now LSI is using some of the odd-numbered PCI device IDs between the 3Gb
SAS device IDs for their new MegaRAID cards. This is causing the mpt(4)
driver to attach to the RAID cards instead of the mfi(4) driver.
The solution is to stop masking off the bottom bit of the device ID, and
explicitly list the PCI device IDs of all supported cards.
This change should be a no-op for mpt(4) hardware. The only intended
functional change is that for the 929X, the is_fc variable gets set. It
wasn't being set previously, but needs to be because the 929X is a Fibre
Channel card.
Reported by: Kashyap Desai <Kashyap.Desai@lsi.com>
MFC After: 3 days
handle address, where we're using handles as raw addresses.
This fixes devices with subregions on Octeon PCI specifically, and likely also on
MIPS more generally, where there isn't another bus_space in use that was doing the
math already.
The tag enforces a single restriction that all DMA transactions must not
cross a 4GB boundary. Note that while this restriction technically only
applies to PCI-express, this change applies it to all PCI devices as it
is simpler to implement that way and errs on the side of caution.
- Add a softc structure for PCI bus devices to hold the bus_dma tag and
a new pci_attach_common() routine that performs actions common to the
attach phase of all PCI bus drivers. Right now this only consists of
a bootverbose printf and the allocate of a bus_dma tag if necessary.
- Adjust all PCI bus drivers to allocate a PCI bus softc and to call
pci_attach_common() from their attach routines.
MFC after: 2 weeks
associated with the previous vnode (if any) associated with the target of
a rename(). Otherwise, a lookup of the target pathname concurrent with a
rename() could re-add a name cache entry after the namei(RENAME) lookup
in kern_renameat() had purged the target pathname.
MFC after: 2 weeks
interface supported by mvs(4) are 88SX, while AHCI-like chips are 88SE.
PR: kern/165271
Submitted by: Jia-Shiun Li <jiashiun@gmail.com>
MFC after: 1 week
The scan code unlocks the comlock and calls into the driver. It then
assumes the state hasn't changed from underneath it.
Although I haven't seen this particular condition trigger, I'd like to
be informed if I or anyone else sees it.
What I'm thinking may occur:
* A cancellation comes in during the scan_end call;
* the cancel flag is set;
* but it's never checked, so scandone isn't updated;
* .. and the interface stays in the STA power save mode.
It's a subtle race, if it even exists.
PR: kern/163318
clients.
These are helful when making certain drivers work on both Linux
and FreeBSD without changing the code flow too much.
Reviewed by: kib, wlosh
MFC after: 1 month
pci_cfg_save() and pci_cfg_restore() for device drivers to use when
saving and restoring state (e.g. to handle device-specific resets).
Reviewed by: imp
MFC after: 2 weeks
long for specifying a boundary constraint.
- Change bus_dma tags to use bus_addr_t instead of bus_size_t for boundary
constraints.
These allow boundary constraints to be fully expressed for cases where
sizeof(bus_addr_t) != sizeof(bus_size_t). Specifically, it allows a
driver to properly specify a 4GB boundary in a PAE kernel.
Note that this cannot be safely MFC'd without a lot of compat shims due
to KBI changes, so I do not intend to merge it.
Reviewed by: scottl
snapshots on UFS filesystems running with journaled soft updates.
This is the first of several bugs that need to be fixed before
removing the restriction added in -r230250 to prevent the use
of snapshots on filesystems running with journaled soft updates.
The deadlock occurs when holding the snapshot lock (snaplk)
and then trying to flush an inode via ffs_update(). We become
blocked by another process trying to flush a different inode
contained in the same inode block that we need. It holds the
inode block for which we are waiting locked. When it tries to
write the inode block, it gets blocked waiting for the our
snaplk when it calls ffs_copyonwrite() to see if the inode
block needs to be copied in our snapshot.
The most obvious place that this deadlock arises is in the
ffs_copyonwrite() routine when it updates critical metadata
in a snapshot and tries to write it out before proceeding.
The fix here is to write the data and indirect block pointer
for the snapshot, but to skip the call to ffs_update() to
write the snapshot inode. To ensure that we will never have
to update a pointer in the inode itself, the ffs_snapshot()
routine that creates the snapshot has to ensure that all the
direct blocks are allocated as part of the creation of the
snapshot.
A less obvious place that this deadlock occurs is when we hold
the snaplk because we are deleting a snapshot. In the course of
doing the deletion, we need to allocate various soft update
dependency structures and allocate some journal space. If we
hit a resource limit while doing this we decrease the resources
in use by flushing out an existing dirty file to get it to give
up the soft dependency resources that it holds. The flush can
cause an ffs_update() to be done on the inode for the file that
we have selected to flush resulting in the same deadlock as
described above when the inode that we have chosen to flush
resides in the same inode block as the snapshot inode that we hold.
The fix is to defer cleaning up any time that the inode on which
we are operating is a snapshot.
Help and review by: Jeff Roberson
Tested by: Peter Holm
MFC (to 9 only) after: 2 weeks
installs clang as /usr/bin/cc, /usr/bin/c++ and /usr/bin/cpp.
Note this does *not* disable building and installing gcc, which will
still be available as /usr/bin/gcc, /usr/bin/g++ and /usr/bin/gcpp. If
you want to disable gcc completely, you must use WITHOUT_GCC.
MFC after: 2 weeks
operations for setting and accessing vnode's v_socket field.
The operations are necessary to implement proper unix socket handling
on layered file systems like nullfs(5).
This change fixes the long standing issue with nullfs(5) being in that
unix sockets did not work between lower and upper layers: if we bound
to a socket on the lower layer we could connect only to the lower
path; if we bound to the upper layer we could connect only to the
upper path. The new behavior is one can connect to both the lower and
the upper paths regardless what layer path one binds to.
PR: kern/51583, kern/159663
Suggested by: kib
Reviewed by: arch
MFC after: 2 weeks
sys/xen/interface/io/blkif.h:
o Insert space in "Red Hat".
o Fix typo "discard-aligment" -> "discard-alignment"
o Fix typo "unamp" -> "unmap"
o Fix typo "formated" -> "formatted"
o Clarify the text for "params".
o Clarify the text for "sector-size".
o Clarify the text for "max-requests" in the backend section.
instead of accepting half-constructed vnode. Previous code cannot decide
what to do with such vnode anyway, and although processing it for hash
removal, paniced later when getting rid of nullfs reference on lowervp.
While there, remove initializations from the declaration block.
Tested by: pho
MFC after: 1 week
function null_destroy_proto() from null_insmntque_dtr(). Also
apply null_destroy_proto() in null_nodeget() when we raced and a vnode
is found in the hash, so the currently allocated protonode shall be
destroyed.
Lock the vnode interlock around reassigning the v_vnlock.
In fact, this path will not be exercised after several later commits,
since null_nodeget() cannot take shared-locked lowervp at all due to
insmntque() requirements.
Reported by: rea
Tested by: pho
MFC after: 1 week
- Reserver respective number of addresses for managment port
- octm uses base address directly
- other drivers get MACs on "first come first served" basis
Reviewed by: juli
external pagers in Mach. FreeBSD doesn't implement external pagers.
Moreover, it don't pageout the kernel object. So, the reasons for
having code don't hold.
Reviewed by: kib
MFC after: 6 weeks
following clang warning:
sys/kern/sys_pipe.c:1556:10: error: promoted type 'int' of K&R function parameter is not compatible with the parameter type 'mode_t'
(aka 'unsigned short') declared in a previous prototype [-Werror]
mode_t mode;
^
sys/kern/sys_pipe.c:155:19: note: previous declaration is here
static fo_chmod_t pipe_chmod;
^
Enforce a boundary of no more than 4GB - transfers crossing a 4GB
boundary can lead to data corruption due to PCIe limitations. This
change is a less-intrusive workaround that can be quickly merged back
to older branches; a cleaner implementation will arrive in HEAD later
but may require KPI changes.
This change is based on a suggestion by jhb@.
Reviewed by: scottl, jhb
Sponsored by: Sandvine Incorporated
MFC after: 3 days
amd64/i386/pc98 endian.h with stubs.
In __bswap64_const(x) the conflict between 0xffUL and 0xffULL has been
resolved by reimplementing the macro in terms of __bswap32(x). As a side
effect __bswap64_var(x) is now implemented using two bswap instructions on
i386 and should be much faster. __bswap32_const(x) has been reimplemented
in terms of __bswap16(x) for consistency.
get rid of testing explicitly for clang (using ${CC:T:Mclang}) in
individual Makefiles.
Instead, use the following extra macros, for use with clang:
- NO_WERROR.clang (disables -Werror)
- NO_WCAST_ALIGN.clang (disables -Wcast-align)
- NO_WFORMAT.clang (disables -Wformat and friends)
- CLANG_NO_IAS (disables integrated assembler)
- CLANG_OPT_SMALL (adds flags for extra small size optimizations)
As a side effect, this enables setting CC/CXX/CPP in src.conf instead of
make.conf! For clang, use the following:
CC=clang
CXX=clang++
CPP=clang-cpp
MFC after: 2 weeks
corruption. Thanks to scottl@ for the suggestion.
This change will likely be revised after consideration of a general
method to address this type of issue for other drivers.
Sponsored by: Sandvine Incorporated
MFC after: 3 days
extract a link status of PHY when parent driver is re(4).
RGEPHY_MII_SSR register does not seem to report correct PHY status
on some integrated PHYs used with re(4).
Unfortunately, RealTek PHYs have no additional information to
differentiate integrated PHYs from external ones so relying on PHY
model number is not enough to know that. However, it seems
RGEPHY_MII_SSR register exists for external RealTek PHYs so
checking parent driver would be good indication to know which PHY
was used. In other words, for non-re(4) controllers, the PHY is
external one and its revision number is greater than or equal to 2.
This change fixes intermittent link UP/DOWN messages reported on
RTL8169 controller.
Also, mii_attach(9) is tried after setting interface name since
rgephy(4) have to know parent driver name.
PR: kern/165509
not get syscall exit notification until the child performed exec of
exit. Swap the order of doing ptracestop() and waiting for P_PPWAIT
clearing, by postponing the wait into syscallret after ptracestop()
notification is done.
Reported, tested and reviewed by: Dmitry Mikulin <dmitrym juniper net>
MFC after: 2 weeks
USERSPACE:
1. add support for devices with different number of rx and tx queues;
2. add better support for zero-copy operation, adding an extra field
to the netmap ring to indicate how many buffers we have already processed
but not yet released (with help from Eddie Kohler);
3. The two changes above unfortunately require an API change, so while
at it add a version field and some spares to the ioctl() argument
to help detect mismatches.
4. update the manual page for the two changes above;
5. update sample applications in tools/tools/netmap
KERNEL:
1. simplify the internal structures moving the global wait queues
to the 'struct netmap_adapter';
2. simplify the functions that map kring<->nic ring indexes
3. normalize device-specific code, helps mainteinance;
4. start exploring the impact of micro-optimizations (prefetch etc.)
in the ixgbe driver.
Use 'legacy' descriptors on the tx ring and prefetch slots gives
about 20% speedup at 900 MHz. Another 7-10% would come from removing
the explict calls to bus_dmamap* in the core (they are effectively
NOPs in this case, but it takes expensive load of the per-buffer
dma maps to figure out that they are all NULL.
Rx performance not investigated.
I am postponing the MFC so i can import a few more improvements
before merging.
APIC is not found.
- Don't panic if lapic_enable_cmc() is called and the APIC is not enabled.
This can happen due to booting a kernel with APIC disabled on a CPU that
supports CMCI.
- Wrap a long line.
Descriptions are specific to drivers and we don't change drivers on attached
devices. This fixes a few places where we were not clearing the description
when detaching a driver (e.g. with device_attach() failed). While here, fix
a few other nits:
- Remove spurious call to remove a device's driver from
devclass_driver_deleted(). device_detach() removes it already.
- Fix a typo.
- In sched_pickcpu() be more careful taking previous CPU on SMT systems.
Do it only if all other logical CPUs of that physical one are idle to avoid
extra resource sharing.
- In sched_pickcpu() change general logic of CPU selection. First
look for idle CPU, sharing last level cache with previously used one,
skipping SMT CPU groups. If none found, search all CPUs for the least loaded
one, where the thread with its priority can run now. If none found, search
just for the least loaded CPU.
- Make cpu_search() compare lowest/highest CPU load when comparing CPU
groups with equal load. That allows to differentiate 1+1 and 2+0 loads.
- Make cpu_search() to prefer specified (previous) CPU or group if load
is equal. This improves cache affinity for more complicated topologies.
- Randomize CPU selection if above factors are equal. Previous code tend
to prefer CPUs with lower IDs, causing unneeded collisions.
- Rework periodic balancer in sched_balance_group(). With cpu_search()
more intelligent now, make balansing process flat, removing recursion
over the topology tree. That fixes double swap problem and makes load
distribution more even and predictable.
All together this gives 10-15% performance improvement in many tests on
CPUs with SMT, such as Core i7, for number of threads is less then number
of logical CPUs. In some tests it also gives positive effect to systems
without SMT.
Reviewed by: jeff
Tested by: flo, hackers@
MFC after: 1 month
Sponsored by: iXsystems, Inc.
Uftdi(4) examines (c_iflag & (IXON|IXOFF)) to control hw XON-XOFF support.
This is obviously no good, if changes to those bits are not communicated
down the stack.
allow.mount.zfs:
allow mounting the zfs filesystem inside a jail
This way the permssions for mounting all current VFCF_JAIL filesystems
inside a jail are controlled wia allow.mount.* jail parameters.
Update sysctl descriptions.
Update jail(8) and zfs(8) manpages.
TODO: document the connection of allow.mount.* and VFCF_JAIL for kernel
developers
MFC after: 10 days
socket protocol number. This is useful since the socket type can
be implemented by different protocols in the same protocol family,
e.g. SOCK_STREAM may be provided by both TCP and SCTP.
Submitted by: Jukka A. Ukkonen <jau iki fi>
PR: kern/162352
Discussed with: bz
Reviewed by: glebius
MFC after: 2 weeks
been bait-and-switched from the rate control code.
This will avoid the panic that I saw and will avoid sending invalid rates
(eg 11a/11g OFDM rates when in 11b, on 11b-only NICs (AR5211)) where the
rate table is not "big".
It also will point out situations where this occurs for the 11n NICs
which will have sufficiently large rate tables that "invalid rix" doesn't
occur.
I'll try to follow this up with a commit that adds a current operating mode
check. The "rix" is only relevant to the current operating mode and rate
table.
PR: kern/165475
* ath_reset() is being called in softclock context, which may have the
thing sleep on a lock. To avoid this, since we really _shouldn't_
be sleeping on any locks, break out the no-loss reset path into a tasklet
and call that from:
+ ath_calibrate()
+ ath_watchdog()
This has the added advantage that it'll end up also doing the frame
RX cleanup from within the taskqueue context, rather than the softclock
context.
* Shuffle around the taskqueue_block() call to be before we grab the lock
and disable interrupts.
The trouble here is that taskqueue_block() doesn't block currently
queued (but not yet running) tasks so calling it doesn't guarantee
no further tasks (that weren't running on _A_ CPU at the time of this
call) will complete. Calling taskqueue_drain() on these tasks won't
work because if any _other_ thread calls taskqueue_enqueue() for whatever
reason, everything gets very angry and stops working.
This slightly changes the race condition enough to let ath_rx_tasklet()
run before we try disabling it, and thus quietens the warnings a bit.
The (more) true solution will be doing something like the following:
* having a taskqueue_blocked mask in ath_softc;
* having an interrupt_blocked mask in ath_softc;
* only calling taskqueue_drain() on each individual task _after_ the
lock has been acquired - that way no further tasklet scheduling
is going to occur.
* Then once the tasks have been blocked _and_ the interrupt has been
disabled, call taskqueue_drain() on each, ensuring that anything
that _was_ scheduled or running is removed.
The trouble is if something calls taskqueue_enqueue() on a task
after taskqueue_blocked() has been called but BEFORE taskqueue_drain()
has been called, ta_pending will be set to 1 and taskqueue_drain()
will sit there stuck in msleep() until you hard-kill the machine.
PR: kern/165382
PR: kern/165220
unp->unp_vnode pointer to detect if there is a vnode associated with
(binded to) this socket and does necessary cleanup if there is.
The issue is that after forced unmount this check may be too late as
the unp_vnode is reclaimed and the reference is stale.
To fix this provide a helper function that is called on a socket vnode
reclamation to do necessary cleanup.
Pointed by: kib
Reviewed by: kib
MFC after: 2 weeks
RTL810x family , RTL8139 has different register map for Config
registers.
While here, follow the lead of re(4) in WOL configuration.
- Disable WOL_UCAST and WOL_MCAST capabilities by default.
- Config5 register write does not need to unlock EEPROM access
on RTL8139 family but unlocking EEPROM access does not affect
its operation and make it consistent with re(4).
Reported by: Matt Renzelmann mjr <> cs dot wisc dot edu
according to POSIX document, the clock ID may be dynamically allocated,
it unlikely will be in 64K forever. To make it future compatible, we
pack all timeout information into a new structure called _umtx_time, and
use fourth argument as a size indication, a zero means it is old code
using timespec as timeout value, but the new structure also includes flags
and a clock ID, so the size argument is different than before, and it is
non-zero. With this change, it is possible that a thread can sleep
on any supported clock, though current kernel code does not have such a
POSIX clock driver system.
call in an #if 0 section.
In in6_selecthlim() optimize a case where in6p cannot be NULL due to an
earlier check.
More consistently use u_int instead of int for fibnum function arguments.
Sponsored by: Cisco Systems, Inc.
MFC after: 3 days
bridge, this allows us to have more than one independent bridge in the same
STP domain.
PR: kern/164369
Submitted by: Nikos Vassiliadis (earlier version)
MFC after: 2 weeks
It doesn't _really_ help all that much, I'll commit something to
sys/net/if.c at some point explaining why, but the lock should be held
when checking/manipulating/branching because of said lock.
comlock, I'd like to find and analyse these cases to see if they
really are valid.
So, throw in a lock here and wait for the (hopefully!) inevitable
complaints.
- Remove all attempts to guess physical temperature using DiodeOffset.
There are too many reports that it varies wildly depending on motherboard.
Instead, if it is known to scale well and its offset is known from other
temperature sensors on board, the user may set "dev.amdtemp.0.sensor_offset"
tunable to compensate the difference. Document the caveats in amdtemp(4).
- Add a quirk for Socket AM2 Revision G processors. These processors are
known to have a different offset according to Linux k8temp driver.
- Warn about Family 10h Erratum 319. These processors have broken sensors.
- Report temperature in more logical orders under dev.amdtemp node. For
example, "dev.amdtemp.0.sensor0.core0" is now "dev.amdtemp.0.core0.sensor0".
- Replace K8, K10 and K11 with official processor names in amdtemp(4).
v_writecount. Keep the amount of the virtual address space used by
the mappings in the new vm_object un_pager.vnp.writemappings
counter. The vnode v_writecount is incremented when writemappings gets
non-zero value, and decremented when writemappings is returned to
zero.
Writeable shared vnode-backed mappings are accounted for in vm_mmap(),
and vm_map_insert() is instructed to set MAP_ENTRY_VN_WRITECNT flag on
the created map entry. During deferred map entry deallocation,
vm_map_process_deferred() checks for MAP_ENTRY_VN_WRITECOUNT and
decrements writemappings for the vm object.
Now, the writeable mount cannot be demoted to read-only while
writeable shared mappings of the vnodes from the mount point
exist. Also, execve(2) fails for such files with ETXTBUSY, as it
should be.
Noted by: tegge
Reviewed by: tegge (long time ago, early version), alc
Tested by: pho
MFC after: 3 weeks
a new jail parameter node with the following parameters:
allow.mount.devfs:
allow mounting the devfs filesystem inside a jail
allow.mount.nullfs:
allow mounting the nullfs filesystem inside a jail
Both parameters are disabled by default (equals the behavior before
devfs and nullfs in jails). Administrators have to explicitly allow
mounting devfs and nullfs for each jail. The value "-1" of the
devfs_ruleset parameter is removed in favor of the new allow setting.
Reviewed by: jamie
Suggested by: pjd
MFC after: 2 weeks
at which the lle_tbl pointer points to freed memory and the llt_free pointer is no longer
valid.
Move the free pointer in to the llentry itself and update the initalization sites.
MFC after: 2 weeks
"panic in 8.3-PRERELEASE" on Feb. 22, 2012. This panic was caused
by use of a mix of tsleep() and msleep() calls on the same event
in the new NFS server DRC code. It did "mtx_unlock(); tsleep();"
in two places, which kib@ noted introduced a slight risk that the
wakeup() would occur before the tsleep(), resulting in a 10sec
delay before waking up. This patch fixes the problem by replacing
"mtx_unlock(); tsleep();" with mtx_sleep(..PDROP..). It also
changes a nfsmsleep() call to mtx_sleep() so that the code uses
mtx_sleep() consistently within the file.
Tested by: hrs (in progress)
Reviewed by: jhb
MFC after: 5 days
to the debugger. When reparenting for debugging, keep the child in
the new orphan list of old parent. When looping over the children in
kern_wait(), iterate over both children list and orphan list to search
for the process by pid.
Submitted by: Dmitry Mikulin <dmitrym juniper.net>
MFC after: 2 weeks
I'm not sure _why_ the ic is NULL here, but I've seen it occasionally do
this after I've been tinkering with things for a while. It ends up
crashing in a call to ath_chan_set() via the net80211 scan code and scan
task.
don't give RX path more priority than TX path.
Also remove infinite loop in interrupt handler and limit number of
iteration to 32. This change addresses system load fluctuations
under high network load.