XXX It would be good to use a better way to size intrcnt.
o) Fix literal 4s that are supposed to be sizeof (u_long).
XXX Why the * 2 here? Is this an artifact of a different system that this
code came from? We seem to allocate twice as much space for intrcnt
as we admit to in sintrcnt.
In a very noisy 2.4GHz environment (with HT/40 enabled, making it worse)
I saw the following occur:
* the air was considered "busy" a lot of the time;
* the cabq time is quite short due to staggered beacons being enabled;
* it just wasn't able to keep up TX'ing CABQ frames;
* .. and the cabq would swallow up all the TX ath_buf's.
This patch introduces a twiddle which allows the maximum cabq depth to be
set, forcing further frames to be dropped.
It defaults to the TX buffer count at the moment, so the default behaviour
isn't changed.
I've also started fleshing out a similar setup for the data path, so
it doesn't swallow up all the available TX buffers and preventing management
frames (such as ADDBA) out.
PR: kern/165895
frames with stations in power saving mode.
I'm not (yet) sure how to handle TX'ing aggregates frames to stations
that are in power saving mode, or whether that's even a feasible thing
to do. So in order to (mostly) not forget, leave a couple of comments
in the code.
The code presently assumes that the aggregation TID state for an ath_node
is locked not by the ath_node lock or a node+TID lock, but behind the
hardware queue said TID maps to. This assumption is going to be
incorrect for stations in power saving mode as we'll be TX'ing frames
on the multicast queue.
In any case, I'm afraid its a "later problem". :/
didn't already have them. This is because the ternary expression will
return int, due to the Usual Arithmetic Conversions. Such casts are not
needed for the 32 and 64 bit variants.
While here, add additional parentheses around the x86 variant, to
protect against unintended consequences.
MFC after: 2 weeks
amd64, if 'device isa' is present quiesce the 8259A's during boot and
resume from suspend.
While here, be more selective on amd64 about which kernel configurations
need elcr.c.
MFC after: 2 weeks
- Return failure for a suspend attempt if we have no wake address.
- Use intr_disable()/intr_restore() instead of ACPI_DISABLE_IRQS().
- Invoke intr_suspend() earlier and call intr_resume() if suspend
fails.
- Use pause in the loop waiting for CPU to suspend.
- Restore PAT MSR, switchtime, switchticks, and MTRRs on resume.
Reviewed by: jkim (earlier version)
MFC after: 2 weeks
- Remove extern "C". There are no functions with external linkage here. [1]
- Rename bswapNN_const(x) to bswapNN_gen(x) to indicate that these macros
are generic implementations that can take non-constant arguments. [1]
- Split up __GNUCLIKE_ASM && __GNUCLIKE_BUILTIN_CONSTANT_P and deal with
each separately.
- Replace _LP64 with __amd64__ because asm instructions are machine
dependent, not ABI dependent.
Submitted by: bde [1]
Reviewed by: bde
This function must be called with both the source and destination TXQs
locked or things will get hairy.
I added this as part of some debugging in a PR but it turned out to not
be the cause. I still think it's -correct- so, here it is.
the last buffer in the list.
The current behaviour (due to me, so pointy hat is firmly on my head here)
was incorrect - it was setting the link pointer to the last descriptor
of the _first_ buffer in the TXQ. Instead, it should have set it to the
last descriptor in the _last_ buffer in the TXQ.
This showed up as occasional TX stalls with frames in the TXQ but no
TX progress being made. Further inspection showed the TXQ looked like
it contained multiple "lists" of frames - there'd be a list of correct
frames, then a NULL link pointer, but there'd be a next buffer in the
list.
Since this code is only called upon an interface reset, it's likely
this only began showing up when I started doing stress testing
in environments which annoy the radios enough to cause lockups.
I've not yet any TX stalls with this patch applied.
PR: kern/165866
Expand pci_save_state and pci_restore_state to save more of
the config state for PCI Express and PCI-X devices. Various
writable control registers are present in PCI Express that
can potentially be lost over suspend/resume cycle.
This change is modeled after similar functionality in Linux.
Reviewed by: wlosh,jhb
MFC after: 1 month
When using big inodes there is sufficient space in ext3 to
keep extra resolution and birthtime (creation) timestamps.
The appropriate fields in the on-disk inode have been approved
for a long time but support for this in ext3 has not been
widely distributed.
In preparation for ext4 most linux distributions have enabled
by default such bigger inodes and some people use nanosecond
timestamps in ext3. We now support those when the inode is big
enough and while we do recognize the EXT4F_ROCOMPAT_EXTRA_ISIZE,
we maintain the extra timestamps even when they are not used.
An additional note by Bruce Evans:
We blindly accept unrepresentable tv_nsec in VOP_SETATTR(), but
all file systems have always done that. When POSIX gets around
to specifying the behaviour, it will probably require certain
rounding to the fs's resolution and not rejecting the request.
This unfortunately means that syscalls that set times can't
really tell if they succeeded without reading back the times
using stat() or similar and checking that they were set close
enough.
Reviewed by: bde
Approved by: jhb (mentor)
MFC after: 2 weeks
the cached name used for KTR_SCHED traces when a thread's name changes.
This way KTR_SCHED traces (and thus schedgraph) will notice when a thread's
name changes, most commonly via execve().
MFC after: 2 weeks
revision 1.146
date: 2010/05/12 08:11:11; author: claudio; state: Exp; lines: +2 -3
bzero() the full compressed update struct before setting the values.
This is needed because pf_state_peer_hton() skips some fields in certain
situations which could result in garbage beeing sent to the other peer.
This seems to fix the pfsync storms seen by stephan@ and so dlg owes me
a whiskey.
I didn't see any storms, but this definitely fixes a useless memory
allocation on the receiving side, due to non zero scrub_flags field
in a pfsync_state_peer structure.
It's not clear to a user what they should do after seeing the "geometry
does not match label" kernel message, and it does not appear to present
a problem in practice. Thus, just remove the messages.
Approved by: marcel
all for platforms that only have 32-bit bus addresses. Second, remove
the 'tag_valid' flag from the softc. Instead, if we don't create a
tag in pci_attach_common(), just cache the value of our parent's tag
so that we always have a valid tag to return.
fifo_iseof() condition, allowing the v_fifoinfo to be reset and freed
by fifo_cleanup().
Precalculate EOF at the places were fo_wgen is changed, and cache the
state in a new pipe state flag PIPE_SAMEWGEN.
Reported and tested by: bf
Submitted by: gianni
MFC after: 1 week (a backport)
* Added verify_mesh_*_len functions that verify the length
according to the amendment spec and return number of destination addresses
for allocation of appropriate struct size in memory;
* Modified hwmp_recv_action_meshpath to allocate HWMP ie instead of
storing them on the stack and store all available field according the flags;
* Modify hwmp_add_mesh* to work with all cases of HWMP according to amendment.
* Modify hwmp_send_* to calculate correct len of bytes for the HWMP ie.
* Added new M_80211_MESH_* malloc defines.
* Added macros with magic numbers for HWMP ie sizes according to amendment.
* Added the external address to all HWMP ie structs.
Submitted by: monthadar@gmail.com
platforms.
This will make every attempt to mount a non-mpsafe filesystem to the
kernel forbidden, unless it is expressely compiled with
VFS_ALLOW_NONMPSAFE option.
This patch is part of the effort of killing non-MPSAFE filesystems
from the tree.
No MFC is expected for this patch.
o) Get rid of some unused macros related to features we don't intend to
provide.
o) Get rid of macro definitions for MIPS-I CPUs. We are not likely to
support anything that predartes MIPS-III.
o) Respell MIPS3_* macros as MIPS_*, which is how most of them were being
used already.
o) Eliminate a duplicate and mostly-unused set of exception vector macros.
There's still considerable duplication and lots more obsolete in our headers,
but this reduces one of the larger files to a size where one could reckon
about the correctness of its contents with a mere few hours of contemplation.
There is, of course, a question of whether we need definitions for fields,
registers and configurations that we are unlikely to ever use or implement,
even if they're not obsolete since 1991. FreeBSD is not a processor
reference manual, and things that aren't used may be wrong, or may be
duplicated because nobody could possibly actually know whether they're
already defined.
Winbond Super I/O chips.
With minor efforts it should be possible the extend the driver to support
further chips/revisions available from Winbond. In the simplest case
only new IDs need to be added, while different chipsets might require
their own function to enter extended function mode, etc.
Sponsored by: Sandvine Incorporated ULC (in 2011)
Reviewed by: emaste, brueffer
MFC after: 2 weeks
the native sigreturn doesn't use set_mcontext like the COMPAT_FREEBSD32 version
does, this wouldn't actually result in overwriting the TLS base. Probably it
makes sense to restructure the native sigreturn to use set_mcontext for
consistency, and to allow sigreturn to change the TLS base.
TLS:
o) The mc_tls field used to store the TLS base when doing context gets and
restores was left a pointer and not converted to a 32-bit integer. This
had the bug of not correctly capturing the TLS value desired by the user,
and the extra nastiness of making the structure the wrong size.
o) The mc_tls field was not being saved by sendsig. As a result, the TLS base
would always be set to NULL when restoring from a signal handler.
Thanks to gonzo for helping track down a bunch of other TLS bugs that came out
of tracking these down.
and offset it only if requested by RDHWR handler. Otherwise things
get overly complicated - we need to track whether address passsed in
request for setting td_md.md_tls is already offseted or not.
a loader or kernel. Specifically, kname cannot be pointed at cmd[] since
it's value is change to be an empty string after the initial call to
parse, and cmd[]'s value can be changed (thus losing a prior setting for
kname) due to user input at the boot prompt. While here, ensure that that
initial boot config file text is nul-terminated, that ops is initialized
to zero, and that kname is always initialized to a valid string.
Tested by: Domagoj Smolcic rank1seeker of gmail
MFC after: 1 week
amd64/i386/pc98 ptrace.h with stubs.
For amd64 PT_GETXSTATE and PT_SETXSTATE have been redefined to match the
i386 values. The old values are still supported but should no longer be
used.
Reviewed by: kib
This lets specify whereabouts of the parent PHY for a given MAC node
(and get rid of ugly kludges in mge(4) and tsec(4)).
Obtained from: Semihalf
MFC after: 1 week
before vnet_mroute_init(), since vnet_mroute_init() depends on mfchashsize
tunable to be set, and that is done in in ip_mroute_modevent().
Apparently I broke that ordering with r208744 almost 2 years ago...
PR: kern/162201
Submitted by: Stevan Markovic (mcafee.com)
MFC after: 3 days
After getting the current irql, if the kthread gets preempted and
subsequently runs on a different CPU, the saved irql could be wrong.
Also, correct the panic string.
PR: kern/165630
Submitted by: Vladislav Movchan <vladislav.movchan at gmail.com>
does not fit into registers, declare that we do not support this case
using CTASSERT(), and remove endianess-unsafe code to split return value
into td_retval.
While there, change the style of the sysctl debug.iosize_max_clamp
definition.
Requested by: bde
MFC after: 3 weeks
Without this patch we were not able to see the assembly function.
Only the function descriptor was visible.
- Distinguish between user-land and kernel when creating the ENTRY() point of
assembly source.
- Make the ENTRY() macro more readable, replace the .align directive with the
gas platform independant .p2align directive.
- Create an END()macro for later use to provide traceback tables on powerpc64.
These fans are not located under the same node as the the RPM controlled ones,
So I had to adapt the current source to parse and fill the properties correctly.
To control the fans we can set the PWM ratio via sysctl between 20 and 100%.
Tested by: nwhitehorn
MFC after: 3 weeks
* Change in mesh_input to validate that QoS is set and Mesh Control field
is present, also both bytes of the QoS are read;
* Moved defragmentation in mesh_input before we try to forward packet as
inferred from amendment spec, because Mesh Control field only present in first
fragment;
* Changed in ieee80211_encap to set QoS subtype and Mesh Control field present,
only first fragment have Mesh Control field present bit equal to 1;
Submitted by: monthadar@gmail.com
* Moved old categories as specified by D4.0 to be action fields of MESH category
as specified in amendment spec;
* Modified functions to use MESH category and its action fields:
+ ieee80211_send_action_register
+ ieee80211_send_action
+ ieee80211_recv_action_register
+ieee80211_recv_action;
* Modified ieee80211_hwmp_init and hwmp_send_action so they uses correct
action fields as specified in amendment spec;
* Modified ieee80211_parse_action so that it verifies MESH frames.
* Change Mesh Link Metric to use one information element as amendment spec.
Draft 4.0 defined two different information elements for request and response.
Submitted by: monthadar@gmail.com
version was missing an else and would always use the n64 TP_OFFSET. Eliminate
some duplication of logic here.
It may be worth getting rid of some of the ifdefs and introducing gratuitous
SV_ILP32 runtime checks on n64 kernels without COMPAT_FREEBSD32 and on o32
kernels, similarly to how PowerPC works.
that it is better to error out when people attempt to build using the
wrong bsd.*.mk files, than to silently ignore the problem.
This means, that after this commit, if you want to build kernel modules
by hand (or via a port) from a head source tree, you *must* make sure
the files in /usr/share/mk are in sync with that tree. If that isn't
possible, for example when you are running on an older FreeBSD branch,
you can:
- Run "make buildenv" from your head source tree, to have the correct
environment setup. (It's advisable to have run "make buildworld", or
at a minimum "make toolchain" first.)
- Alternatively, set MAKESYSPATH to the share/mk directory under your
head source tree. If your build tools are too old, other problems may
still occur.
- Alternatively, use "make -m" and specify the share/mk directory under
your head source tree. Again, build tools that are too old may still
result in trouble.
MFC after: 2 weeks
not be ideal, but is the ABI we've shipped so far. Fix macros which reflect
the results of _ALIGN on 32-bit MIPS to use the right alignment.
This fixes sendmsg under COMPAT_FREEBSD32 on n64 MIPS kernels.
kernel modules using their old installed /usr/share/mk/bsd.*.mk files,
instead of the updated ones in their source tree. This leads to errors
like:
"sys/conf/kmod.mk", line 111: Malformed conditional (${MK_CLANG_IS_CC} == "no" && ${CC:T:Mclang} != "clang")
Obviously, these errors will go away after a "make installworld", or
alternatively, by using "make buildenv" before attempting to manually
build modules.
However, since it is apparently an expected use case to build using old
.mk files, change the way we test for clang, so it also works when the
MK_CLANG_IS_CC macro doesn't exist.
Note the conditional expressions are becoming rather unreadable now, but
I will attempt to fix that on a followup commit.
MFC after: 2 weeks
- pci_find_extcap() is repurposed to be used for fetching PCI-express
extended capabilities (PCIZ_* constants in <dev/pci/pcireg.h>).
- pci_find_htcap() can be used to locate a specific HyperTransport
capability (PCIM_HTCAP_* constants in <dev/pci/pcireg.h>).
- Cache the starting location of the PCI-express capability for PCI-express
devices in PCI device ivars.
in the new NFS server for NFSv4, where it would report ENOENT
when the file actually existed on the server. This turned out
to be caused by not initializing ni_topdir before calling lookup()
and there was a rare case where the value on the stack location
assigned to ni_topdir happened to be a pointer to a ".." entry,
such that "dp == ndp->ni_topdir" succeeded in lookup().
This patch initializes ni_topdir to fix the problem.
MFC after: 5 days
using the o32 ABI. This mostly follows nwhitehorn's lead in implementing
COMPAT_FREEBSD32 on powerpc64.
o) Add a new type to the freebsd32 compat layer, time32_t, which is time_t in the
32-bit ABI being used. Since the MIPS port is relatively-new, even the 32-bit
ABIs use a 64-bit time_t.
o) Because time{spec,val}32 has the same size and layout as time{spec,val} on MIPS
with 32-bit compatibility, then, disable some code which assumes otherwise
wrongly when built for MIPS. A more general macro to check in this case would
seem like a good idea eventually. If someone adds support for using n32
userland with n64 kernels on MIPS, then they will have to add a variety of
flags related to each piece of the ABI that can vary. That's probably the
right time to generalize further.
o) Add MIPS to the list of architectures which use PAD64_REQUIRED in the
freebsd32 compat code. Probably this should be generalized at some point.
Reviewed by: gonzo
and not asynchronously. This fixes problems related to USB system
suspend and resume. It is assumed that we are always allowed to sleep
from the device_suspend() method.
MFC after: 1 week
Submitted by: jkim
significantly. Upon investigation this was caused by name cache
misses for lookups of "..". For name cache entries for non-".."
directories, the cache entry serves double duty. It maps both the
named directory plus ".." for the parent of the directory. As such,
two ctime values (one for each of the directory and its parent) need
to be saved in the name cache entry.
This patch adds an entry for ctime of the parent directory to the
name cache. It also adds an additional uma zone for large entries
with this time value, in order to minimize memory wastage.
As well, it fixes a couple of cases where the mtime of the parent
directory was being saved instead of ctime for positive name cache
entries. With this patch, Lookup RPC counts return to values similar
to pre-r230394 kernels.
Reported by: bde
Discussed with: kib
Reviewed by: jhb
MFC after: 2 weeks
cards that should be handled by the mfi(4) driver.
The root of the problem is that the mpt(4) driver was masking off the
bottom bit of the PCI device ID when deciding which cards to attach to.
It appears that a number of the mpt(4) Fibre Channel cards had a LAN
variant whose PCI device ID was just one bit off from the FC card's device
ID. The FC cards were even and the LAN cards were odd.
The problem was that this pattern wasn't carried over on the SAS and
parallel SCSI mpt(4) cards. Luckily the SAS and parallel SCSI PCI device
IDs were either even numbers, or they would get masked to a supported
adjacent PCI device ID, and everything worked well.
Now LSI is using some of the odd-numbered PCI device IDs between the 3Gb
SAS device IDs for their new MegaRAID cards. This is causing the mpt(4)
driver to attach to the RAID cards instead of the mfi(4) driver.
The solution is to stop masking off the bottom bit of the device ID, and
explicitly list the PCI device IDs of all supported cards.
This change should be a no-op for mpt(4) hardware. The only intended
functional change is that for the 929X, the is_fc variable gets set. It
wasn't being set previously, but needs to be because the 929X is a Fibre
Channel card.
Reported by: Kashyap Desai <Kashyap.Desai@lsi.com>
MFC After: 3 days
handle address, where we're using handles as raw addresses.
This fixes devices with subregions on Octeon PCI specifically, and likely also on
MIPS more generally, where there isn't another bus_space in use that was doing the
math already.
The tag enforces a single restriction that all DMA transactions must not
cross a 4GB boundary. Note that while this restriction technically only
applies to PCI-express, this change applies it to all PCI devices as it
is simpler to implement that way and errs on the side of caution.
- Add a softc structure for PCI bus devices to hold the bus_dma tag and
a new pci_attach_common() routine that performs actions common to the
attach phase of all PCI bus drivers. Right now this only consists of
a bootverbose printf and the allocate of a bus_dma tag if necessary.
- Adjust all PCI bus drivers to allocate a PCI bus softc and to call
pci_attach_common() from their attach routines.
MFC after: 2 weeks
associated with the previous vnode (if any) associated with the target of
a rename(). Otherwise, a lookup of the target pathname concurrent with a
rename() could re-add a name cache entry after the namei(RENAME) lookup
in kern_renameat() had purged the target pathname.
MFC after: 2 weeks
interface supported by mvs(4) are 88SX, while AHCI-like chips are 88SE.
PR: kern/165271
Submitted by: Jia-Shiun Li <jiashiun@gmail.com>
MFC after: 1 week
The scan code unlocks the comlock and calls into the driver. It then
assumes the state hasn't changed from underneath it.
Although I haven't seen this particular condition trigger, I'd like to
be informed if I or anyone else sees it.
What I'm thinking may occur:
* A cancellation comes in during the scan_end call;
* the cancel flag is set;
* but it's never checked, so scandone isn't updated;
* .. and the interface stays in the STA power save mode.
It's a subtle race, if it even exists.
PR: kern/163318
clients.
These are helful when making certain drivers work on both Linux
and FreeBSD without changing the code flow too much.
Reviewed by: kib, wlosh
MFC after: 1 month
pci_cfg_save() and pci_cfg_restore() for device drivers to use when
saving and restoring state (e.g. to handle device-specific resets).
Reviewed by: imp
MFC after: 2 weeks
long for specifying a boundary constraint.
- Change bus_dma tags to use bus_addr_t instead of bus_size_t for boundary
constraints.
These allow boundary constraints to be fully expressed for cases where
sizeof(bus_addr_t) != sizeof(bus_size_t). Specifically, it allows a
driver to properly specify a 4GB boundary in a PAE kernel.
Note that this cannot be safely MFC'd without a lot of compat shims due
to KBI changes, so I do not intend to merge it.
Reviewed by: scottl
snapshots on UFS filesystems running with journaled soft updates.
This is the first of several bugs that need to be fixed before
removing the restriction added in -r230250 to prevent the use
of snapshots on filesystems running with journaled soft updates.
The deadlock occurs when holding the snapshot lock (snaplk)
and then trying to flush an inode via ffs_update(). We become
blocked by another process trying to flush a different inode
contained in the same inode block that we need. It holds the
inode block for which we are waiting locked. When it tries to
write the inode block, it gets blocked waiting for the our
snaplk when it calls ffs_copyonwrite() to see if the inode
block needs to be copied in our snapshot.
The most obvious place that this deadlock arises is in the
ffs_copyonwrite() routine when it updates critical metadata
in a snapshot and tries to write it out before proceeding.
The fix here is to write the data and indirect block pointer
for the snapshot, but to skip the call to ffs_update() to
write the snapshot inode. To ensure that we will never have
to update a pointer in the inode itself, the ffs_snapshot()
routine that creates the snapshot has to ensure that all the
direct blocks are allocated as part of the creation of the
snapshot.
A less obvious place that this deadlock occurs is when we hold
the snaplk because we are deleting a snapshot. In the course of
doing the deletion, we need to allocate various soft update
dependency structures and allocate some journal space. If we
hit a resource limit while doing this we decrease the resources
in use by flushing out an existing dirty file to get it to give
up the soft dependency resources that it holds. The flush can
cause an ffs_update() to be done on the inode for the file that
we have selected to flush resulting in the same deadlock as
described above when the inode that we have chosen to flush
resides in the same inode block as the snapshot inode that we hold.
The fix is to defer cleaning up any time that the inode on which
we are operating is a snapshot.
Help and review by: Jeff Roberson
Tested by: Peter Holm
MFC (to 9 only) after: 2 weeks
installs clang as /usr/bin/cc, /usr/bin/c++ and /usr/bin/cpp.
Note this does *not* disable building and installing gcc, which will
still be available as /usr/bin/gcc, /usr/bin/g++ and /usr/bin/gcpp. If
you want to disable gcc completely, you must use WITHOUT_GCC.
MFC after: 2 weeks
operations for setting and accessing vnode's v_socket field.
The operations are necessary to implement proper unix socket handling
on layered file systems like nullfs(5).
This change fixes the long standing issue with nullfs(5) being in that
unix sockets did not work between lower and upper layers: if we bound
to a socket on the lower layer we could connect only to the lower
path; if we bound to the upper layer we could connect only to the
upper path. The new behavior is one can connect to both the lower and
the upper paths regardless what layer path one binds to.
PR: kern/51583, kern/159663
Suggested by: kib
Reviewed by: arch
MFC after: 2 weeks
sys/xen/interface/io/blkif.h:
o Insert space in "Red Hat".
o Fix typo "discard-aligment" -> "discard-alignment"
o Fix typo "unamp" -> "unmap"
o Fix typo "formated" -> "formatted"
o Clarify the text for "params".
o Clarify the text for "sector-size".
o Clarify the text for "max-requests" in the backend section.
instead of accepting half-constructed vnode. Previous code cannot decide
what to do with such vnode anyway, and although processing it for hash
removal, paniced later when getting rid of nullfs reference on lowervp.
While there, remove initializations from the declaration block.
Tested by: pho
MFC after: 1 week
function null_destroy_proto() from null_insmntque_dtr(). Also
apply null_destroy_proto() in null_nodeget() when we raced and a vnode
is found in the hash, so the currently allocated protonode shall be
destroyed.
Lock the vnode interlock around reassigning the v_vnlock.
In fact, this path will not be exercised after several later commits,
since null_nodeget() cannot take shared-locked lowervp at all due to
insmntque() requirements.
Reported by: rea
Tested by: pho
MFC after: 1 week
- Reserver respective number of addresses for managment port
- octm uses base address directly
- other drivers get MACs on "first come first served" basis
Reviewed by: juli
external pagers in Mach. FreeBSD doesn't implement external pagers.
Moreover, it don't pageout the kernel object. So, the reasons for
having code don't hold.
Reviewed by: kib
MFC after: 6 weeks
following clang warning:
sys/kern/sys_pipe.c:1556:10: error: promoted type 'int' of K&R function parameter is not compatible with the parameter type 'mode_t'
(aka 'unsigned short') declared in a previous prototype [-Werror]
mode_t mode;
^
sys/kern/sys_pipe.c:155:19: note: previous declaration is here
static fo_chmod_t pipe_chmod;
^
Enforce a boundary of no more than 4GB - transfers crossing a 4GB
boundary can lead to data corruption due to PCIe limitations. This
change is a less-intrusive workaround that can be quickly merged back
to older branches; a cleaner implementation will arrive in HEAD later
but may require KPI changes.
This change is based on a suggestion by jhb@.
Reviewed by: scottl, jhb
Sponsored by: Sandvine Incorporated
MFC after: 3 days
amd64/i386/pc98 endian.h with stubs.
In __bswap64_const(x) the conflict between 0xffUL and 0xffULL has been
resolved by reimplementing the macro in terms of __bswap32(x). As a side
effect __bswap64_var(x) is now implemented using two bswap instructions on
i386 and should be much faster. __bswap32_const(x) has been reimplemented
in terms of __bswap16(x) for consistency.
get rid of testing explicitly for clang (using ${CC:T:Mclang}) in
individual Makefiles.
Instead, use the following extra macros, for use with clang:
- NO_WERROR.clang (disables -Werror)
- NO_WCAST_ALIGN.clang (disables -Wcast-align)
- NO_WFORMAT.clang (disables -Wformat and friends)
- CLANG_NO_IAS (disables integrated assembler)
- CLANG_OPT_SMALL (adds flags for extra small size optimizations)
As a side effect, this enables setting CC/CXX/CPP in src.conf instead of
make.conf! For clang, use the following:
CC=clang
CXX=clang++
CPP=clang-cpp
MFC after: 2 weeks
corruption. Thanks to scottl@ for the suggestion.
This change will likely be revised after consideration of a general
method to address this type of issue for other drivers.
Sponsored by: Sandvine Incorporated
MFC after: 3 days
extract a link status of PHY when parent driver is re(4).
RGEPHY_MII_SSR register does not seem to report correct PHY status
on some integrated PHYs used with re(4).
Unfortunately, RealTek PHYs have no additional information to
differentiate integrated PHYs from external ones so relying on PHY
model number is not enough to know that. However, it seems
RGEPHY_MII_SSR register exists for external RealTek PHYs so
checking parent driver would be good indication to know which PHY
was used. In other words, for non-re(4) controllers, the PHY is
external one and its revision number is greater than or equal to 2.
This change fixes intermittent link UP/DOWN messages reported on
RTL8169 controller.
Also, mii_attach(9) is tried after setting interface name since
rgephy(4) have to know parent driver name.
PR: kern/165509
not get syscall exit notification until the child performed exec of
exit. Swap the order of doing ptracestop() and waiting for P_PPWAIT
clearing, by postponing the wait into syscallret after ptracestop()
notification is done.
Reported, tested and reviewed by: Dmitry Mikulin <dmitrym juniper net>
MFC after: 2 weeks
USERSPACE:
1. add support for devices with different number of rx and tx queues;
2. add better support for zero-copy operation, adding an extra field
to the netmap ring to indicate how many buffers we have already processed
but not yet released (with help from Eddie Kohler);
3. The two changes above unfortunately require an API change, so while
at it add a version field and some spares to the ioctl() argument
to help detect mismatches.
4. update the manual page for the two changes above;
5. update sample applications in tools/tools/netmap
KERNEL:
1. simplify the internal structures moving the global wait queues
to the 'struct netmap_adapter';
2. simplify the functions that map kring<->nic ring indexes
3. normalize device-specific code, helps mainteinance;
4. start exploring the impact of micro-optimizations (prefetch etc.)
in the ixgbe driver.
Use 'legacy' descriptors on the tx ring and prefetch slots gives
about 20% speedup at 900 MHz. Another 7-10% would come from removing
the explict calls to bus_dmamap* in the core (they are effectively
NOPs in this case, but it takes expensive load of the per-buffer
dma maps to figure out that they are all NULL.
Rx performance not investigated.
I am postponing the MFC so i can import a few more improvements
before merging.
APIC is not found.
- Don't panic if lapic_enable_cmc() is called and the APIC is not enabled.
This can happen due to booting a kernel with APIC disabled on a CPU that
supports CMCI.
- Wrap a long line.
Descriptions are specific to drivers and we don't change drivers on attached
devices. This fixes a few places where we were not clearing the description
when detaching a driver (e.g. with device_attach() failed). While here, fix
a few other nits:
- Remove spurious call to remove a device's driver from
devclass_driver_deleted(). device_detach() removes it already.
- Fix a typo.
- In sched_pickcpu() be more careful taking previous CPU on SMT systems.
Do it only if all other logical CPUs of that physical one are idle to avoid
extra resource sharing.
- In sched_pickcpu() change general logic of CPU selection. First
look for idle CPU, sharing last level cache with previously used one,
skipping SMT CPU groups. If none found, search all CPUs for the least loaded
one, where the thread with its priority can run now. If none found, search
just for the least loaded CPU.
- Make cpu_search() compare lowest/highest CPU load when comparing CPU
groups with equal load. That allows to differentiate 1+1 and 2+0 loads.
- Make cpu_search() to prefer specified (previous) CPU or group if load
is equal. This improves cache affinity for more complicated topologies.
- Randomize CPU selection if above factors are equal. Previous code tend
to prefer CPUs with lower IDs, causing unneeded collisions.
- Rework periodic balancer in sched_balance_group(). With cpu_search()
more intelligent now, make balansing process flat, removing recursion
over the topology tree. That fixes double swap problem and makes load
distribution more even and predictable.
All together this gives 10-15% performance improvement in many tests on
CPUs with SMT, such as Core i7, for number of threads is less then number
of logical CPUs. In some tests it also gives positive effect to systems
without SMT.
Reviewed by: jeff
Tested by: flo, hackers@
MFC after: 1 month
Sponsored by: iXsystems, Inc.
Uftdi(4) examines (c_iflag & (IXON|IXOFF)) to control hw XON-XOFF support.
This is obviously no good, if changes to those bits are not communicated
down the stack.
allow.mount.zfs:
allow mounting the zfs filesystem inside a jail
This way the permssions for mounting all current VFCF_JAIL filesystems
inside a jail are controlled wia allow.mount.* jail parameters.
Update sysctl descriptions.
Update jail(8) and zfs(8) manpages.
TODO: document the connection of allow.mount.* and VFCF_JAIL for kernel
developers
MFC after: 10 days
socket protocol number. This is useful since the socket type can
be implemented by different protocols in the same protocol family,
e.g. SOCK_STREAM may be provided by both TCP and SCTP.
Submitted by: Jukka A. Ukkonen <jau iki fi>
PR: kern/162352
Discussed with: bz
Reviewed by: glebius
MFC after: 2 weeks
been bait-and-switched from the rate control code.
This will avoid the panic that I saw and will avoid sending invalid rates
(eg 11a/11g OFDM rates when in 11b, on 11b-only NICs (AR5211)) where the
rate table is not "big".
It also will point out situations where this occurs for the 11n NICs
which will have sufficiently large rate tables that "invalid rix" doesn't
occur.
I'll try to follow this up with a commit that adds a current operating mode
check. The "rix" is only relevant to the current operating mode and rate
table.
PR: kern/165475
* ath_reset() is being called in softclock context, which may have the
thing sleep on a lock. To avoid this, since we really _shouldn't_
be sleeping on any locks, break out the no-loss reset path into a tasklet
and call that from:
+ ath_calibrate()
+ ath_watchdog()
This has the added advantage that it'll end up also doing the frame
RX cleanup from within the taskqueue context, rather than the softclock
context.
* Shuffle around the taskqueue_block() call to be before we grab the lock
and disable interrupts.
The trouble here is that taskqueue_block() doesn't block currently
queued (but not yet running) tasks so calling it doesn't guarantee
no further tasks (that weren't running on _A_ CPU at the time of this
call) will complete. Calling taskqueue_drain() on these tasks won't
work because if any _other_ thread calls taskqueue_enqueue() for whatever
reason, everything gets very angry and stops working.
This slightly changes the race condition enough to let ath_rx_tasklet()
run before we try disabling it, and thus quietens the warnings a bit.
The (more) true solution will be doing something like the following:
* having a taskqueue_blocked mask in ath_softc;
* having an interrupt_blocked mask in ath_softc;
* only calling taskqueue_drain() on each individual task _after_ the
lock has been acquired - that way no further tasklet scheduling
is going to occur.
* Then once the tasks have been blocked _and_ the interrupt has been
disabled, call taskqueue_drain() on each, ensuring that anything
that _was_ scheduled or running is removed.
The trouble is if something calls taskqueue_enqueue() on a task
after taskqueue_blocked() has been called but BEFORE taskqueue_drain()
has been called, ta_pending will be set to 1 and taskqueue_drain()
will sit there stuck in msleep() until you hard-kill the machine.
PR: kern/165382
PR: kern/165220
unp->unp_vnode pointer to detect if there is a vnode associated with
(binded to) this socket and does necessary cleanup if there is.
The issue is that after forced unmount this check may be too late as
the unp_vnode is reclaimed and the reference is stale.
To fix this provide a helper function that is called on a socket vnode
reclamation to do necessary cleanup.
Pointed by: kib
Reviewed by: kib
MFC after: 2 weeks
RTL810x family , RTL8139 has different register map for Config
registers.
While here, follow the lead of re(4) in WOL configuration.
- Disable WOL_UCAST and WOL_MCAST capabilities by default.
- Config5 register write does not need to unlock EEPROM access
on RTL8139 family but unlocking EEPROM access does not affect
its operation and make it consistent with re(4).
Reported by: Matt Renzelmann mjr <> cs dot wisc dot edu
according to POSIX document, the clock ID may be dynamically allocated,
it unlikely will be in 64K forever. To make it future compatible, we
pack all timeout information into a new structure called _umtx_time, and
use fourth argument as a size indication, a zero means it is old code
using timespec as timeout value, but the new structure also includes flags
and a clock ID, so the size argument is different than before, and it is
non-zero. With this change, it is possible that a thread can sleep
on any supported clock, though current kernel code does not have such a
POSIX clock driver system.
call in an #if 0 section.
In in6_selecthlim() optimize a case where in6p cannot be NULL due to an
earlier check.
More consistently use u_int instead of int for fibnum function arguments.
Sponsored by: Cisco Systems, Inc.
MFC after: 3 days
bridge, this allows us to have more than one independent bridge in the same
STP domain.
PR: kern/164369
Submitted by: Nikos Vassiliadis (earlier version)
MFC after: 2 weeks
It doesn't _really_ help all that much, I'll commit something to
sys/net/if.c at some point explaining why, but the lock should be held
when checking/manipulating/branching because of said lock.
comlock, I'd like to find and analyse these cases to see if they
really are valid.
So, throw in a lock here and wait for the (hopefully!) inevitable
complaints.
- Remove all attempts to guess physical temperature using DiodeOffset.
There are too many reports that it varies wildly depending on motherboard.
Instead, if it is known to scale well and its offset is known from other
temperature sensors on board, the user may set "dev.amdtemp.0.sensor_offset"
tunable to compensate the difference. Document the caveats in amdtemp(4).
- Add a quirk for Socket AM2 Revision G processors. These processors are
known to have a different offset according to Linux k8temp driver.
- Warn about Family 10h Erratum 319. These processors have broken sensors.
- Report temperature in more logical orders under dev.amdtemp node. For
example, "dev.amdtemp.0.sensor0.core0" is now "dev.amdtemp.0.core0.sensor0".
- Replace K8, K10 and K11 with official processor names in amdtemp(4).
v_writecount. Keep the amount of the virtual address space used by
the mappings in the new vm_object un_pager.vnp.writemappings
counter. The vnode v_writecount is incremented when writemappings gets
non-zero value, and decremented when writemappings is returned to
zero.
Writeable shared vnode-backed mappings are accounted for in vm_mmap(),
and vm_map_insert() is instructed to set MAP_ENTRY_VN_WRITECNT flag on
the created map entry. During deferred map entry deallocation,
vm_map_process_deferred() checks for MAP_ENTRY_VN_WRITECOUNT and
decrements writemappings for the vm object.
Now, the writeable mount cannot be demoted to read-only while
writeable shared mappings of the vnodes from the mount point
exist. Also, execve(2) fails for such files with ETXTBUSY, as it
should be.
Noted by: tegge
Reviewed by: tegge (long time ago, early version), alc
Tested by: pho
MFC after: 3 weeks
a new jail parameter node with the following parameters:
allow.mount.devfs:
allow mounting the devfs filesystem inside a jail
allow.mount.nullfs:
allow mounting the nullfs filesystem inside a jail
Both parameters are disabled by default (equals the behavior before
devfs and nullfs in jails). Administrators have to explicitly allow
mounting devfs and nullfs for each jail. The value "-1" of the
devfs_ruleset parameter is removed in favor of the new allow setting.
Reviewed by: jamie
Suggested by: pjd
MFC after: 2 weeks
at which the lle_tbl pointer points to freed memory and the llt_free pointer is no longer
valid.
Move the free pointer in to the llentry itself and update the initalization sites.
MFC after: 2 weeks
"panic in 8.3-PRERELEASE" on Feb. 22, 2012. This panic was caused
by use of a mix of tsleep() and msleep() calls on the same event
in the new NFS server DRC code. It did "mtx_unlock(); tsleep();"
in two places, which kib@ noted introduced a slight risk that the
wakeup() would occur before the tsleep(), resulting in a 10sec
delay before waking up. This patch fixes the problem by replacing
"mtx_unlock(); tsleep();" with mtx_sleep(..PDROP..). It also
changes a nfsmsleep() call to mtx_sleep() so that the code uses
mtx_sleep() consistently within the file.
Tested by: hrs (in progress)
Reviewed by: jhb
MFC after: 5 days
to the debugger. When reparenting for debugging, keep the child in
the new orphan list of old parent. When looping over the children in
kern_wait(), iterate over both children list and orphan list to search
for the process by pid.
Submitted by: Dmitry Mikulin <dmitrym juniper.net>
MFC after: 2 weeks
I'm not sure _why_ the ic is NULL here, but I've seen it occasionally do
this after I've been tinkering with things for a while. It ends up
crashing in a call to ath_chan_set() via the net80211 scan code and scan
task.
don't give RX path more priority than TX path.
Also remove infinite loop in interrupt handler and limit number of
iteration to 32. This change addresses system load fluctuations
under high network load.
found on Adaptec AIC-6915 Starfire ethernet controller.
While here, use status register to know resolved speed/duplex.
With this change, sf(4) correctly reports speed/duplex of
established link.
Reviewed by: marius
the traffic flow, this may not be the case giving poor traffic distribution.
Add a sysctl which allows us to fall back to our own flow hash code.
PR: kern/164901
Submitted by: Eugene Grosbein
MFC after: 1 week
UMTX_OP_WAIT. Upper 16bits is enough to hold a clock id, and lower
16bits is used to pass flags. The change saves a clock_gettime() syscall
from libthr.
- Centralize address assignment
- Make sure managment ports get first MAC address in pool
- Properly propagate fail if address allocation failed
Submitted by: Andrew Duane <aduane@juniper.net>
sys/dev/hpt27xx/osm_bsd.c, since it gets the following warnings:
sys/dev/hpt27xx/osm_bsd.c:1180:25: error: format string is not a string literal (potentially insecure) [-Werror,-Wformat-security]
S_IRUSR | S_IWUSR, driver_name);
^~~~~~~~~~~
@/dev/hpt27xx/hpt27xx_config.h:46:21: note: expanded from:
#define driver_name hpt27xx_driver_name
^~~~~~~~~~~~~~~~~~~
Since 'hpt27xx_driver_name' is a constant string symbol (coming from the
proprietary hpt27xx_lib.o file), there is no security problem.
Because this driver is provided by the vendor, and applying changes
requires re-certification and other bureaucratic exercises, just disable
the warning for now.
MFC after: 1 week
several sys/cam/ctl files, since these get the following warnings:
In file included from sys/cam/ctl/ctl_backend.c:60:
sys/cam/ctl/ctl_private.h:300:30: error: variable 'page_index_template' is not needed and will not be emitted [-Werror,-Wunneeded-internal-declaration]
static struct ctl_page_index page_index_template[] = {
^
These warnings are tricky to fix without a lot of overhaul, and they are
harmless, so disable them for now.
MFC after: 1 week
Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the
sysctl to zero allows to perform the SSIZE_MAX-sized i/o requests from
the usermode.
Discussed with: bde, das (previous versions)
MFC after: 1 month
types 0x05 and 0x0f, but 0x05 is preferred and used when partition is
created with "gpart add -t ebr ...".
This should keep EBR partitions accessible after r231754 for those,
who have EBR on the partition with type 0x0f.
- In case the parent is bge(4), don't set the Jumbo frame settings unless
the MAC actually is Jumbo capable as otherwise the PHY might not have the
corresponding registers implemented. This is also in line with what the
Linux tg3 driver does.
PR: 165032
Submitted by: Alexander Milanov
Obtained from: OpenBSD
MFC after: 3 days
hold the lock.
This is part of my series of work to try and capture when net80211
locking isn't.
ObNote: it'd be nice to be able to mark a lock as "assert if the lock
is dropped", so I could capture functions which decide that dropping
and reacquiring the lock is a good idea (without re-checking the
sanity of the state protected by the lock.)
Vnode-backed mappings cannot be put into the kernel map, since it is a
system map.
Use exec_map for transient mappings, and remove the mappings with
kmem_free_wakeup() to notify the waiters on available map space.
Do not map the whole executable into KVA at all to copy it out into
usermode. Directly use vn_rdwr() for the case of not page aligned
binary.
There is one place left where the potentially unbounded amount of data
is mapped into exec_map, namely, in the COFF image activator
enumeration of the needed shared libraries.
Reviewed by: alc
MFC after: 2 weeks
devices that are unplugged via QEMU.
sys/dev/xen/blkback/blkback.c:
Toolstack initiated closures change the frontend's state
to Closing. The backend must change to Closing as well,
even if we can't actually close yet, in order for the
frontend to notice and start the closing process.
MFC after: 3 days
- remove the KEVENT code, which was incomplete and not compiled anyways;
- change some while() loops into for()
- adjust indentation
- remove extra whitespace
MFC after: 1 week
struct ccb_pathinq from sys/cam/cam_ccb.h wasn't added to stable/7 at all
and didn't appear in stable/8 until svn R195534. Since __FreeBSD_version
did not get bumped until svn R195634, assume that maxio is valid at 800102
or higher.
Obtained from: Yahoo! Inc.
MFC after: 0 days
with RX/TX halting.
* Always disable/enable interrupts during a channel change, just to simply
things.
* Ensure that the ath taskqueue has completed and is paused before
continuing.
This dramatically reduces the instances of overlapping RX and reset
conditions.
PR: kern/165220