Put a bandaid to prevent ixgbe(4) from completely locking up the system
under high load. Our platform has a few CPU cores and a single active
ixgbe(4) port with 4 queues. Under high enough traffic load, at about
7.5GBs and 700,000 packets/sec (outbound), the entire system would
deadlock. What we found was that each CPU was in an endless loop on a
different ix taskqueue thread. The OACTIVE flag had gotten set on each
queue, and the ixgbe_handle_queue() function was continuously rescheduling
itself via the taskqueue_enqueue. Since all CPUs were busy with their
taskqueue threads, the ixgbe_local_timer() function couldn't run to clear
the OACTIVE flag.
Submitted by: scottl
MFC after: 1 week
http://austingroupbugs.net/view.php?id=385#c713
(Resolved state) recommend this way for the current standard (called
"earlier" in the text)
"However, earlier versions of this standard did not require this, and the
same example had to be written as:
// buf was obtained by malloc(buflen)
ret = write(fd, buf, buflen);
if (ret < 0) {
int save = errno;
free(buf);
errno = save;
return ret;
}
"
from feedback I have for previous commit it seems that many people prefer
to avoid mass code change needed for current standard compliance
and prefer to track unpublished standard instead, which requires now
that free() itself must save errno, not its usage code.
So, I back out "save errno across free()" part of previous commit,
and will fill PR for changing free() isntead.
2) Remove now unused serrno.
MFC after: 1 week
it turns out that it negatively affects performance. I'm stil investigating
exactly why deferring the IO causes such negative TCP performance but
doesn't affect UDP preformance.
Leave the ath_tx_kick() change in there however; it's going to be useful
to have that there for if_transmit() work.
PR: kern/168649
called to "kick" along TX.
For now, schedule a taskqueue call.
Later on I may go back to the direct call of ath_rx_tasklet() - but for
now, this will do.
I've tested UDP and TCP TX. UDP TX still achieves 240MBit, but TCP
TX gets stuck at around 100MBit or so, instead of the 150MBit it should
be at. I'll re-test with no ACPI/power/sleep states enabled at startup
and see what effect it has.
This is in preparation for supporting an if_transmit() path, which will
turn ath_tx_kick() into a NUL operation (as there won't be an ifnet
queue to service.)
Tested:
* AR9280 STA
TODO:
* test on AR5416, AR9160, AR928x STA/AP modes
PR: kern/168649
a critical bugfix:
Processing of DNS resource records where the rdata field is zero length
may cause various issues for the servers handling them.
Processing of these records may lead to unexpected outcomes. Recursive
servers may crash or disclose some portion of memory to the client.
Secondary servers may crash on restart after transferring a zone
containing these records. Master servers may corrupt zone data if the
zone option "auto-dnssec" is set to "maintain". Other unexpected
problems that are not listed here may also be encountered.
All BIND users are strongly encouraged to upgrade.
implementing parallel TX and TX/RX completion can be done without
simply abusing long-held locks.
Right now, multiple concurrent ath_start() entries can result in
frames being dequeued out of order. Well, they're dequeued in order
fine, but if there's any preemption or race between CPUs between:
* removing the frame from the ifnet, and
* calling and runningath_tx_start(), until the frame is placed on a
software or hardware TXQ
Then although dequeueing the frame is in-order, queueing it to the hardware
may be out of order.
This is solved in a lot of other drivers by just holding a TX lock over
a rather long period of time. This lets them continue to direct dispatch
without races between dequeue and hardware queue.
Note to observers: if_transmit() doesn't necessarily solve this.
It removes the ifnet from the main path, but the same issue exists if
there's some intermediary queue (eg a bufring, which as an aside also
may pull in ifnet when you're using ALTQ.)
So, until I can sit down and code up a much better way of doing parallel
TX, I'm going to leave the TX path using a deferred taskqueue task.
What I will likely head towards is doing a direct dispatch to hardware
or software via if_transmit(), but it'll require some driver changes to
allow queues to be made without using the really large ath_buf / ath_desc
entries.
TODO:
* Look at how feasible it'll be to just do direct dispatch to
ath_tx_start() from if_transmit(), avoiding doing _any_ intermediary
serialisation into a global queue. This may break ALTQ for example,
so I have to be delicate.
* It's quite likely that I should break up ath_tx_start() so it
deposits frames onto the software queues first, and then only fill
in the 802.11 fields when it's being queued to the hardware.
That will make the if_transmit() -> software queue path very
quick and lightweight.
* This has some very bad behaviour when using ACPI and Cx states.
I'll do some subsequent analysis using KTR and schedgraph and file
a follow-up PR or two.
PR: kern/168649
"The setting of errno after a successful call to a function is
unspecified unless the description of that function specifies that
errno shall not be modified."
However, free() in IEEE Std 1003.1-2008 does not mention its interaction
with errno, so MAY modify it after successful call
(it depends on particular free() implementation, OS-specific, etc.).
So, save errno across free() calls to make code portable and
POSIX-conformant.
2) Remove unused serrno assignment.
MFC after: 1 week
update for ZFS. It seems that this does not really affect anything except
the help command. Nevertheless, rearrange things so loaddev is set only
once in all cases in order to get it right.
Pointed out by: avg
MFC after: r235364
a single device to be opened multiple times concurrently unfortunately
isn't sufficient with ZFS. This is due to the fact, that ZFS may open
different partitions of a single device simultaneously. So the best we
can do in this case is to cache the lastly used device path and close
and open devices in ofwd_strategy() as needed.
PR: 165025
Submitted by: Gavin Mu
MFC after: 1 week
EARLY_BUILD macro: the -Qunused-arguments flag isn't passed anymore when
building this particular program. However, with clang 3.1 and -Werror,
such unused argument warnings are flagged as errors, causing buildkernel
to fail at this stage, due to the -nostdinc flag passed during linking.
Since the -nostdinc flag isn't actually needed, just remove it.
X-MFC-With: r236528
clock. In general, gettimeofday() is not appropriate interface
when accounting for elasped time because it can go backward, in
which case the policy code could errornously consider the limit
as exceeded.
MFC after: 1 week
Reported by: Mahesh Arumugam
Submitted by: Dorr H. Clark via gnn
Sponsored by: Citrix / NetScaler
The skew calculation here is exactly backwards. We were able to repro
it on a multi-package ESX server running a FreeBSD VM, where the TSCs
can be pretty evil.
MFC after: 1 week
Submitted by: Jeff Ford <jeffrey.ford2@isilon.com>
Reviewed by: avg, gnn
m_cat(), storing pointer to last mbuf in chain in local variable and
attaching new mbuf to the end of chain.
Submitter reports that CPU load dropped for > 10% on a web server
serving large files with this optimisation.
Submitted by: Sergey Budnevitch <sb nginx.com>