The cause of the "Duplicate mbuf free panic" is a programming
error in hme_load_txmbuf(). The code path leading to the panic is
as follows.
1. For an unknown reason the DMA engine froze. The TX descriptors
of HME therefore became full, and the last failed attempt to
transmit a packet had already stored the address of its mbuf in
the hme_txdesc structure. The failed packet was also requeued
into the interface queue in order to retransmit it when more
TX descriptors become available.
2. Since the DMA engine was frozen, if_timer starts to decrement
its counter. When if_timer expires, the driver tries to reset
HME. During the reset phase hme_meminit() is called, and it
frees all mbufs associated with descriptors. The mbuf of the
last failed attempt is also freed here.
3. After the HME reset completes, HME starts to retransmit packets
by dequeuing the first packet in the interface queue. (Note: this
packet was already freed in hme_meminit()!)
4. When HME posts a TX completion interrupt, the driver tries to
free the successfully transmitted mbuf. Since the mbuf was
already freed in step 2, we get the "Duplicate mbuf free panic".
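
The four steps above can be modelled with a small stand-alone C
sketch. It is not the driver code: all toy_* names, the slot and
descriptor counts, and the one-entry queue are made up for
illustration. The branch that forgets the mbuf when the load
fails is only an assumed way to break the chain, not necessarily
what this change does.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NSLOT 4	/* software TX slots (hypothetical) */
#define TOY_NHW   2	/* hardware TX descriptors (hypothetical) */

struct toy_mbuf { int free_count; };	/* counts toy_m_free() calls */
struct toy_txdesc { struct toy_mbuf *m; };

struct toy_softc {
	struct toy_txdesc slot[TOY_NSLOT];
	int next;		/* next software slot to use */
	int hw_free;		/* free hardware descriptors */
	struct toy_mbuf *ifq;	/* one-entry interface queue */
};

static void
toy_m_free(struct toy_mbuf *m)
{
	m->free_count++;	/* a second call is the duplicate free */
}

/* Step 1: with buggy != 0 a failed load leaves td->m set. */
static int
toy_load_txmbuf(struct toy_softc *sc, struct toy_mbuf *m, int buggy)
{
	struct toy_txdesc *td = &sc->slot[sc->next];

	td->m = m;		/* recorded before the ring check */
	if (sc->hw_free == 0) {
		if (!buggy)
			td->m = NULL;	/* assumed fix, see lead-in */
		return (-1);
	}
	sc->hw_free--;
	sc->next = (sc->next + 1) % TOY_NSLOT;
	return (0);
}

/* Step 2: reset frees every mbuf still recorded in a slot. */
static void
toy_meminit(struct toy_softc *sc)
{
	for (int i = 0; i < TOY_NSLOT; i++) {
		if (sc->slot[i].m != NULL) {
			toy_m_free(sc->slot[i].m);
			sc->slot[i].m = NULL;
		}
	}
	sc->next = 0;
	sc->hw_free = TOY_NHW;
}

/* Runs steps 1-4; returns how often the requeued mbuf was freed. */
int
toy_scenario(int buggy)
{
	struct toy_softc sc = { .hw_free = TOY_NHW };
	struct toy_mbuf m[3] = { { 0 } };
	int error;

	/* Step 1: two packets fill the ring, the third one fails. */
	(void)toy_load_txmbuf(&sc, &m[0], buggy);
	(void)toy_load_txmbuf(&sc, &m[1], buggy);
	error = toy_load_txmbuf(&sc, &m[2], buggy);
	assert(error != 0);
	sc.ifq = &m[2];		/* requeue for a later retry */

	/* Step 2: the watchdog fires and resets the chip. */
	toy_meminit(&sc);

	/* Step 3: retransmit the packet taken from the queue. */
	error = toy_load_txmbuf(&sc, sc.ifq, buggy);
	assert(error == 0);
	sc.ifq = NULL;

	/* Step 4: TX completion frees the transmitted mbuf. */
	toy_m_free(sc.slot[0].m);
	sc.slot[0].m = NULL;

	return (m[2].free_count);
}
```

Under these assumptions toy_scenario(1) reports two frees of the
requeued mbuf, which is the duplicate free, while toy_scenario(0)
reports one.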
However, the real cause is the DMA engine freeze. Since no fatal
errors were reported via interrupts, there might be another cause
of the freeze. I tried hard to understand the cause of the DMA
engine freeze but couldn't find any clues. It seems that the
freeze happens under very high network load (e.g. 7.5-8.0 MB/s TX
speed). Though this fix is not enough to eliminate the DMA engine
freeze, it's better than a panic.
Reported by: jhb via sparc64 ML