The Opteron rev E family of processors exposes a bug where, on very rare
occasions, memory barrier semantics are not honoured by the hardware
itself. As a result, random breakage can happen in ways that are hard to
investigate (for further explanation, see the content of the commit
itself).

Since only one specific family of an entire architecture is broken, a
complete fix is impractical without penalizing, to some extent, the
other, correct cases.
Considering that (and considering how rarely the bug is exposed), just
print a warning message if an affected machine is identified.

Pointed out by:	Samy Al Bahra <sbahra at repnop dot org>
Help on wordings by:	jeff
MFC:	3 days
Attilio Rao 2009-11-04 01:32:59 +00:00
parent 421cd2f2fb
commit 06db609d4a
Notes: svn2git 2020-12-20 02:59:44 +00:00
svn path=/head/; revision=198868
2 changed files with 36 additions and 0 deletions

@@ -607,6 +607,24 @@ print_AMD_info(void)
		printf(", %d lines/tag", (regs[2] >> 8) & 0x0f);
		print_AMD_l2_assoc((regs[2] >> 12) & 0x0f);
	}
	/*
	 * Opteron Rev E has a bug: on very rare occasions a read memory
	 * barrier is not performed as expected if it is followed by a
	 * non-atomic read-modify-write instruction.
	 * Since the bug shows up very rarely (intensive machine usage
	 * on other operating systems generally produces one unexplained
	 * crash every 2 months) and since a model-specific fix would be
	 * impractical at this stage, print a warning string if the broken
	 * model and family are identified.
	 */
	if (CPUID_TO_FAMILY(cpu_id) == 0xf && CPUID_TO_MODEL(cpu_id) >= 0x20 &&
	    CPUID_TO_MODEL(cpu_id) <= 0x3f) {
		printf("WARNING: This architecture revision has known SMP "
		    "hardware bugs which may cause random instability\n");
		printf("WARNING: For details see: "
		    "http://bugzilla.kernel.org/show_bug.cgi?id=11305\n");
	}
}
static void

@@ -1303,6 +1303,24 @@ print_AMD_info(void)
			    (amd_whcr & 0x0100) ? "Enable" : "Disable");
		}
	}
	/*
	 * Opteron Rev E has a bug: on very rare occasions a read memory
	 * barrier is not performed as expected if it is followed by a
	 * non-atomic read-modify-write instruction.
	 * Since the bug shows up very rarely (intensive machine usage
	 * on other operating systems generally produces one unexplained
	 * crash every 2 months) and since a model-specific fix would be
	 * impractical at this stage, print a warning string if the broken
	 * model and family are identified.
	 */
	if (CPUID_TO_FAMILY(cpu_id) == 0xf && CPUID_TO_MODEL(cpu_id) >= 0x20 &&
	    CPUID_TO_MODEL(cpu_id) <= 0x3f) {
		printf("WARNING: This architecture revision has known SMP "
		    "hardware bugs which may cause random instability\n");
		printf("WARNING: For details see: "
		    "http://bugzilla.kernel.org/show_bug.cgi?id=11305\n");
	}
}
static void