AMD is reportedly trying to track down as many as
3,000 Opteron processors that could behave erratically under high-temperature
conditions. The affected processors include a number of single-core Opterons
manufactured within the past six months. The chips were shown to reach
higher-than-normal core temperatures when running in a high-temperature environment,
which caused them to flub some floating-point calculations.

Because of the tests, AMD has changed the screening process
for rating the two product lines as the chips come off the production line,
Taylor said. As a result, some chips that would have been rated at clock speeds
of 2.8 GHz in the past will be listed at 2.6 GHz, making them less likely to
be used in extreme computing environments. This appears to be a separate issue
that The Register reported earlier, claiming that a bug in a batch of
Opteron processors produces incorrect results in loops that run for millions
of iterations. Coupled with high ambient temperatures, the processor will corrupt
data.
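
For anyone wondering what a failure like this looks like from the software side,
here is a rough sketch of a consistency check. It is not AMD's screening test and
it does not recreate the reported conditions (heat plus a specific instruction mix);
it only relies on the fact that a healthy CPU should return bit-identical results
every time it repeats the same floating-point loop. The function names, the workload
and the iteration counts are all made up for illustration.

```python
import struct


def fp_workload(iterations: int) -> float:
    """Run a fixed floating-point loop; identical input should give identical bits."""
    acc = 0.0
    x = 1.000001
    for i in range(1, iterations + 1):
        acc += (x * i) / (i + 0.5)   # multiply, divide and add on every pass
        x *= 1.0000001               # keep the operands changing slowly
    return acc


def bits(value: float) -> bytes:
    """Raw IEEE 754 double bytes, so the comparison is exact, not approximate."""
    return struct.pack("<d", value)


def consistency_check(runs: int = 5, iterations: int = 1_000_000) -> bool:
    """Repeat the workload and report whether every run matched the first one."""
    reference = bits(fp_workload(iterations))
    return all(bits(fp_workload(iterations)) == reference for _ in range(runs - 1))


if __name__ == "__main__":
    ok = consistency_check()
    print("consistent" if ok else "MISMATCH: results differ between runs")
```

A passing run proves very little, since the reported fault only shows up on a hot
chip; a mismatch, on the other hand, would be a good reason to contact AMD.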

The problem is believed to affect only a fraction of these chips – perhaps
no more than 3,000 individual CPUs – that managed to slip through AMD’s screening
net. It is not known how this so-called ‘test escape’ occurred, but it took place
“in part of 2005 and early 2006”, an AMD spokesman said. Although
only a few processors are defective, the fact that no one can pinpoint
exactly which batch of processors has the problem is troubling at best. AMD
claims measures have been put in place to prevent the bug from happening again,
and also stresses that the condition is not likely to occur in financial environments.

I’m just curious as to how AMD is tracking down the suspected
buggy processors to rectify the situation. I believe the erratic behavior is,
simply put, a processor working above its capability, and the ‘test escape’
is basically an over-lenient speed-binning process. If you suspect you
are affected by this, please check the official
AMD response page
to get your ‘situation’ corrected.