Cache Coherency Algorithms in X86 SMP Systems
Part II
Matthew Caron
Transferring data between CPUs
- Intel uses a Shared Front Side bus in their 850
chipset. This bus services each pair of
processors (4 processors = 2 buses, for example). The bus runs 64 bits
wide at 400MHz (RAMBUS Speed) for a bandwidth of 3.2GB/sec shared
between two processors.
- AMD uses a Point-to-Point Front Side bus in their 760MP
chipset: each processor has its own bus, 64 bits wide, running at
133MHz double-pumped (uses both rising and falling clock edges; DDR
SDRAM speed), for a bandwidth of 2.1GB/sec for each processor.
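The bandwidth figures above follow from bus width times transfer rate. A minimal sketch of that arithmetic, assuming decimal gigabytes (10^9 bytes) and treating the double-pumped 133MHz bus as 266 million transfers per second; the function and variable names are illustrative only:

```python
def bus_bandwidth_gb_per_s(width_bits, transfers_per_sec):
    """Peak bandwidth in GB/sec (1 GB = 10**9 bytes) for a given bus."""
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer * transfers_per_sec / 1e9

# Intel shared bus: 64 bits wide, 400 million transfers per second,
# shared by the two processors on that bus.
intel_shared = bus_bandwidth_gb_per_s(64, 400e6)

# AMD point-to-point bus: 64 bits wide, 133MHz clock, two transfers
# per clock (rising and falling edges), one bus per processor.
amd_per_cpu = bus_bandwidth_gb_per_s(64, 2 * 133e6)

print(f"Intel shared bus: {intel_shared:.1f} GB/sec")
print(f"AMD per-processor bus: {amd_per_cpu:.1f} GB/sec")
```

These are peak numbers; sustained throughput on either bus would be lower.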
Discussion of bus architecture
- Intel's shared bus means that each processor gets only 1.6GB/sec of
bandwidth under heavy load, whereas AMD's guarantees each processor
2.1GB/sec. Why does this matter? Well, let's assume that we have
1.2GHz processors. Now, because they're superscalar and pipelined,
assume they average 4 CPI (cycles per instruction). This yields an
effective instruction rate of
$1.2\,\mathrm{GHz} / 4 = 300$ million instructions per second.
Assuming each instruction requires 64 bits (8 bytes) of data (for both
the instruction and any data that needs to be moved to and from the
processor), we have the processor using
$300 \times 10^{6} \times 8\,\mathrm{bytes} = 2.4$GB/sec.
Note that this exceeds even AMD's bandwidth allocation, but the
additional 500MB/sec per processor should still provide a large
performance increase.
- Add to this the increased efficiency of the Point-to-Point bus
implementation. A processor on the Point-to-Point bus waits less for
data held in the other processor's cache than it would on the Shared
bus (where the data has to be written to memory, then read back).
- Even more efficiency is added by the MOESI algorithm, which
decreases external bus traffic at the expense of increased processor
complexity.
- In short, it seems that AMD has Intel beaten by a fair margin.
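The demand-versus-supply estimate in the discussion above can be reproduced as a short calculation. This is a sketch under the slide's own assumptions (1.2GHz clock, 4 CPI, 64 bits moved per instruction); the variable names are illustrative, and the figures are back-of-the-envelope estimates rather than measurements:

```python
CLOCK_HZ = 1.2e9            # assumed processor clock
CPI = 4                     # assumed average cycles per instruction
BYTES_PER_INSTRUCTION = 8   # assumed 64 bits moved per instruction

# Effective instruction rate and the bus bandwidth it would consume.
instructions_per_sec = CLOCK_HZ / CPI
demand_gb = instructions_per_sec * BYTES_PER_INSTRUCTION / 1e9

# Per-processor supply under heavy load.
intel_per_cpu = 3.2 / 2     # shared bus split between two processors
amd_per_cpu = 2.1           # dedicated point-to-point bus

print(f"Demand per processor: {demand_gb:.1f} GB/sec")
for name, supply in (("Intel", intel_per_cpu), ("AMD", amd_per_cpu)):
    shortfall = demand_gb - supply
    print(f"{name}: supply {supply:.1f} GB/sec, shortfall {shortfall:.1f} GB/sec")
```

Under these assumptions neither bus covers the estimated demand, but the point-to-point bus leaves a smaller gap.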
References
This was based largely on
http://www.anandtech.com/showdoc.html?i=1483
2002-06-15