AMD Barcelona Quad Core Architecture

Core Architecture                       Santa Rosa                  Barcelona
SSE Execution Width                     64 bits wide                128 bits wide
Instruction Fetch Bandwidth             16 bytes/cycle              32 bytes/cycle
Data Cache Bandwidth                    2 x 64-bit loads/cycle      2 x 128-bit loads/cycle
L2 cache/memory controller bandwidth    64 bits/cycle               128 bits/cycle
Floating-point scheduler depth          36 dedicated x 64-bit ops   36 dedicated x 128-bit ops
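
As a rough illustration of why the wider instruction fetch in the table matters, the sketch below counts how many instructions straddle a fetch-window boundary for 16-byte and 32-byte windows. It is a simplified model, not a description of the actual fetch hardware: the function name and the instruction stream are hypothetical, and it assumes aligned, back-to-back fetch windows.

```python
def count_split_fetches(instruction_lengths, window_bytes):
    """Count instructions whose bytes straddle an aligned fetch-window boundary."""
    splits = 0
    offset = 0  # byte offset at which the next instruction starts
    for length in instruction_lengths:
        first_window = offset // window_bytes
        last_window = (offset + length - 1) // window_bytes
        if last_window != first_window:
            splits += 1  # instruction spans two fetch windows
        offset += length
    return splits


# Hypothetical instruction stream (lengths in bytes), chosen only for illustration.
lengths = [3, 5, 7, 2] * 64

splits_16 = count_split_fetches(lengths, 16)  # Santa Rosa-style 16-byte fetch
splits_32 = count_split_fetches(lengths, 32)  # Barcelona-style 32-byte fetch
print("16-byte windows:", splits_16, "split instructions")
print("32-byte windows:", splits_32, "split instructions")
```

Because every 32-byte boundary is also a 16-byte boundary, the wider window can never produce more split instructions than the narrower one in this model, and it usually produces about half as many.
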
  • SSE MOV instructions can be performed in the floating-point "store" pipe
  • Two SSE operations and one SSE move can be executed per cycle
  • Supports an unaligned load/execute mode, which can improve instruction
    packing and decoding efficiency
  • Advanced branch prediction. Doubles the return stack size, adds more
    branch history bits, and builds in a 512-entry indirect branch predictor
  • 32-byte instruction fetch. Increases efficiency by reducing split-fetch
    instruction cases
  • Sideband stack optimizer. Adjustments to the stack don’t take up
    functional unit bandwidth.
  • Out-of-order load execution. Load instructions can bypass other loads in
    some cases, as well as earlier stores that the load in question does not
    depend on. This reduces the effect of L2 cache latency.
  • Optimizations to the TLBs (translation lookaside buffers)
  • Additional Fastpath instructions
  • Extensions to bit manipulations and SSE instructions
  • Independent memory controllers, which enable more memory pages to remain
    open
  • Memory controllers now support full 48-bit hardware addressing, which
    theoretically allows for 256 terabytes of physical memory
  • Implemented 1GB memory page size in addition to the common 4KB and 2MB
    page sizes
  • The L1 cache is 64KB, the L2 cache is 512KB dedicated per core, and the L3
    cache is 2MB shared among the four cores, making the design better suited
    to the coming age of virtualization.
  • Improved hardware support for virtualization through virtualized address
    translation, instead of the current shadow paging.
  • Supports separate CPU core and memory controller power planes to allow the
    CPU to lower its power state while the memory controller is running full
    bore
  • AMD’s enhanced PowerNow allows individual core frequencies to drop while
    other cores keep running full bore
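
The 256-terabyte figure for 48-bit addressing, and the effect of the new 1GB page size, can be checked with a little arithmetic. The sketch below is only illustrative: the 64GB region and the helper name `pages_needed` are assumptions made for the example, not details from AMD.

```python
ADDRESS_BITS = 48
KIB, MIB, GIB, TIB = 2**10, 2**20, 2**30, 2**40

# 2^48 bytes expressed in terabytes (binary units).
addressable_bytes = 2 ** ADDRESS_BITS
addressable_tib = addressable_bytes // TIB

PAGE_SIZES = {"4KB": 4 * KIB, "2MB": 2 * MIB, "1GB": 1 * GIB}


def pages_needed(region_bytes, page_bytes):
    """Number of pages, rounded up, needed to map a region of memory."""
    return -(-region_bytes // page_bytes)


region = 64 * GIB  # hypothetical region size, for illustration only
print("Addressable memory:", addressable_tib, "TB")
for name, size in PAGE_SIZES.items():
    print(name, "pages to map 64GB:", pages_needed(region, size))
```

Fewer, larger pages mean fewer translations are needed to cover the same region, which is why large pages tend to relieve pressure on the TLBs.
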
VR-Zone is a leading online technology news publication reporting on bleeding edge trends in PC and mobile gadgets, with in-depth reviews and commentaries.