
  • S_x96x_S

    addikt

    A more detailed analysis comparing Zen 4, Zen 5 and Zen 5c.
    Because of the substantial redesign, there are some side effects that lead to worse performance in certain areas (a few downsides [1]).

    AMD Strix Point “Ryzen AI 9 365” APU Benchmarks Revealed Zen 5’s IPC, Latency, Throughput & Various Performance Aspects
    https://wccftech.com/amd-strix-point-ryzen-ai-9-365-apu-ipc-latency-throughput-various-performance-tests/ (source: David Huang)


    [1]

    David notes that while Zen 5 brings improvements thanks to its ground-up redesign, the architecture also has a few downsides:
    - The throughput of various scalar ALU instructions has increased greatly, but because the mobile Zen 5 has half as many vector units as the desktop and server parts, SIMD throughput in this test is unchanged compared to Zen 4. Even on the Zen 5 core with halved vector units, SIMD store throughput is doubled at every width compared to the previous generation, and the SIMD load:store throughput ratio reaches 1:1;
    - Branch handling has been greatly enhanced: the number of not-taken branches that can be processed per cycle has increased from two to three, and two taken branches can be processed per cycle. This is likely tied to the new front-end design;
    - The latency of 128/256/512-bit SSE/AVX/AVX-512 SIMD integer additions has increased to 2 cycles at every width. This change may have been made to make high clock frequencies easier to sustain;
    - The throughput of 128/256-bit SIMD integer additions is halved compared to Zen 4, while 512-bit throughput is unchanged. The speculation is that this only affects Zen 5 cores with halved SIMD units and may be related to port allocation;
    - The nop fusion feature introduced in Zen 4 has been removed: a nop instruction can no longer be merged with another instruction into the same macro-op;
    - The throughput of some logical-register operations has been adjusted: some mov operations and some register-zeroing operations now share a unified throughput of 5, which is a mixed result compared to Zen 4.
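A rough, back-of-the-envelope sketch (my own illustration, not part of David Huang's measurements) of why a 2-cycle SIMD add latency matters: by Little's law, a pipelined unit with latency L and peak throughput T needs about L × T independent dependency chains (for example, separate accumulator registers) in flight to stay saturated, while a single dependent chain tops out at 1/L operations per cycle. The peak throughput of 2 adds/cycle used below is a hypothetical placeholder, not a measured Zen 4 or Zen 5 figure, and the 1-cycle "before" latency is only inferred from the statement that the latency was increased to 2 cycles.

```python
import math


def chains_needed(latency_cycles: int, peak_per_cycle: float) -> int:
    """Independent dependency chains needed to saturate a pipelined unit.

    Little's law: ops in flight = latency * throughput, so at least
    ceil(latency * throughput) independent chains must be interleaved.
    """
    return math.ceil(latency_cycles * peak_per_cycle)


def achieved_throughput(latency_cycles: int, peak_per_cycle: float, chains: int) -> float:
    """Ops per cycle when `chains` independent dependency chains are interleaved.

    Each chain can start a new op only once its previous op has finished
    (every `latency_cycles` cycles); the total is capped by the peak throughput.
    """
    return min(peak_per_cycle, chains / latency_cycles)


if __name__ == "__main__":
    PEAK = 2.0  # HYPOTHETICAL peak vector-add throughput (adds/cycle), not a measured value
    for latency in (1, 2):  # 1 cycle: assumed "before" value; 2 cycles: Zen 5 per the list above
        single = achieved_throughput(latency, PEAK, 1)
        needed = chains_needed(latency, PEAK)
        print(f"latency={latency}: single chain -> {single} adds/cycle, "
              f"chains to saturate -> {needed}")
```

Under these assumptions, a loop that accumulates into a single vector register runs at half the speed when the latency goes from 1 to 2 cycles, whereas a loop that already interleaves enough independent accumulators stays limited by peak throughput rather than latency.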
