It is quite uncommon for an academic paper at a major computer architecture conference to attract so much attention. But Intel's paper comparing (or "debunking") GPU performance against CPU performance in terms of speed-ups has clearly caused quite a stir. Several sites, including Linux Magazine, Beyond3D and even NVIDIA's forums, have reported on this paper.
While GPUs often report extraordinary peak performance based on FLOPS rate, many applications are actually limited mainly by memory bandwidth. If you compare the available memory bandwidths of CPUs and GPUs, it makes sense that reported speed-ups are in the vicinity of 1-10x. After all, adding more compute units to increase the theoretical peak FLOPS rate (as Moore's Law permits) is apparently simpler than increasing the memory bandwidth. There are already reports claiming 128 GFLOPS for an Intel Sandy Bridge (SB) processor with 4 cores. High throughput on such a processor can only be sustained if the application offers enough data reuse in the caches (the same holds for GPUs). No reuse at all would require 2TB/s (!) of input bandwidth to keep all these SB units busy, which corresponds to roughly 16 bytes of input per floating-point operation. Expect the performance of future systems to be increasingly dominated by external bandwidth.
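To make the arithmetic behind the 2TB/s figure concrete, here is a minimal back-of-the-envelope sketch. The 128 GFLOPS peak is the number quoted above; the 16 bytes per operation (for example, two double-precision operands) is my assumption, chosen because it reproduces the 2TB/s figure, and the 20 GB/s memory bandwidth is a purely hypothetical value for illustration. The function names are made up for this example.

```python
def required_bandwidth(peak_flops, bytes_per_flop):
    """Input bandwidth (bytes/s) needed to sustain peak_flops with no data reuse."""
    return peak_flops * bytes_per_flop


def attainable_flops(peak_flops, mem_bandwidth, bytes_per_flop):
    """Sustained FLOPS: the lower of the compute peak and what memory can feed."""
    return min(peak_flops, mem_bandwidth / bytes_per_flop)


PEAK_FLOPS = 128e9      # 128 GFLOPS, the figure quoted in the post
BYTES_PER_FLOP = 16     # assumption: two 8-byte operands per operation, no reuse
MEM_BANDWIDTH = 20e9    # hypothetical 20 GB/s external memory bandwidth

if __name__ == "__main__":
    needed = required_bandwidth(PEAK_FLOPS, BYTES_PER_FLOP)
    print(f"Bandwidth needed with no reuse: {needed / 1e12:.2f} TB/s")

    sustained = attainable_flops(PEAK_FLOPS, MEM_BANDWIDTH, BYTES_PER_FLOP)
    print(f"Sustained without reuse: {sustained / 1e9:.2f} GFLOPS")

    # With cache reuse, fewer bytes come from memory per operation,
    # so the sustained rate climbs until it hits the compute peak.
    for reuse in (1, 10, 100, 1000):
        rate = attainable_flops(PEAK_FLOPS, MEM_BANDWIDTH, BYTES_PER_FLOP / reuse)
        print(f"reuse x{reuse:<4}: {rate / 1e9:7.2f} GFLOPS")
```

Under these assumptions, a kernel with no reuse reaches only about 1% of the peak, and each operand has to be reused on the order of a hundred times before the compute units, rather than the memory system, become the limit.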