slide 15 of 19
Performance
-
Shared memory performance is highly scheduler-dependent
-
Without kernel scheduling hints, too many processes
spin-waiting on shared memory can drive useful throughput to essentially zero
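
One common way to soften the spin-wait problem described above is to spin only briefly and then hand the CPU back to the scheduler, so that the process expected to set the flag gets a chance to run. The following is a minimal sketch of that idea, not the method used in the talk. The function name `spin_wait`, its parameters, and the default values are all hypothetical, and `is_ready` stands in for whatever check a program makes on its shared memory segment.

```python
import os
import time

# Hypothetical helper: fall back to a zero-length sleep where
# os.sched_yield is unavailable (it exists only on some Unix platforms).
_yield_cpu = getattr(os, "sched_yield", lambda: time.sleep(0))


def spin_wait(is_ready, spins_before_yield=100, timeout_s=1.0):
    """Poll is_ready() until it returns True or timeout_s elapses.

    After every spins_before_yield failed polls the caller gives up
    its time slice, so a descheduled writer is able to make progress.
    Returns True if the condition was seen, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    spins = 0
    while not is_ready():
        spins += 1
        if spins >= spins_before_yield:
            spins = 0
            _yield_cpu()  # let another runnable process use this CPU
            if time.monotonic() > deadline:
                return False
    return True
```

In use, `is_ready` would typically be a small closure that reads one byte or word of the shared segment, for example `lambda: buf[0] == 1`. The spin count is a tuning knob: a larger value favors latency when all processes really are running together, and a smaller value favors throughput when they are not.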
-
When all processes are scheduled together
-
Bandwidth is about 1/2 that of the path to shared cache/DRAM, at least 5 MB/s
-
Latency is a few microseconds,
lower for shared cache
-
With N PEs sharing a cache, speedup can exceed N by 10% - 20%
due to shared fetching (data fetched by one PE is already cached for the others)
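
To make the figures above concrete, here is a rough back-of-the-envelope sketch. It assumes the usual simple cost model (transfer time = latency + size / bandwidth) and reads "10% - 20% more than N" as a speedup between 1.1 N and 1.2 N; both readings are assumptions rather than anything stated on the slide. The function names are hypothetical, and the 3 microsecond latency used in the usage note is only an illustrative stand-in for "a few microseconds".

```python
def transfer_time_us(nbytes, latency_us, bandwidth_mb_s):
    """Estimated time in microseconds to move nbytes through shared memory.

    Assumes 1 MB = 10**6 bytes, so 1 MB/s equals 1 byte per microsecond.
    """
    return latency_us + nbytes / bandwidth_mb_s


def superlinear_speedup_range(n_pe):
    """(low, high) speedup if shared fetching adds 10% to 20% beyond n_pe."""
    return (1.10 * n_pe, 1.20 * n_pe)
```

For example, `transfer_time_us(5000, 3.0, 5.0)` estimates a 5000-byte transfer at the 5 MB/s floor with an assumed 3 microsecond latency, and `superlinear_speedup_range(4)` gives the speedup band for four PEs under the 10% - 20% reading. At the bandwidth floor the size term dominates for all but the smallest messages, which is why the latency figure matters mainly for fine-grain synchronization.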