
Performance

Shared memory performance is highly scheduler-dependent
Without kernel scheduling hints, too many processes spin-waiting on shared memory can reduce performance to effectively zero
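
A minimal sketch of a spin-wait that spins only for a bounded burst and then offers the CPU back to the scheduler. This mitigation is an assumption used for illustration and is not stated on the slide. The names wait_for_flag, spin_limit, and timeout_s are hypothetical, and buf stands in for a shared-memory region (any indexable byte buffer works here).

```python
import os
import time

# Use sched_yield where the platform provides it; otherwise fall back to a
# zero-length sleep, which at least lets other threads of this process run.
_yield_cpu = getattr(os, "sched_yield", lambda: time.sleep(0))


def wait_for_flag(buf, index, expected, spin_limit=1000, timeout_s=5.0):
    """Wait until buf[index] == expected, yielding between spin bursts.

    Returns how many times the caller yielded the CPU.
    Raises TimeoutError if the value is not seen within timeout_s seconds.
    """
    deadline = time.monotonic() + timeout_s
    yields = 0
    while True:
        # Bounded busy-wait: cheap if the writer is running at the same time.
        for _ in range(spin_limit):
            if buf[index] == expected:
                return yields
        if time.monotonic() >= deadline:
            raise TimeoutError("flag not set before timeout")
        # The writer may be descheduled; stop spinning and let it run.
        _yield_cpu()
        yields += 1
```

If the writer is already running, the call returns inside the first burst without yielding. If the writer has been descheduled, the waiter stops burning its time slice after spin_limit checks instead of spinning until it is preempted.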
When all processes are scheduled together:
- Bandwidth is about half that of the path to the shared cache/DRAM, at least 5 MB/s
- Latency is a few microseconds, lower when the data stays in the shared cache
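
The two figures above can be combined in the usual first-order cost model, time = latency + size / bandwidth. The sketch below is only an illustration: the 3 microsecond latency is an assumed stand-in for "a few microseconds", 5 MB/s is the slide's lower bound taken as 5 * 10^6 bytes/s, and the function names are hypothetical.

```python
def transfer_time(nbytes, latency_s, bandwidth_bytes_per_s):
    """First-order cost of moving nbytes: fixed latency plus streaming time."""
    return latency_s + nbytes / bandwidth_bytes_per_s


def effective_bandwidth(nbytes, latency_s, bandwidth_bytes_per_s):
    """Bytes per second actually achieved once latency is included."""
    return nbytes / transfer_time(nbytes, latency_s, bandwidth_bytes_per_s)


LATENCY_S = 3e-6   # assumption: one value for "a few microseconds"
BANDWIDTH = 5e6    # the slide's lower bound, read as 5 * 10^6 bytes/s

if __name__ == "__main__":
    for size in (16, 1024, 65536):
        rate = effective_bandwidth(size, LATENCY_S, BANDWIDTH)
        print(f"{size:6d} bytes: {rate / 1e6:.2f} MB/s effective")
```

Under these assumed numbers, latency and streaming time are equal at latency * bandwidth = 15 bytes, so only very small messages are latency-bound.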
With N PEs sharing a cache, speedup can exceed N by 10% - 20% because of shared fetching
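
A small worked example of the arithmetic, assuming the claim means a speedup between 1.10 * N and 1.20 * N. That reading is an interpretation of the slide, and the function names are hypothetical.

```python
def speedup_range(n_pes, low_gain=0.10, high_gain=0.20):
    """Speedup bounds when N PEs beat linear speedup by low_gain..high_gain."""
    return n_pes * (1 + low_gain), n_pes * (1 + high_gain)


def parallel_efficiency(speedup, n_pes):
    """Speedup per PE; values above 1.0 mean superlinear speedup."""
    return speedup / n_pes


if __name__ == "__main__":
    low, high = speedup_range(4)
    print(f"4 PEs: speedup between {low:.1f} and {high:.1f}")
```

Under this reading, 4 PEs would give a speedup between 4.4 and 4.8, which is a parallel efficiency of 110% to 120%.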