 
 
 
 slide 15 of 19
Performance
- Shared memory performance is highly scheduler-dependent
- Without kernel scheduling hints, having too many processes
  spinning on shared memory yields zero performance
- When all processes are scheduled together:
  - Bandwidth is 1/2 that of the path to shared cache/DRAM, at least 5 MB/s
  - Latency is a few microseconds, faster for shared cache
- With N PEs sharing a cache, speedups 10% - 20% greater than N can
  occur due to shared fetching
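
The spinning problem above can be softened by giving up the processor after a bounded number of polls, so the process that will update the shared location gets a chance to run. The following is only a minimal sketch of that idea, not something taken from the slides: the function name `wait_for_flag` and its parameters are hypothetical, and Python is used purely for illustration. It assumes `os.sched_yield()` is available (Unix) and falls back to `time.sleep(0)` otherwise.

```python
import os
import time


def wait_for_flag(is_set, spins_before_yield=100, timeout_s=5.0):
    """Poll is_set() until it returns True, yielding the CPU periodically.

    is_set is any zero-argument callable that reads the shared location
    (hypothetical interface, chosen for this sketch).
    Returns the number of unsuccessful polls; raises TimeoutError if the
    flag is still unset after roughly timeout_s seconds.
    """
    deadline = time.monotonic() + timeout_s
    polls = 0
    while not is_set():
        polls += 1
        if polls % spins_before_yield == 0:
            # Stop spinning for a moment so the scheduler can run the
            # process that is expected to set the flag.
            if hasattr(os, "sched_yield"):
                os.sched_yield()
            else:
                time.sleep(0)
            # The deadline is only checked at yield points to keep the
            # inner polling loop cheap.
            if time.monotonic() > deadline:
                raise TimeoutError("flag was never set")
    return polls
```

With a flag held in shared memory, for example a `multiprocessing.Value("i", 0)` named `flag`, the call would look like `wait_for_flag(lambda: flag.value != 0)`. A small `spins_before_yield` favors fairness when processes outnumber processors; a large one favors latency when all processes are scheduled together.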