The articles I read - 2: Nulticore
As the number of cores grows from two to 64, performance plummets by a factor of five: the additional processors effectively nullify each other. The author calls this the Nulticore Effect. Read the full article here:
http://www.embedded.com/design/mcus-processors-and-socs/4008183/The-Nulticore-effect
This article cites an IEEE Spectrum article:
http://spectrum.ieee.org/computing/hardware/multicore-is-bad-news-for-supercomputers
There have been a few attempts in the past to solve such issues; Berkeley IRAM (Intelligent RAM) was one of the most significant.
Recent advances in chip fabrication technology and design practices can fuel new ideas. Here is one of mine.
The diagram above shows the concept: multiple processors access a Unified Memory Complex, which boosts memory throughput. The memory shown in the middle can be a single monolithic chip or a collection of separate units, depending on technology limitations.
Each processor talks directly to its own dedicated cache. It also accesses its own copy of the common memory. These copies are kept in sync over a crossbar interconnect, possibly built with asynchronous design techniques.
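To make the replication scheme concrete, here is a minimal Python sketch. It is a software model under stated assumptions, not a hardware design: `ReplicatedMemory` and its methods are hypothetical names, each processor's copy is a plain list, caches are left out, and a FIFO queue drained by `sync()` stands in for the crossbar and asynchronous sync logic.

```python
from collections import deque


class ReplicatedMemory:
    """Toy model: N processors, each with a private copy of the shared memory.

    Hypothetical sketch. Writes land in the writer's copy immediately and are
    queued; sync() plays the role of the background crossbar/sync engine.
    """

    def __init__(self, num_procs: int, size: int) -> None:
        self.num_procs = num_procs
        self.copies = [[0] * size for _ in range(num_procs)]
        self.pending = deque()  # (writer, addr, value) not yet propagated

    def read(self, pid: int, addr: int) -> int:
        # Each processor reads only its own copy, so reads never contend.
        return self.copies[pid][addr]

    def write(self, pid: int, addr: int, value: int) -> None:
        # Update the local copy at once; propagation to the others is deferred.
        self.copies[pid][addr] = value
        self.pending.append((pid, addr, value))

    def sync(self) -> int:
        """Apply every queued write to all other copies.

        Returns the number of copy updates performed (N - 1 per write).
        """
        updates = 0
        while self.pending:
            writer, addr, value = self.pending.popleft()
            for pid in range(self.num_procs):
                if pid != writer:
                    self.copies[pid][addr] = value
                    updates += 1
        return updates

    def in_sync(self) -> bool:
        # True when every copy matches processor 0's copy.
        return all(copy == self.copies[0] for copy in self.copies)


if __name__ == "__main__":
    mem = ReplicatedMemory(num_procs=4, size=16)
    mem.write(0, 3, 42)
    print(mem.read(0, 3), mem.read(1, 3))  # 42 0: processor 1 still sees old data
    print(mem.sync())                      # 3 updates pushed to the other copies
    print(mem.read(1, 3), mem.in_sync())   # 42 True
```

The window between `write()` and `sync()` is where another processor can read stale data, which is why the scheme depends on processors writing to separate areas of the address map.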
Because each processor reads from its own copy, all N processors can read in parallel, giving up to an Nx speedup. Writes can also proceed independently at Nx speed, as long as different processors write to different areas of the address map. The asynchronous sync engine then brings all N copies of the memory back in sync in the background, over faster backdoor channels.
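The write rule and the throughput claim can be sketched as a back-of-the-envelope model. It assumes the address map is split into N equal, contiguous regions, one per processor (how areas are actually assigned is not specified), and it ignores contention, latency and cache effects. `region_owner`, `write_allowed`, `aggregate_read_bandwidth` and `sync_traffic` are hypothetical helper names.

```python
def region_owner(addr: int, mem_size: int, num_procs: int) -> int:
    """Return the processor owning `addr` (assumes equal, contiguous regions)."""
    if mem_size % num_procs != 0:
        raise ValueError("mem_size must be divisible by num_procs")
    if not 0 <= addr < mem_size:
        raise ValueError("address out of range")
    region_size = mem_size // num_procs
    return addr // region_size


def write_allowed(pid: int, addr: int, mem_size: int, num_procs: int) -> bool:
    """In this model a processor writes independently only inside its own region."""
    return region_owner(addr, mem_size, num_procs) == pid


def aggregate_read_bandwidth(num_procs: int, per_copy_bw: float) -> float:
    """Ideal peak: every processor streams from its own copy in parallel."""
    return num_procs * per_copy_bw


def sync_traffic(num_procs: int, write_rate_per_proc: float) -> float:
    """Background copy traffic: each write must reach the other N - 1 copies."""
    return num_procs * write_rate_per_proc * (num_procs - 1)


if __name__ == "__main__":
    print(region_owner(300, 1024, 4))          # 1 (regions are 256 words each)
    print(write_allowed(0, 300, 1024, 4))      # False: address 300 belongs to processor 1
    print(aggregate_read_bandwidth(64, 10.0))  # 640.0
    print(sync_traffic(4, 1.0))                # 12.0
```

For example, 64 processors each reading from a copy at 10 GB/s would ideally see 640 GB/s in aggregate. The `sync_traffic` term grows roughly with N squared, since every write has to reach the other N - 1 copies, which is one way to see why the backdoor channels need to be faster than the processor ports.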