Designing logic to sit in a DRAM process presents several constraints, not least routing with only a few metal layers. This means that the same functions in a DRAM process take up much more area than in a CMOS logic process. However, the Upmem team has designed its DPU specifically around the DRAM process constraints while producing an instruction set that remains general purpose, Roy said.
"The fundamental benefit of processing-in-memory is the combination of DRAM and CPU. We attach 1 DPU per DRAM bank. It means 16 cores per 8Gbit DRAM chip. On a 16Gbyte DIMM, we deliver 256 cores, and 8 of them can be added to a standard CPU socket. We end up with a co-processing system of 2048 cores together with 128Gbytes of DRAM per socket," said Roy.
However, such a design, with 2,000 processing elements each handling perhaps a dozen threads on behalf of a controlling CPU, would require a major change in software approach. Roy made the point that although compilers and SDKs will be provided for the DPU, it does not need to directly support an operating system because it is designed to run small programs or routines on behalf of the CPU. "It's a programmable co-processor optimised for data computing," said Roy.
"The high level application is distributing tasks to the co-pocessors easily because it knows what DPU has what data. You can see it as a distributed computing system at the server level. The DPUs are independent from each other in term of code and data, making the solution scalable. All the communication is done thought the x86, under the control of the application," he added.
Roy said Upmem is in talks with all of the major DRAM manufacturers about making the DPU. "They bring the DRAM building blocks, the PHY interface, the facility and the test infrastructure, while Upmem brings the CPU IP and the software stack."