Graphcore's two-chip 'Colossus' close to launch

April 05, 2018 // By Peter Clarke
Colossus, a pair of intelligence processing units (IPUs), designed by startup Graphcore Ltd. (Bristol, England) as a processor intended specifically for machine intelligence workloads, is a few months away from launch, according to Simon Knowles, CTO and co-founder of the company.

Knowles gave an indication of the launch timing as well as some further details of the Colossus architecture in a talk given at the ScaledML Conference held at Stanford University on March 24.

Graphcore's aim is to produce a combination of programming environment and semiconductor hardware optimized for a broad range of machine learning networks and strategies as applied in the cloud and enterprise datacenters. The startup, founded in 2016, claims Colossus can increase performance on machine learning algorithms by a factor of up to 100 compared with systems based on GPUs, which tend to be the fastest systems available today.

Knowles had previously said that Graphcore's IPU would be shipping to early-access customers before the end of 2017 with more general availability set to start early in 2018 (see Graphcore's 'Colossus' chip due before end of year). That may still be true. In his Stanford presentation Knowles said that the IPU would launch in the "next few months," which does not exclude the possibility that samples of the 16nm IPUs have already been shipped.

As part of his presentation at the conference Knowles discussed the philosophy behind building an IPU that is memory-centric and that uses bulk synchronous parallel (BSP) communications between processors – in contrast to conventional designs that separate logic and memory.

Two phases of bulk synchronous parallel (BSP) computation. Source: ScaledML Conference and Graphcore Ltd.

Under Graphcore's implementation of BSP there is a communications phase, in which all the processors send and receive information as required, followed by a processing phase that produces the results to be communicated in the next cycle. Knowles described BSP as a simple abstraction that is guaranteed free of concurrency hazards, although he acknowledged that load-balancing is key to getting the most efficient performance out of the IPU.
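
For readers unfamiliar with BSP, the two-phase cycle can be illustrated with a small single-threaded Python simulation. This is only a sketch of the general BSP pattern and not Graphcore's implementation: the ring layout, the function name `bsp_ring_sum` and the summing task are all assumptions chosen for illustration.

```python
def bsp_ring_sum(values):
    """Sum `values` across a ring of simulated workers using BSP supersteps.

    Hypothetical illustration only: the ring topology and the summing task
    are assumptions, not details of the IPU described in the article.
    """
    n = len(values)
    totals = list(values)   # each worker's private running total
    outbox = list(values)   # the value each worker will send next

    for _ in range(n - 1):
        # Communication phase: every worker sends to its right-hand neighbour.
        # Building the whole inbox before any worker computes plays the role
        # of the barrier that separates the two phases.
        inbox = [outbox[(i - 1) % n] for i in range(n)]

        # Processing phase: each worker reads only its own state and inbox,
        # so no worker can observe another's half-finished update.
        for i in range(n):
            totals[i] += inbox[i]

        # What was received is forwarded in the next communication phase.
        outbox = inbox

    return totals


print(bsp_ring_sum([1, 2, 3, 4]))
```

Because every worker waits at the barrier, each cycle takes as long as the slowest worker's processing phase, which is one way to see why load-balancing matters in a BSP design.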
