Efficient computation and reduced data movement together yield much higher throughput per watt than existing solutions. The performance advantage is especially strong at low batch sizes, which edge applications require because they typically have only one camera or sensor.
For YOLOv3 real-time object recognition, InferX X1 processes 12.7 frames/second of 2-megapixel images at batch size = 1. Frame rate scales roughly inversely with image size: it approximately doubles for a 1-megapixel image.
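The scaling claim above can be turned into a quick back-of-the-envelope estimate. The sketch below is illustrative only: it assumes throughput is exactly inversely proportional to pixel count, which the article describes only as "roughly" true, and the function names are hypothetical. The single measured reference point is the 12.7 frames/second figure quoted for 2-megapixel images.

```python
# Reference point quoted in the article: YOLOv3, 2-megapixel images, batch size 1.
REFERENCE_FPS = 12.7
REFERENCE_MEGAPIXELS = 2.0


def estimate_fps(megapixels: float) -> float:
    """Estimate frames/second, ASSUMING throughput is inversely
    proportional to the number of pixels per image."""
    if megapixels <= 0:
        raise ValueError("megapixels must be positive")
    return REFERENCE_FPS * REFERENCE_MEGAPIXELS / megapixels


def ms_per_frame(megapixels: float) -> float:
    """Average time spent per frame, in milliseconds, at the estimated rate."""
    return 1000.0 / estimate_fps(megapixels)


if __name__ == "__main__":
    for size in (0.5, 1.0, 2.0):
        print(f"{size} MP: {estimate_fps(size):.1f} fps, "
              f"{ms_per_frame(size):.1f} ms/frame")
```

Under this assumption a 1-megapixel image gives about 25.4 frames/second. Estimates far from the 2-megapixel reference point should be treated with caution, since fixed per-frame overheads are ignored.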
The nnMAX 1K tile and InferX X1 coprocessor support 8-bit, 16-bit and bfloat16 numerics, with the ability to mix them across layers. InferX is programmed using TensorFlow Lite and ONNX, two of the most popular inference ecosystems.
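To make "mixing numerics across layers" concrete, the following sketch shows what the three formats do to a value and how a per-layer precision choice might be expressed. It is not the nnMAX toolchain: the layer names, the `LAYER_PRECISION` table and all function names are hypothetical, the 8-bit and 16-bit formats are assumed here to be signed integers with one symmetric scale per tensor, and the bfloat16 conversion simply truncates a float32 rather than rounding.

```python
import struct


def to_bfloat16(x: float) -> float:
    """Keep only the top 16 bits of the float32 encoding of x.
    This truncates (rounds toward zero); real hardware may round instead."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]


def quantize_symmetric(values, num_bits):
    """Map floats to signed num_bits integers using one shared scale."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


# Hypothetical per-layer plan: cheap 8-bit early layers, wider formats later.
LAYER_PRECISION = {"conv1": "int8", "conv2": "int16", "detect_head": "bfloat16"}


def apply_precision(layer_name, values):
    """Return the values as they would look after a round trip through
    the numeric format assigned to the given layer."""
    precision = LAYER_PRECISION[layer_name]
    if precision == "bfloat16":
        return [to_bfloat16(v) for v in values]
    num_bits = 8 if precision == "int8" else 16
    quantized, scale = quantize_symmetric(values, num_bits)
    return [q * scale for q in quantized]


if __name__ == "__main__":
    sample = [3.14159, -0.5, 0.001]
    for layer in LAYER_PRECISION:
        print(layer, apply_precision(layer, sample))
```

Running the script shows the trade-off: the 8-bit layer loses small values almost entirely, while bfloat16 keeps the float32 dynamic range at reduced precision (3.14159 becomes 3.140625 under truncation).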
"The difficult challenge in neural network inference is minimizing data movement and energy consumption, which is something our interconnect technology can do amazingly well," said Geoff Tate, CEO of Flex Logix. "While processing a layer, the datapath is configured for the entire stage using our reconfigurable interconnect, enabling InferX to operate like an ASIC, then reconfigure rapidly for the next layer. Because most of our bandwidth comes from local SRAM, InferX requires just a single DRAM, simplifying die and package, and cutting cost and power."
InferX X1 will be available as chips for edge devices and on half-height, half-length PCIe cards for edge servers and gateways. It is programmed using the nnMAX Compiler, which takes TensorFlow Lite or ONNX models. The internal architecture of the inference engine is hidden from the user.
The nnMAX 1K is in development and will be available for integration in SoCs by 3Q19. The InferX X1 is due to tape out in 3Q19, and samples of chips and PCIe boards will be available shortly after.