The architecture is also capable of up to 250 trillion floating-point operations per second (FLOPS), the company said.
Groq's A0 Tensor Streaming Processor ships on a PCIe board that is currently being tested by customers. The company did not say which foundry is making the chip or what manufacturing process is being used.
The Groq TSP architecture is based on a software-first philosophy that allows it to support both traditional and new machine learning models. Groq said that the architecture is in operation at customer sites in both x86 and non-x86 systems. Groq claims its TSP architecture achieves both compute flexibility and parallelism without the synchronization overhead of traditional GPU and CPU architectures.
"The Groq architecture is many multiples faster than anything else available for inference, in terms of both low latency and inferences per second. Our customer interactions confirm that," said Jonathan Ross, Groq's co-founder and CEO, in a statement. "We had first silicon back, first-day power-on, programs running in the first week, sampled to partners and customers in under six weeks, with A0 silicon going into production."
The company stresses the determinism of the architecture: execution planning happens in software, freeing up die area otherwise dedicated to dynamic instruction execution. This tight control yields deterministic processing suitable for applications where safety and accuracy are paramount.
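To illustrate the general idea (this is a toy sketch, not Groq's actual software stack or compiler), software-planned execution means the complete order of operations is resolved once, ahead of time, so every run performs identical steps with no runtime dispatch decisions. The function and variable names below are hypothetical:

```python
# Toy illustration of static, software-planned scheduling: the "compiler"
# fixes the full execution order up front, and the "hardware" simply
# replays that schedule, so execution is fully deterministic.

def compile_schedule(ops, input_names):
    """Resolve dependencies once, at 'compile time', into a fixed order."""
    resolved, order = set(input_names), []
    while len(order) < len(ops):
        for name, (deps, fn) in ops.items():
            if name not in resolved and all(d in resolved for d in deps):
                order.append(name)
                resolved.add(name)
    return order

def run(ops, schedule, inputs):
    """Execute the precomputed schedule; no decisions are made at runtime."""
    values = dict(inputs)
    for name in schedule:
        deps, fn = ops[name]
        values[name] = fn(*(values[d] for d in deps))
    return values

# A tiny dataflow graph: prod = (a + b) * a
ops = {
    "sum":  (("a", "b"), lambda a, b: a + b),
    "prod": (("sum", "a"), lambda s, a: s * a),
}
schedule = compile_schedule(ops, ("a", "b"))  # fixed once, reused every run
result = run(ops, schedule, {"a": 3, "b": 4})
print(schedule, result["prod"])  # ['sum', 'prod'] 21
```

Because the schedule never varies between runs, latency is identical on every invocation, which is the property the article attributes to the architecture.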
"Groq’s solution is ideal for deep learning inference processing for a wide range of applications," said Dennis Abts, chief architect at Groq, in the same statement. "but even beyond that massive opportunity, the Groq solution is designed for a broad class of workloads. Its performance, coupled with its simplicity, makes it an ideal platform for any high-performance, data- or compute-intensive workload."