Untether AI is a processor company that is pioneering the technology of moving processing steps to where the data resides rather than the other way around. This reduces data movement and thereby improves operational efficiency and power efficiency. To this end it has developed a bus-free near-memory computing architecture that is well suited to neural network inferencing.

With its first-generation runAI200 chip (see ‘At-memory’ inference engine raises NN performance) gaining traction, Untether judged that now was the right time to raise funds to expand the company and get ready for a next-generation chip.

Venture capitalists agreed and have put up $125 million in an oversubscribed round co-led by an affiliate of Tracker Capital Management LLC and previous investor Intel Capital. The round included money from new investor the Canada Pension Plan Investment Board and existing investor Radical Ventures. The round puts Untether.AI’s funding to date above $150 million, Iyengar said.

Iyengar said the substantial amount of money would be used for two purposes: to expand the engagements for the runAI200 in multiple markets and to define and develop the next generation of hardware. “We have to build up our support to take advantage of every potential product interaction,” Iyengar told eeNews Europe in a Zoom interview.

That might be the easier part of the task if the 16nm chip and its 8TOPS/W efficiency are doing as well as the company claims. It is the management team’s job to decide where the sweet spot is for various applications. Should Untether.AI go to the 12nm manufacturing process to keep costs down or go to 7nm to get a significant uplift in performance and in capacity to hold neural nets? Or should the company go “all-in” to try and intersect the 3nm node when it arrives in 2022 or 2023?

As a further indication of where Untether stands today, the runAI200 is designed to try and contain complete neural networks and their coefficients on a single chip, or on four chips when considering the TsunAImi PCIe card. The chip contains 200Mbytes of SRAM with 260,000 processing elements dispersed among the SRAM. The design supports int8 and int16 data types and has a 720MHz clock frequency for efficiency and a 960MHz mode optimized for performance.
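To put those figures in perspective, the short Python sketch below works through the arithmetic. It is a back-of-envelope estimate only: it assumes decimal megabytes, assumes every byte of SRAM is available for coefficients (in practice activations and buffers also need space), and uses illustrative function names that do not come from Untether.

```python
# Back-of-envelope arithmetic from the published runAI200 figures.
# Assumptions (not from the company): decimal megabytes, and all SRAM
# available for coefficients, so these are upper bounds only.

SRAM_BYTES_PER_CHIP = 200 * 10**6      # 200 Mbytes of on-chip SRAM
PROCESSING_ELEMENTS = 260_000          # processing elements per chip
CHIPS_PER_CARD = 4                     # TsunAImi PCIe card
BYTES_PER_COEFF = {"int8": 1, "int16": 2}
CLOCK_MHZ = {"efficiency": 720, "performance": 960}


def coefficient_capacity(chips: int, dtype: str) -> int:
    """Upper bound on coefficients that fit in the SRAM of `chips` chips."""
    return chips * SRAM_BYTES_PER_CHIP // BYTES_PER_COEFF[dtype]


def sram_bytes_per_pe() -> float:
    """Average SRAM bytes sitting next to each processing element."""
    return SRAM_BYTES_PER_CHIP / PROCESSING_ELEMENTS


def clock_uplift() -> float:
    """Ratio of the performance-mode clock to the efficiency-mode clock."""
    return CLOCK_MHZ["performance"] / CLOCK_MHZ["efficiency"]


if __name__ == "__main__":
    for dtype in BYTES_PER_COEFF:
        for chips in (1, CHIPS_PER_CARD):
            cap = coefficient_capacity(chips, dtype)
            print(f"{chips} chip(s), {dtype}: up to {cap / 1e6:.0f}M coefficients")
    print(f"SRAM per processing element: about {sram_bytes_per_pe():.0f} bytes")
    print(f"Performance-mode clock uplift: {clock_uplift():.2f}x")
```

On these assumptions a single chip holds at most 200 million int8 coefficients and a four-chip card at most 800 million, with roughly 770 bytes of SRAM local to each processing element. That is the scale against which a move to 12nm, 7nm or 3nm would be judged.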

So now the company is considering how much performance increase is desirable and how close to the leading edge it needs to go to achieve it. Iyengar pointed out that the company chose to start at 16nm partly because it did not need to go right to the leading edge with its foundry, TSMC. The team felt it could gain a performance lead over the competition while staying at a slightly more affordable node.

“Given where we are in terms of the chip shortage, we are fortunate. We can get what we need in terms of volume. We didn’t anticipate Covid-19 and the chip shortage, so we were fortunate.”

Untether.AI believes it has a scalable architecture that allows it to have an impact everywhere from the edge to the data center. “We are addressing infrastructure for edge and for the cloud,” said Iyengar. He then laid out the market in terms of four sectors.

One is vision systems including autonomous driving, drones, robotics and factory automation. “These are all applications where we have engagement,” said Iyengar.

Second is banking and finance applications. Iyengar has experience here from his time as a senior executive at Altera from 1996 to 2012. Back then banks were interested in the phenomenon of high-frequency trading and were prepared to try and draft hardware to minimize latency and gain an advantage.

Now the banking interest is risk assessment based on dynamic market positions, which AI may be able to resolve more quickly and accurately than humans. “Right now an international bank with multiple desks in multiple markets and time zones can only do a risk evaluation when all the data is in, say at the end of the week, or the end of the month. They would like to be doing risk assessments at least every day, or even every hour,” said Iyengar.

The third market is the data center with AaaS or AI-as-a-service, which could include such challenges as natural language processing. The fourth market is government in areas such as security which uses the combined power of vision and natural language processing.

As Untether.AI pursues these markets it introduces more complexity around product definition, the die size and cost to fulfill that definition, and the choice of target process. In other words, how much additional IP should the company include to optimize a next-generation inference engine for the application? This could take the form of additional memory for buffering data or of application-specific processing. And does Untether have to make that IP or license it in, given that its primary added value is in the efficient execution of inferencing?

“You are right that we are really a general-purpose AI engine and so when looking at the next generation it has started us thinking about chiplets. It is a way of coping with the variety of application-specific requirements,” said Iyengar.

Iyengar referenced the example of vision processing, where a runAI engine could usefully be placed alongside an image signal processor (ISP). “We don’t have that visual IP but plenty of others do.” In chiplet-style manufacturing, third parties would provide bare die for assembly within a multi-die package. The fact that TSMC, Untether’s foundry supplier, is a pioneer of 3D integration does not hurt. Iyengar declined to give any names of companies he is in discussions with, but clearly he has more than enough ideas and projects to spend $125 million on.

Iyengar said that the top processor companies now have multigenerational experience of chiplet design. He should know because he spent time at AMD after leaving Altera and before a spell at Xilinx. However, he also acknowledged that designing in 3D is still not well supported by the EDA sector. While optimizations in 2D are detailed and sophisticated, in the third dimension they are much more heuristic and ad hoc. In addition, the support ecosystem is not yet routine and supply chain relationships for chiplet manufacturing are only just starting to coalesce. That means some parts of the process can be expensive because they are not yet supplied in high volume.

Nonetheless, this immature environment could offer an early-mover advantage, both in component performance and in the more general business model.

Related links and articles:

News articles:

‘At-memory’ inference engine raises NN performance

AI startup appoints FPGA, embedded veteran as CEO

EV Group, ASM partner for 3D-IC, chiplet bonding

Intel invests $3.5 billion in chiplet packaging

ARM pushes chiplets and 3D packaging for Neoverse chips

