I find myself unreasonably excited about the work that Taalas is doing. They’re turning open-weight models directly into ASICs (application-specific integrated circuits), so the chip essentially acts as the model - and only as the model. You get raw transistor switching speed within the model. Not only does this make for incredibly fast computation (~15k tokens a second for Llama 8B!), but it does so at ridiculously low power consumption.
I’ve heard some rumors that are tempting and that I hope are true. First - a Qwen 27B model is in the works; this is a great model that can handle a lot of the simpler tasks you’d want from local AI, and can even do a bit of coding well. Importantly, it’s a very strong tool-calling agent, so it could easily hand off tasks to additional AIs when more capable compute is needed. Supposedly the cost to produce the board is ~$400 (again - unsubstantiated rumors here), which would mean an attractive $800 price tag.
If we start to look at larger models - 70B and 120B open models - being translated to this style of chip, we could be looking at an explosion of local AI that completely changes the costs (both monetary and power consumption) of AI applications. But…
I truly don’t expect that they will ever make it to selling consumer boards; I fully expect them either to find themselves gobbled up by NVIDIA to head off potential competition, or for someone like Google/Samsung/Apple to buy them to build out the equivalent for phones.
#ai #hardware