Thinking Machines unveils 'interaction models' aiming to replicate natural conversation speeds
The company states its TML-Interaction-Small model responds in 0.40 seconds, significantly outpacing current offerings from major rivals, though the technology remains in a research preview phase.

Thinking Machines Lab has announced the development of interaction models, a new artificial intelligence architecture designed to process user input and generate responses simultaneously. This full-duplex technology aims to replicate the natural flow of a phone conversation, moving away from the traditional turn-based protocol in which the AI must finish listening before it speaks. The company claims its model, TML-Interaction-Small, achieves a response time of 0.40 seconds, which it says is comparable to natural human conversation speed and significantly faster than current models from competitors such as OpenAI and Google.
The startup, founded last year by former OpenAI CTO Mira Murati, notes that existing AI systems generally operate on a strict sequential basis. Under the current standard, the user speaks, the AI listens, the AI responds, and the user listens again. This new approach departs from that method by allowing the system to handle input and output at the same time. The industry standard has historically prioritised accuracy and safety over simultaneous output, resulting in longer perceived latency for users.
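The difference between the two approaches can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not Thinking Machines' architecture: the function names, the "ack" responses and the abstract "ticks" are all hypothetical, and the ticks do not correspond to the 0.40-second figure.

```python
from typing import List, Tuple

# (tick, direction, payload); "in" is user input, "out" is model output.
Event = Tuple[int, str, str]


def turn_based(user_chunks: List[str]) -> List[Event]:
    """Listen to every chunk first, then respond (strict turns)."""
    events: List[Event] = []
    tick = 0
    for chunk in user_chunks:
        events.append((tick, "in", chunk))
        tick += 1
    # Output starts only after the whole input has been received.
    for chunk in user_chunks:
        events.append((tick, "out", f"ack:{chunk}"))
        tick += 1
    return events


def full_duplex(user_chunks: List[str]) -> List[Event]:
    """Receive a new chunk and respond to the previous one in the same tick."""
    events: List[Event] = []
    for tick, chunk in enumerate(user_chunks):
        events.append((tick, "in", chunk))
        if tick > 0:
            events.append((tick, "out", f"ack:{user_chunks[tick - 1]}"))
    if user_chunks:
        # Respond to the final chunk one tick after it arrives.
        events.append((len(user_chunks), "out", f"ack:{user_chunks[-1]}"))
    return events


def first_output_tick(events: List[Event]) -> int:
    """Return the tick at which the first output event occurs."""
    return next(tick for tick, direction, _ in events if direction == "out")


if __name__ == "__main__":
    chunks = ["hi", "there", "model"]
    print("turn-based first output tick:", first_output_tick(turn_based(chunks)))
    print("full-duplex first output tick:", first_output_tick(full_duplex(chunks)))
```

In this toy model both schedules produce the same responses, but the full-duplex schedule emits its first output while input is still arriving, which is the property the company is describing. Whether a real system behaves this way under load is exactly what the preview will have to show.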
The technology is currently in a research preview phase and is not yet a commercial product. A limited research preview is expected within the next few months, with a wider release planned for later in the year. While the company asserts that its benchmarks are impressive and that the underlying idea, that interactivity should be native to a model, is interesting, the public has not yet been able to test the system.
Whether the real-world experience lives up to the technical claims will not be clear until people can actually use the system. The 0.40-second latency claim relies on internal benchmarks that have not yet been independently verified or published in detail. It also remains unclear whether the full-duplex approach will introduce new latency issues during complex processing tasks, compared with the established sequential models that currently dominate the market.


