Recent years have seen steady improvements in the quality and performance of speech-based human-machine interaction, driven by a significant convergence in the methods and techniques employed. However, the quantity of training data required to improve state-of-the-art systems appears to be growing exponentially, while performance asymptotes at a level that is not only well short of human performance but may also be inadequate for many real-world applications. This suggests that there may be a fundamental flaw in the underlying architecture of contemporary systems, and the future direction for research into spoken language processing is currently uncertain.
This talk addresses these issues by stepping outside the usual domains of speech science and technology, and instead draws inspiration from recent findings in the neurobiology of living systems. In particular, four areas will be discussed: the growing evidence for an intimate relationship between sensory and motor behaviour in living organisms; the power of negative feedback control to accommodate unpredictable disturbances in real-world environments; mechanisms for imitation and mental imagery in learning and modelling; and hierarchical models of temporal memory for predicting future behaviour and anticipating the outcome of events.
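To make the second of these themes concrete, the sketch below simulates a proportional-integral negative-feedback loop holding a noisy process at a goal value. The plant dynamics, gain values, and noise level are illustrative assumptions for this sketch, not details taken from the talk.

    import random

    def negative_feedback_demo(setpoint=1.0, kp=0.8, ki=0.2, steps=50):
        """Hold a noisy first-order process at a target value using
        proportional-integral negative feedback."""
        state, integral = 0.0, 0.0
        for _ in range(steps):
            error = setpoint - state               # compare intended and observed outcome
            integral += error                      # accumulate residual error over time
            control = kp * error + ki * integral   # corrective action derived from the error
            disturbance = random.gauss(0.0, 0.05)  # unpredictable environmental perturbation
            state += 0.5 * (control - state) + disturbance  # simple assumed plant dynamics
        return state

    print(negative_feedback_demo())  # settles near 1.0 despite the disturbances

The relevant property is that the controller never needs a model of the disturbances themselves: continuously feeding back the discrepancy between intended and observed outcome is enough to absorb them.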
The talk will conclude by showing how these results point towards a novel architecture for speech-based human-machine interaction that blurs the distinction between the core components of a traditional spoken language dialogue system: an architecture in which cooperative and communicative behaviour emerges as a by-product of a model of interaction in which the system has in mind the needs and intentions of the user, and the user has in mind the needs and intentions of the system.
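As a loose, hypothetical illustration of such mutual modelling (the Agent class, its keyword-based belief update, and the example goals are all invented for this sketch, not drawn from the talk), two agents each maintain a crude model of the other's needs and act on it:

    from dataclasses import dataclass, field

    @dataclass
    class Agent:
        """An interlocutor that keeps a running estimate of its partner's
        need and chooses its next utterance to reduce that need."""
        name: str
        goal: str
        belief_about_partner: dict = field(default_factory=dict)

        def observe(self, utterance: str) -> None:
            # Update the model of the partner from what was said
            # (a crude keyword cue stands in for real intention recognition).
            if "need" in utterance:
                self.belief_about_partner["need"] = utterance.split("need", 1)[1].strip()

        def act(self) -> str:
            # Cooperative behaviour falls out of acting on the imputed need,
            # not out of a scripted dialogue-management policy.
            need = self.belief_about_partner.get("need")
            if need:
                return f"{self.name}: let me help with {need}"
            return f"{self.name}: I need {self.goal}"

    user = Agent("user", "a train ticket to the airport")
    system = Agent("system", "a clear travel request")
    system.observe(user.act())  # the system imputes the user's need...
    print(system.act())         # ...and acts on it

The point of the sketch is only that there is no separate dialogue manager: the cooperative turn arises from each agent updating, and acting on, its model of the other.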