The 7th Annual Meeting on High Performance Computing and Infrastructure in Norway
NOTUR2008 - ABSTRACTS

Large-Scale Computation in Language Technology
Stephan Oepen

Language Technology (LT) is the interdisciplinary study of computational systems that process (and to a certain degree 'understand') natural languages. Applications include speech interfaces, automated translation from one language into another, information extraction and retrieval, grammar and style checking, and a range of human-computer interaction tasks using natural language.

Current mainstream R&D in language technology often combines symbolic and numeric computation, and with the rapidly growing availability of massive on-line document collections, large-scale computation has become an ever more important prerequisite for LT. Most symbolic LT computation is organized as constraint-based search, applying a formal model of the system of language rules to determine, say, the grammatical structure of sentences. Numeric computation typically acquires, evaluates, and puts to use stochastic models of language use, for example to rank (or prune) competing hypotheses. In this paradigm, a seemingly simple sentence may often give rise to a large number of grammatically possible interpretations, some of which are judged more plausible (i.e. more probable) than others.

I will use examples from machine translation (MT) to illustrate the type and scale of computation that has caused LT research groups to develop a growing appetite for large-scale distributed and parallel computation. Based on this requirement, I will reflect on our experience integrating a relatively small project-owned CPU cluster into the TITAN environment.