Abstract:
The single biggest obstacle to comprehensive cross-lingual discourse analysis is the scarcity of multilingual resources. Existing resources are overwhelmingly monolingual, compelling researchers to infer discourse information in the target languages through error-prone automatic means. The current paper aims to provide more direct insight into cross-lingual variation in discourse structure by offering an aligned version of a multilingual resource, namely the TED-Multilingual Discourse Bank, which consists of six TED talks independently annotated in seven different languages. It is shown that discourse relations in these languages can be automatically aligned with high accuracy, as verified by experiments on manual alignments of three diverse languages. The resulting alignments have great potential to reveal the divergences that the target languages exhibit in local discourse relations with respect to the source text, as well as to lead to new resources, as exemplified by the induction of bilingual discourse connective lexicons.