Title:
|
Arabic text to Arabic sign language example-based translation system
|
This dissertation presents the first corpus-based system for translation from Arabic text into Arabic Sign Language (ArSL) for the deaf and hearing impaired, for whom it can facilitate access to conventional media and allow communication with hearing people. In addition to the familiar technical problems of text-to-text machine translation, building a system for sign language translation requires overcoming some additional challenges. First, the lack of a standard writing system requires the building of a parallel text-to-sign language corpus from scratch, as well as computational tools to prepare this parallel corpus. Further, the corpus must facilitate output in visual form, which is clearly far more difficult than producing textual output. The time and effort involved in building such a parallel corpus of text and visual signs from scratch mean that we will inevitably be working with quite small corpora. We have constructed two parallel Arabic text-to-ArSL corpora for our system. The first was built from school-level language instruction material and contains 203 signed sentences and 710 signs. The second was constructed from a children's story and contains 813 signed sentences and 2,478 signs. Working with corpora of limited size means that coverage is a major issue. A new technique was derived to exploit Arabic morphological information to increase coverage and, hence, translation accuracy. Further, we employ two different example-based translation methods and combine them to produce more accurate translation output. We have chosen to use concatenated sign video clips as output rather than a signing avatar, both for simplicity and because this allows us to distinguish more easily between translation errors and sign synthesis errors. Using leave-one-out cross-validation on our first corpus, the system produced translated sign sentence outputs with an average word error rate of 36.2% and an average position-independent error rate of 26.9%.
The corresponding figures for our second corpus were an average word error rate of 44.0% and an average position-independent error rate of 28.1%. The most frequent source of errors is missing signs in the corpus; this could be addressed in the future by collecting more corpus material. Finally, it is not possible to compare the performance of our system with any other competing Arabic text-to-ArSL machine translation system, since no other such systems exist at present.
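The two metrics reported above can be illustrated with a minimal sketch. It assumes the definitions commonly used in machine translation evaluation (the abstract does not spell out the exact formulas used in the dissertation): word error rate as the token-level edit distance divided by the reference length, and position-independent error rate as the same comparison with sign order ignored. The function names and the sign glosses are hypothetical and serve only as illustration.

```python
from collections import Counter


def word_error_rate(reference, hypothesis):
    """Token-level Levenshtein distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must contain at least one sign")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis tokens.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hypothesis, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(reference)


def position_independent_error_rate(reference, hypothesis):
    """Bag-of-signs error rate: like WER, but sign order is ignored."""
    if not reference:
        raise ValueError("reference must contain at least one sign")
    # Multiset intersection counts signs present in both sequences.
    matches = sum((Counter(reference) & Counter(hypothesis)).values())
    return (max(len(reference), len(hypothesis)) - matches) / len(reference)


# Hypothetical sign glosses, not taken from the corpora described above.
reference = ["BOY", "GO", "SCHOOL", "TODAY"]
hypothesis = ["GO", "BOY", "SCHOOL"]

wer = word_error_rate(reference, hypothesis)
per = position_independent_error_rate(reference, hypothesis)
print(f"WER = {wer:.2f}, PER = {per:.2f}")
```

In this example the hypothesis swaps two signs and drops one. The swap is penalised by the word error rate but not by the position-independent error rate, which is why the latter is never higher than the former, consistent with the figures reported for both corpora.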
|