SISU: A Computational Linguistics Programming Language, with an Application to the Analysis of Finnish
Abstract: This dissertation describes SISU, a new programming language for Computational Linguistics (CL). The motivation for creating SISU is discussed in detail. The history and development of CL are traced, beginning with machine translation and continuing through its associations with artificial intelligence. The influence of linguistic research on the field is shown. It is pointed out that nearly all the literature on CL applications says very little about programming details or about how the system was actually implemented on the computer. This observation leads to an examination of the languages and systems that have been used. They either were developed for other purposes and adapted for natural-language processing, or were developed for only part of the CL problem domain. None of them is designed to meet the needs of the linguistic researcher who is inexperienced in programming. It is proposed that a language designed for the broad range of CL applications, one that encourages structured programming techniques and is easy for the novice to learn and use, would be an advance in the state of the art of computer science. A number of CL and natural-language processing application areas and examples are examined to develop a list of requirements for such a programming language. The SISU language was developed to meet those requirements. It provides a large number of features to simplify the construction of CL programs. It is implemented as an embedded extension to the PL/I programming language by means of a preprocessor. It provides powerful commands and data abstractions for handling blocks of text, sentences and words by referring to them as just that: text, sentences and words. There are also a number of commands that operate on individual words for the purpose of morphemic analysis, allowing words to be easily taken apart, examined and altered.
An easy-to-use dictionary-building and referencing technique is also included, as well as debugging and text output features. As an example of its use, SISU is applied to the Finnish language. A brief description of Finnish is given. Through the process of agglutination, the addition of a number of inflectional endings, a single word can generate tens of variant forms; however, only the basic form is carried in dictionaries. Moreover, the endings often cause changes in the basic form, so the stem that remains after the endings are removed must be changed back into the basic form before it can be found in a computer dictionary, which could not contain all the variant forms. Without this capability, computer analysis of Finnish would be very limited. This problem was analyzed and solved for the class of plural nouns. A general method was developed that allows extension to other word classes. The resulting algorithm was found to be easily and quickly programmable in SISU. The program and the results are given.