nlp: Tech Gaun - Your tech village

In this post, you will be able to download advanced natural language processing slides and assignments that were used as the study material during the course conducted in Kathmandu University on 27^th August - 21^st September, 2012.

The course program comprised of two graduate level courses, which are itself divided up into two modules. The two courses are: 1) Advanced Linguistic Resources; 2) Advanced Applications for Natural Language Processing. The official website for the course was up for a while but seems to be down now so I decided to upload these documents for you guys.

Course 1: Advanced Resources for Natural Language Processing

Module A: Grammars and Treebanks for Syntactic Processing
(Stefanie Dipper, Univ. Bochum and Heike Zinsmeister, Univ. Stuttgart)

Syntactic preprocessing is becoming more and more important for NLP applications, such as Anaphora Resolution or Phrase-Based Statistical Machine Translation (see Course 2). This course aims at getting students acquainted with relevant state-of-the-art resources for syntactic processing, teaching them how to use and evaluate them, and enabling them to create such resources on their own. Course topics include: symbolic and statistical models for syntactic processing for NLP applications; Resources for syntactic analysis — grammars and their use in parsers; annotated corpora — constituency- and dependency-based treebanks; evaluation measures for inter-annotator agreement and system evaluation. The course will be a combination of lectures and hands-on practice in applying and developing tools for syntactic processing. The lectures are complemented by extensive hands-on exercises. Students will be encouraged to practice and create their own resources.

Module B: Word and Verb Nets for Semantic Processing
(Miriam Butt, Univ. Konstanz and Annette Hautli, Univ. Konstanz)

The course will provide an introduction to existing lexical resources for English such as WordNet, VerbNet and PropBank and why they have proven to be useful for NLE. A WordNet, VerbNet and PropBank for Hindi are currently being created as part of various projects in India, the USA and Germany and the course will use the preliminary versions of the Hindi resources to introduce students to the special structures found in South Asian languages and to discuss where different design decisions need to be made. The course will also show students why it is important to understand established linguistic categories with regard to lexical structure and lexical semantics and how that can help guide the classification and encoding of lexical information in lexical resources in a manner that will be useful to NLE.

Course 2: Advanced Applications for Natural Language Processing

Module A: Statistical Machine Translation
(Alex Fraser, Univ. Stuttgart)

The goal of the course is to have students acquire in depth knowledge of statistical machine translation methods and be familiar with the relevant iterature and an open source statistical machine translation system. The course will cover: Basic statistical modeling for machine translation; Automatic and manual evaluation of machine translation output; Bitext alignment of parallel sentence pairs; Basic phrase-based statistical machine translation models and decoding; Log-linear models and minimum error rate training; Discriminative word alignment; morphological and syntactic modeling.

Module B: Automatic Speech Recognition
(Sarmad Hussain, Univ. of Engineering and Technology)

The course will start by covering articulatory and acoustic phonetics, followed by some basis understanding of speech processing needed to separate the phonetic content from a speech signal. The course will then develop an understanding of the Baysian model for speech recognition and its implementation using Hidden Markov Models, covering both training and decoding algorithms. Finally the course will focus on practical aspects of designing, developing and labeling a speech corpus and using tool-kits to develop speech recognition models. The course will have two labs, first on acoustic phonetics and second on developing a prototype speech recognition system with limited vocabulary.

Download Course Material

Update
Thanks to Rohit Man Amatya, one of the participants of Summer School. He has written installation scripts for debian based systems and provided a list of what needs to be installed for working on the whole course. Plus the solutions for programming assignments.