Sample Proposal: Quantitative Reasoning

College of Liberal Arts

Please direct all questions about the flag proposal process to the Center for the Skills & Experience Flags.

LIN 350 Computational Semantics

Department of Linguistics

Please give a brief description of the course. Include a description of the specific Quantitative Reasoning skills that students will learn and apply within this course.

This course covers two very different approaches in computational semantics: distributional modeling and logical form. Distributional modeling represents word meanings through the contexts in which the words have been observed. The students will count word co-occurrences in large text resources (10 million words or more) to compute a meaning representation for each word from its observed contexts. They will use these representations, which are high-dimensional vectors of real numbers, to draw conclusions about semantic similarity between words, possible word associations that people may form, and semantic relations between words. The students will also learn about machine learning models that produce similar word vectors, in particular deep learning models. They will learn the mathematics behind these models (mostly matrix operations; I will sketch, but not go into detail on, how derivatives are used to train these models via gradient descent), and they will also learn to use these models.

In the second half of the course, the students will learn to use first-order logic to describe the meaning of sentences. They will practice defining logical forms for words in such a way that the logical forms of smaller phrases can be combined into the logical forms of larger phrases using typed lambda calculus. They will learn how theorem provers determine the validity of a formula (in enough detail that they can work through a small example of Robinson resolution by hand), and they will learn to use theorem provers to draw inferences over natural language text.
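As a minimal sketch of the distributional pipeline from the first half of the course, the Python below counts co-occurrences within a fixed window, reweights the counts with positive pointwise mutual information (PPMI, the weighting named in the example assignment), and ranks neighbors by cosine similarity. The window size, the toy sentences, and all function names are illustrative assumptions, not the course's actual code; a corpus of 10 million words would call for sparse matrices rather than nested dictionaries.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_counts(sentences, window=2):
    """Count how often each context word occurs within `window` tokens of each target word."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i, target in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[target][tokens[j]] += 1
    return counts


def ppmi(counts):
    """Reweight raw counts as PPMI: max(0, log(P(w, c) / (P(w) * P(c))))."""
    total = sum(sum(ctx.values()) for ctx in counts.values())
    word_totals = {w: sum(ctx.values()) for w, ctx in counts.items()}
    context_totals = Counter()
    for ctx in counts.values():
        context_totals.update(ctx)
    vectors = {}
    for w, ctx in counts.items():
        vec = {}
        for c, n in ctx.items():
            pmi = math.log(n * total / (word_totals[w] * context_totals[c]))
            if pmi > 0:  # keep only positive PMI; all other cells stay implicitly zero
                vec[c] = pmi
        vectors[w] = vec
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {dimension: value} dicts."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(target, vectors, n=10):
    """Return the n words whose vectors are most cosine-similar to `target`."""
    scores = [(w, cosine(vectors[target], vec)) for w, vec in vectors.items() if w != target]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:n]


# Usage on a toy corpus (hypothetical sentences, not course data).
sentences = [
    "the cat drank the milk".split(),
    "the dog drank the water".split(),
    "the cat chased the dog".split(),
]
vectors = ppmi(cooccurrence_counts(sentences, window=2))
print(most_similar("cat", vectors, n=3))
```

Run over a large corpus with `n=10`, `most_similar` yields the kind of top-10 list the example assignment asks students to inspect.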

Courses that carry the Quantitative Reasoning Flag must emphasize how QR skills can be applied in students’ everyday or professional lives. Please describe the kinds of applications the course uses to teach Quantitative Reasoning. Specific examples from assignments or exams are strongly encouraged.

We focus on problems in language technology, such as: How can we extract relevant knowledge about diseases, their causes, and their cures from large amounts of text data (information extraction)? How can a machine equipped with a database of flight schedules help a person plan a flight, interacting in human language? One problem that we will study in depth is textual entailment: given two sentences, would a person reading the first sentence conclude that the second also holds? This problem is at the heart of a number of language technology tasks, including information extraction.

Example assignment: Determine the top 10 most similar words for each of the following target words, all of which occur 50 times or more in the corpus:

beauty-NN, fire-NN, heart-NN, dare-VB, throw-VB, conceal-VB, quiet-JJ, evil-JJ, apparently-RB, slowly-RB

In your solution, include these lists of 10 most similar words. Also include a discussion of what you see:
• Do you detect effects of the genre (gothic-ish stories) in these word lists?
• Inspecting only the top most similar word for each target, and only for the PPMI space, how often do you see synonyms of the target words? Co-hyponyms? Antonyms? You do not need to use a dictionary here; it will suffice if you label the words yourself. But you can use WordNet if you would like.

Other homework assignment, about topic modeling: This technique detects “topics” (characterized as probability distributions over typical words) in a collection of documents. For this problem, you will collect a very small corpus of 20 documents yourself. Choose a current controversial issue. On the web, find 20 opinion texts on this issue. They can be editorials, blog posts, or other texts that you can find. Make sure that you collect texts that argue different opinions on the issue. Copy and paste these texts into separate files or Python strings, making sure that you grab only the text itself and no surrounding material.
Now preprocess your 20 texts as you did with the State of the Union addresses (including cutting them up into smaller chunks), and turn them into a gensim corpus. Then apply topic modeling. Again, experiment with the number of topics until you find topics that are interesting and interpretable. Do the topics reflect some of the main arguments? Are there topics that are prevalent across different opinions on the issue? Or are the topic distributions completely disjoint for texts that reflect different opinions on the issue? Are there subtleties in the texts that bag-of-words topics are too coarse-grained to capture?
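The preprocessing step of this assignment can be sketched in plain Python. The function names, the regular-expression tokenizer, the small stopword list, and the default chunk size of 100 tokens are illustrative assumptions; the output has the same shape as a gensim bag-of-words corpus (one list of `(token_id, count)` pairs per chunk), although gensim's own `corpora.Dictionary` may assign token ids in a different order.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real setup would likely use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are", "that", "it"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def chunk(tokens, size=100):
    """Cut one tokenized text into consecutive chunks of at most `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def build_corpus(texts, size=100):
    """Return (token2id, corpus), where each chunk is a sorted list of (token_id, count) pairs."""
    chunks = [c for text in texts for c in chunk(tokenize(text), size)]
    token2id = {}
    for c in chunks:
        for tok in c:
            token2id.setdefault(tok, len(token2id))
    corpus = [sorted(Counter(token2id[t] for t in c).items()) for c in chunks]
    return token2id, corpus
```

With gensim installed, the corresponding steps would be `corpora.Dictionary(chunks)` to build the vocabulary, `dictionary.doc2bow(chunk)` for each chunk, and `models.LdaModel(corpus, num_topics=k, id2word=dictionary)` to fit the topic model; varying `k` is the experiment the assignment describes.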

Courses that carry the Quantitative Reasoning Flag go beyond a superficial application of equations and strive for understanding of the underlying concepts. Please describe how you teach and assess conceptual understanding of Quantitative Reasoning. Specific examples from class, assignments, or exams are strongly encouraged.

In class, I describe the mathematical concepts underlying the semantic models in depth: vector spaces and cosine similarity, Latent Dirichlet Allocation for topic modeling, and first-order logic and lambda calculus. In the first half of the course, I assess conceptual understanding by having the students build semantic representations of their own; to program these models, they have to understand the underlying concepts. In the second half, typical homework assignments are more theoretical. Here are two examples.

Example 1: First-order logic sentences are evaluated over models that consist of a domain D and an interpretation function V. Here is an interpretation function and a domain. Are the following sentences true or false in this model?

Example 2: Here is a lexicon associating words with logical forms in typed lambda calculus. For each of the following sentences, sketch the phrase structure analysis, and associate each node in the syntactic structure with its semantic representation. Please show both the lambda expression and the semantic type at each node. Perform beta reduction as often as possible in each step. (Please ignore issues of tense.)
(a) John sleeps
(b) Every rabbit sleeps
(c) A duck bit every rabbit
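Both theoretical exercises can also be checked mechanically. The sketch below rests on stated assumptions: the domain D and interpretation V are invented here (the proposal does not reproduce the actual model), curried Python functions stand in for the typed lambda terms, and function application plays the role of beta reduction. Because Python evaluates denotations (truth values) rather than printing symbolic formulas, it can confirm that a derivation yields the right truth value in a model but does not replace drawing the derivation by hand. The object quantifier in (c) is composed by hand with surface scope, which is one assumed strategy among several.

```python
# Hypothetical model M = (D, V); not the model used in the actual assignment.
D = {"john", "r1", "r2", "d1"}
V = {
    "john": "john",                        # type e: an individual
    "rabbit": {"r1", "r2"},                # type <e,t>: a set of individuals
    "duck": {"d1"},
    "sleep": {"john", "r1", "r2"},
    "bite": {("d1", "r1"), ("d1", "r2")},  # pairs (biter, bitten)
}

# Lexicon: curried functions mirroring typed lambda terms.
john = V["john"]                                           # e
rabbit = lambda x: x in V["rabbit"]                        # <e,t>
duck = lambda x: x in V["duck"]                            # <e,t>
sleeps = lambda x: x in V["sleep"]                         # <e,t>
bit = lambda y: lambda x: (x, y) in V["bite"]              # <e,<e,t>>: object y first, then subject x
every = lambda P: lambda Q: all(Q(x) for x in D if P(x))   # <<e,t>,<<e,t>,t>>
a = lambda P: lambda Q: any(P(x) and Q(x) for x in D)      # <<e,t>,<<e,t>,t>>

# (a) John sleeps  ~>  sleep(john)
print(sleeps(john))  # True
# (b) Every rabbit sleeps  ~>  forall x (rabbit(x) -> sleep(x))
print(every(rabbit)(sleeps))  # True
# (c) A duck bit every rabbit, surface scope:
#     exists x (duck(x) & forall y (rabbit(y) -> bite(x, y)))
print(a(duck)(lambda x: every(rabbit)(lambda y: bit(y)(x))))  # True
```

Editing V (for instance, removing a pair from "bite") and re-running shows directly how the truth value of each sentence depends on the model.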

To satisfy the Quantitative Reasoning Flag, at least half of the course grade must be based on the use of quantitative skills. Please describe the course grading scheme in such a way that clearly demonstrates at least half of the grade requires Quantitative Reasoning. Denote which components require Quantitative Reasoning and the total grade percentage these comprise.

The course grade will be determined by homework assignments and one final course project. All of the homework assignments, as well as the final project, will consist entirely of the kinds of quantitative reasoning tasks described above, so 100% of the course grade requires Quantitative Reasoning.
