Damir Cavar is an Associate Professor at Indiana University in Bloomington. He is a computational linguist who lived and studied in Germany (Frankfurt a.M., Potsdam) and worked at the University of Hamburg, the Technical University of Berlin, and various other universities and organizations in Europe and the US. His main research interest is Natural Language Processing that involves deep linguistic processing, in particular pragmatics and semantics. He is also interested in multimodal processing of speech signal properties, visual input, and language for the semantic analysis of utterances, text, and conversation or dialog.
In this project, I used natural language processing components and knowledge representations (entities) provided by the Wolfram Language, together with a natural language processing RESTful API based on JSON-NLP, to process deep linguistic properties of text: identifying named entities, tagging parts of speech and lemmatizing sentences, extracting core semantic entity relations, using constituent and dependency parse trees, integrating anaphora and coreference resolution, detecting active and passive clauses, and identifying the scope of negation for semantic interpretation. I show how these technologies can provide advanced linguistic annotations that help extract knowledge graph representations from unstructured text or validate the truth values of utterances.
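As a minimal sketch of how such token-level annotations could be consumed downstream, the following Python snippet groups consecutive tokens that share a named-entity label in a JSON-NLP-style token list. The field names used here (`tokenList`, `text`, `entity`) are simplified assumptions for illustration, not the exact JSON-NLP schema:

```python
def extract_entities(doc):
    """Group consecutive tokens with the same non-empty entity label
    into (surface string, label) pairs."""
    entities, current, label = [], [], None
    for tok in doc["tokenList"]:
        e = tok.get("entity") or ""
        if e and e == label:
            current.append(tok["text"])          # extend the running entity span
        else:
            if current:
                entities.append((" ".join(current), label))
            current, label = ([tok["text"]], e) if e else ([], None)
    if current:
        entities.append((" ".join(current), label))
    return entities

# Toy document in a simplified JSON-NLP-like shape (field names assumed)
doc = {
    "tokenList": [
        {"text": "Barack", "entity": "PERSON"},
        {"text": "Obama", "entity": "PERSON"},
        {"text": "visited", "entity": ""},
        {"text": "Berlin", "entity": "GPE"},
    ]
}

print(extract_entities(doc))  # → [('Barack Obama', 'PERSON'), ('Berlin', 'GPE')]
```

Grouping adjacent tokens by shared label is a deliberate simplification; a production pipeline would typically rely on BIO-style span boundaries instead.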
Summary of Results
I have successfully integrated into the Wolfram Language, via a RESTful API, a state-of-the-art neural network–based constituent and dependency parser, two neural part-of-speech taggers, a lemmatizer, a neural coreference and anaphora resolution system, a predicate disambiguator that outputs WordNet IDs, and natural language processing middleware using JSON-NLP. I show how negation scope, one of the core problems in current text-mining applications, can be computed from constituent parse trees. I also show how predicate-argument structures extracted from dependency parse trees can be used for entity-relation extraction and to generate semantic representations, which, after anaphora resolution for dereferencing, can be linked to Wolfram Language entities.
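To illustrate the basic idea behind computing negation scope from a constituent parse, here is a minimal Python sketch over a toy tree encoded as nested tuples. It takes the scope of a negation marker to be the material of its right siblings within the same constituent; both this rule and the tree encoding are simplifying assumptions for exposition, not the project's actual implementation or output format:

```python
NEG_WORDS = {"not", "n't", "never"}

def leaves(node):
    """Collect the terminal words of a (label, child, ...) tuple tree,
    where a preterminal looks like ('RB', 'not')."""
    if isinstance(node[1], str):
        return [node[1]]
    return [w for child in node[1:] for w in leaves(child)]

def negation_scope(node):
    """Return the words in the scope of the first negation marker found,
    defined here as the leaves of its right siblings, else None."""
    children = node[1:]
    for i, child in enumerate(children):
        if isinstance(child[1], str):                 # preterminal node
            if child[1].lower() in NEG_WORDS:
                scope = []
                for sibling in children[i + 1:]:
                    scope.extend(leaves(sibling))     # everything to the right
                return scope
        else:                                         # internal node: recurse
            result = negation_scope(child)
            if result is not None:
                return result
    return None

# (S (NP (PRP He)) (VP (VBZ does) (RB not) (VP (VB like) (NP (NNS apples)))))
tree = ("S",
        ("NP", ("PRP", "He")),
        ("VP", ("VBZ", "does"), ("RB", "not"),
               ("VP", ("VB", "like"), ("NP", ("NNS", "apples")))))

print(negation_scope(tree))  # → ['like', 'apples']
```

The right-sibling heuristic captures the common case of verbal negation; handling raised negation, coordination, and scope-bearing quantifiers requires the richer constituent-based treatment described above.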
The components still need to be fully integrated, and some of the sketched approaches need to be developed in a more generic way. Various other technologies mentioned in the project notebook would also need to be integrated. I am interested in integrating many more lexical semantic resources, symbolic (and probabilistic) unification algorithms, grammar engineering environments, and the conversion of language (and other information sources) into knowledge representations for semantic processing, validation, and pragmatic reasoning.