Research | CoLing Lab

Distributional Memory (DM) – a general distributional semantic model, developed in collaboration with Marco Baroni.
LexIt – an online database developed at the CoLing Lab, containing automatically corpus-derived information on the argument structure properties of Italian verbs.
Text2Query – Deep Learning models for Big Data analysis through Natural Language.
Event Extraction for Fake News Detection – a project focused on developing a fake news detection system based on a graph representation of news events and actors. The project is carried out in collaboration with the Computer Science and Artificial Intelligence Lab of the Massachusetts Institute of Technology (CSAIL-MIT).
MUSE – MUltimodal Semantic Extraction – the project aims at the multimodal semantic analysis of both texts and images, combining Natural Language Processing and Computer Vision techniques. The project is carried out in collaboration with the company BNova s.r.l. (POR FSE 2014-2020 Asse A).
UBIMOL – UBIquitous Massive Open Learning – the project aims at developing an E-learning platform enriched with innovative NLP technologies able to offer personalized courses. The project involves the companies M.E.T.A. Srl, 01Sistemi Srl, VIDITRUST Srl, PERSAFE Srl and the research partners ILC-CNR and CoLing Lab (POR FESR 2014 – 2020).
Voci della Grande Guerra – a two-year project, funded by the Special Mission for the Celebrations of the 100th Anniversary of World War I at the Presidenza del Consiglio dei Ministri of the Italian Government, to build an annotated corpus of digital texts representative of the different ways in which the war was experienced and described in Italy.
Word Combinations in Italian – Theoretical and descriptive analysis, computational models, lexicographic layout and creation of a dictionary – a 3-year project funded by the Italian Ministry of Research (PRIN 2010/2011), coordinated by Raffaele Simone (University of Rome 3). Within the project, the CoLing Lab's goal is to develop advanced computational linguistics methods for extracting distributional information from text corpora. The project will end in February 2016.
SEM – Il Chattadino – a 2-year project funded by Regione Toscana in collaboration with IT companies to develop a chatbot for querying Public Administration services and documents (POR-CReO FESR 2014 – 2020 – Bandi RS 2017).
SEMPLICE – SEMantic instruments for PubLIc administrators and CitizEns – a 2-year project funded by Regione Toscana in collaboration with IT companies to develop NLP-based tools for knowledge management, information extraction and opinion mining for local public administrations.
BLIND – Semantic representations in congenital blind subjects – a 2-year project funded by the Italian Ministry of Research (PRIN 2008), in collaboration with Giovanna Marotta (University of Pisa, Project Director), Pietro Pietrini (University of Pisa), and Marco Baroni (University of Trento). The overall goal of the project was to conduct linguistic, computational and neuro-cognitive analyses of semantic representations in the congenitally blind.
Paisà – Piattaforma per l’Apprendimento dell’Italiano Su corpora Annotati – a 3-year project funded by the Italian Ministry of Research (Firb 2007), in collaboration with the University of Bologna (Project Director Sergio Scalise), ILC-CNR, the University of Trento and Eurac (Bolzano). The project has built a large, freely available, richly annotated corpus of Italian, from which lexical databases will be automatically acquired.
Semawiki – a 2-year project funded by the Fondazione Cassa di Risparmio di Pisa. The project has developed various computational tools and resources for Italian NLP, and was carried out in collaboration with the Department of Computer Science of the University of Pisa and ILC-CNR.