Research

  • Distributional Memory (DM) – a general distributional semantic model, developed in collaboration with Marco Baroni.
  • LexIt – an online database developed at the CoLing Lab, containing information, automatically derived from corpora, on the argument structure properties of Italian verbs.
  • Voci della Grande Guerra – a 2-year project, funded by the Special Mission for the Celebrations of the 100th Anniversary of World War I at the Presidenza del Consiglio dei Ministri of the Italian Government, to build an annotated corpus of digital texts representative of the different ways in which the war was experienced and described in Italy.
  • Word Combinations in Italian – Theoretical and descriptive analysis, computational models, lexicographic layout and creation of a dictionary – a 3-year project funded by the Italian Ministry of Research (PRIN 2010/2011), coordinated by Raffaele Simone (University of Rome 3). Within the project, the goal of the CoLing Lab is to develop advanced computational linguistics methods for extracting distributional information from text corpora. The project will end in February 2016.
  • SEMPLICE – SEMantic instruments for PubLIc administrators and CitizEns – a 2-year project funded by Regione Toscana and carried out in collaboration with IT companies, aimed at developing NLP-based tools for knowledge management, information extraction and opinion mining for local public administrations.
  • BLIND – Semantic representations in congenitally blind subjects – a 2-year project funded by the Italian Ministry of Research (PRIN 2008), in collaboration with Giovanna Marotta (University of Pisa, Project Director), Pietro Pietrini (University of Pisa), and Marco Baroni (University of Trento). The overall goal of the project was to conduct linguistic, computational and neuro-cognitive analyses of semantic representations in the congenitally blind.
  • Paisà – Piattaforma per l’Apprendimento dell’Italiano Su corpora Annotati – a 3-year project funded by the Italian Ministry of Research (FIRB 2007), in collaboration with the University of Bologna (Project Director Sergio Scalise), ILC-CNR, the University of Trento and Eurac (Bolzano). The project has built a large, freely available, richly annotated corpus of Italian, from which lexical databases are to be acquired automatically.
  • Semawiki – a 2-year project funded by the Fondazione Cassa di Risparmio di Pisa. The project developed various computational tools and resources for Italian NLP, and was carried out in collaboration with the Department of Computer Science of the University of Pisa and ILC-CNR.