A Distributional Model of Verb-Specific Semantic Roles Inferences [Supplementary Material]

In a standard view, thematic roles are treated as primitive entities that represent the roles played by the arguments of a predicate. In theoretical linguistics, however, the impossibility of reaching a consensus over a primitive set of semantic roles led to the proposal of new approaches in which thematic roles are described as bundles of primitive entities (e.g. Dowty, 1991; Van Valin, 1999) or as structural configurations (e.g. Jackendoff, 1987). In a complementary way, psycholinguistic evidence supports the idea that thematic roles and nominal concepts are represented in similar ways (McRae et al., 1997; Ferretti et al., 2001), thus suggesting that the former can be accounted for as predicate-specific bundles of inferences activated by the semantics of the verb (e.g. the patient of “kill” is typically alive before the event and dead afterwards). Such inferences can take the form either of presuppositions or of entailment relations activated when a filler saturates a specific argument position for a given predicate.

We investigated the feasibility of automatically characterizing the semantic content of verb-specific agent and patient proto-roles as bundles of presuppositions and entailment relations. We evaluated three different implementations of our method against a dataset of human-elicited descriptions collected with a modified version of the McRae et al. (1997) paradigm and expanded with lexical knowledge from WordNet. Our primary intent was to test whether, and to what extent, semantic knowledge automatically extracted from text can be used to infer the kinds of entailments on which semantic roles are grounded. At the same time, by tackling this issue we implicitly provided evidence in favor of the idea that at least part of the knowledge about events manifests itself in the way verbs are used in a communicative environment, and that part of this generalized knowledge can be distilled from the linguistic productions available in corpora.

For more information, see:

G.E. Lebani and A. Lenci (to appear) “A Distributional Model of Verb-Specific Semantic Roles Inferences”. In T. Poibeau and A. Villavicencio (eds), Language, Cognition, and Computational Models, Cambridge University Press.


Speakers’ elicited properties

The following tab-separated, UTF-8-encoded text files contain the data collected by asking a group of native speakers to characterize the thematic role properties activated by the same set of English verbs described by McRae et al. (1997).

  • role-based features: these descriptions were collected by means of a modified version of the McRae et al. (1997) paradigm. The data file consists of the following fields: the described [Verb]-[Role] pair, a normalized focal [Feature] associated with the target pair, together with its [Frequency] and [Average_Rank] of production with respect to the target pair;
  • expanded role-based features: these descriptions were obtained by enriching the descriptions in the role-based norms with all their synonyms available in the WordNet (v. 3.1) database. The data file consists of the following fields: the described [Verb]-[Role] pair and a [Feature] associated with the target pair.
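
As an illustration of how the role-based feature file described above could be read, here is a minimal Python sketch. It rests on several assumptions that are not stated in the file descriptions: that the [Verb] and [Role] values occupy two separate columns, that the columns appear in the order Verb, Role, Feature, Frequency, Average_Rank, and that a header row may be present. The class RoleFeature, the function read_role_features and the sample rows are hypothetical and serve only as an example.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class RoleFeature:
    verb: str
    role: str
    feature: str
    frequency: int
    average_rank: float


def read_role_features(handle):
    """Parse tab-separated rows laid out as
    Verb, Role, Feature, Frequency, Average_Rank (assumed column order)."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for fields in reader:
        # Skip blank lines and a header row, if the file has one (assumption).
        if not fields or fields[0] == "Verb":
            continue
        verb, role, feature, freq, rank = fields[:5]
        rows.append(RoleFeature(verb, role, feature, int(freq), float(rank)))
    return rows


# Invented rows for illustration only; they are not taken from the norms.
SAMPLE = (
    "Verb\tRole\tFeature\tFrequency\tAverage_Rank\n"
    "kill\tpatient\talive\t12\t1.5\n"
    "kill\tpatient\tdead\t9\t2.0\n"
)

features = read_role_features(io.StringIO(SAMPLE))
for item in features:
    print(item.verb, item.role, item.feature, item.frequency, item.average_rank)
```

To read an actual file, the same function can be given a handle opened with open(path, encoding="utf-8", newline=""). The expanded file lacks the last two fields, so it would need a reduced variant of the same reader.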

Automatically acquired properties

The following archives encode the properties that have been automatically extracted from a coreference-annotated and parsed version of the British National Corpus by three different extraction strategies. Data are stored in tab-separated, UTF-8-encoded text files consisting of the following fields: the described [Verb]-[Argument] pair, a [Contextual_Construction] associated with the target pair (e.g. “obj-1:reject-v”, to be interpreted as the object of the verb “to reject”, or “sbj-1:free-j”, to be interpreted as the subject described by the predicate adjective “free”), their [Frequency] of co-occurrence, and the strength of their association, estimated via positive local mutual information ([pLMI]).

  • full model: a coreference-based distributional model enhanced by a syntax-based representation;
  • coreference-based model: a model that relies solely on the information that can be extracted from the coreference chains.
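
The [Contextual_Construction] and [pLMI] fields described above can be illustrated with a short Python sketch. The names Construction, parse_construction and plmi are hypothetical. The pattern is inferred only from the two examples given above (“obj-1:reject-v” and “sbj-1:free-j”), and reading the “-1” suffix as an inverse-relation marker is an assumption. The plmi function follows a common definition of local mutual information (observed frequency multiplied by the base-2 logarithm of the observed-to-expected ratio, with negative values set to zero); it is not necessarily the exact computation used to produce the files.

```python
import math
import re
from typing import NamedTuple


class Construction(NamedTuple):
    relation: str  # e.g. "obj" or "sbj"
    inverse: bool  # True when the relation carries the "-1" suffix (assumed reading)
    lemma: str     # e.g. "reject"
    pos: str       # single-letter tag, e.g. "v" (verb) or "j" (adjective)


# Pattern inferred from the two documented examples only (assumption).
_PATTERN = re.compile(
    r"^(?P<rel>[^:]+?)(?P<inv>-1)?:(?P<lemma>.+)-(?P<pos>[a-z])$"
)


def parse_construction(text):
    """Split a string such as 'obj-1:reject-v' into its parts."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"unrecognized construction: {text!r}")
    return Construction(
        match["rel"], match["inv"] is not None, match["lemma"], match["pos"]
    )


def plmi(joint, target_total, context_total, grand_total):
    """Positive local mutual information under a common definition:
    max(0, O * log2(O / E)), with E = target_total * context_total / grand_total."""
    expected = target_total * context_total / grand_total
    if joint == 0 or expected == 0:
        return 0.0
    return max(0.0, joint * math.log2(joint / expected))


print(parse_construction("obj-1:reject-v"))
print(parse_construction("sbj-1:free-j"))
# Toy counts for illustration only, not corpus data.
print(plmi(joint=8, target_total=16, context_total=16, grand_total=64))
```

With the toy counts, the expected frequency is 4, so the observed count of 8 is twice what chance would predict. A pair observed less often than expected would receive a score of zero.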


Special thanks to Gaia Bonucelli for taking care of the normalization of the speakers’ elicited properties. This research received financial support from the CombiNet project (PRIN 2010-2011, grant n. 20105B3HE8) funded by the Italian Ministry of Education, University and Research (MIUR).