Lexical and sublexical variables: norms for 626 Italian nouns.


Laura Barca*^, Cristina Burani* & Lisa Saskia Arduino*°

* Institute for Cognitive Sciences and Technologies, National Research Council (CNR), Rome

^ University "La Sapienza" of Rome

° University of Milano-Bicocca

The file lexvar.xls (Excel 5.0 format) reports numerical values of several lexical and sublexical variables for 626 Italian simple nouns: age of acquisition, familiarity, imageability, concreteness, adult written frequency, child written frequency, adult spoken frequency, neighborhood size, bigram frequency, and length in syllables and in letters. Three non-numerical fields are also included: lexical stress and the word-initial phoneme, classified as voiced vs. voiceless and by manner of articulation. The word-initial phoneme is additionally coded by phonetic features. The file also reports the mean naming time for each item (RT; see Barca, Burani & Arduino, 2002) and the English translation of each Italian word.

  1. Age of acquisition (AOA) refers to the age at which raters believed they first learned a word either in spoken or written form
  2. Familiarity (FAM) is a subjective frequency measure which estimates how much a word is present in someone's daily life
  3. Imageability (IMAG) is defined in terms of the ability (ease and rapidity) of a word to evoke a mental image (i.e. a visual representation, a sound or some other sensory experience)
  4. Concreteness (CONC): a concrete word is one that refers to objects, materials, or persons

The mean values for the above-mentioned variables were obtained from ratings on 7-point scales provided by a total of 176 Italian students from different universities in Rome (aged 20 to 30 years; half male, half female). Standard deviations (s.d.) are also reported for these four variables.
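As a minimal sketch of how an item's mean rating and s.d. can be derived from individual 7-point ratings, the Python function below uses the sample (n - 1) standard deviation. This is an assumption: the source does not state which estimator was used, and the five ratings in the example are invented for illustration only.

```python
from statistics import mean, stdev

def summarize_ratings(ratings):
    """Return (mean, sample s.d.) for one word's ratings on a 1-7 scale.

    Assumes the sample (n - 1) standard deviation; the norms may have
    been computed differently.
    """
    if any(not 1 <= r <= 7 for r in ratings):
        raise ValueError("ratings must lie on the 1-7 scale")
    return mean(ratings), stdev(ratings)

# Hypothetical ratings from five raters for one word (illustrative only).
m, sd = summarize_ratings([6, 7, 5, 6, 6])
print(f"mean = {m:.2f}, s.d. = {sd:.2f}")  # prints: mean = 6.00, s.d. = 0.71
```

Using `statistics.pstdev` instead would give the population s.d.; with 176 raters split across scales the difference is small, but it is worth matching whichever convention a comparison dataset uses.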

The following variables are also included:

  1. Adult written word frequency (AdultWrtFQ), given as two values: one taken from a frequency count based on a written corpus of 1,500,000 occurrences (ILC; Istituto di Linguistica Computazionale, CNR, Pisa, 1989), and the other from a written corpus of 3,798,275 lexical occurrences (CoLFIS; Bertinetto et al., 2005)
  2. Child written word frequency or Elementary Lexicon (Child Written), taken from the frequency count (1 million occurrences) by Marconi, Ott, Pesenti, Ratti, & Tavella (1993), subdivided into TotFQ (total frequency values), CompFQ (frequency of the words in the texts read by children), and ProdFQ (frequency of the words written by children)
  3. Adult spoken word frequency (AdultSpkFQ), taken from a frequency count based on a corpus of 500,000 occurrences in the spoken language (De Mauro, Mancini, Vedovelli, & Voghera, 1993)
  4. Neighborhood size (NSIZE), which corresponds to the number of words (orthographic neighbors) that differ from a target word by one letter while preserving the identity and position of the other letters. These values were taken, with adjustments, from Baldi & Traficante (2001)
  5. Bigram frequency (BIGR) is a measure of transitional orthographic frequency, that is, the frequency with which two letters occur in sequence. Our measure is the mean bigram frequency of each word, extracted from the frequency count by the Istituto di Linguistica Computazionale, CNR, Pisa (1979)
  6. Length in syllables (SYL) and Length in letters (LET), automatically extracted from the database by Thornton, Iacobini, & Burani (1994; 1997)
  7. Lexical stress, classified as either stress on the penultimate syllable (p), or stress on the antepenultimate syllable (ap)
  8. Type of word initial phoneme, classified as voiced vs. voiceless (PHON:VOICE), and for manner of articulation (PHON:MANN). The classification is based on the standard pronunciation taken from a dictionary of Italian (Zingarelli, 1985).
  9. Type of word-initial phoneme, classified into 13 binary variables (0 = absent; 1 = present), as follows: one variable for voice (VOICE), five for manner (STOP, NASAL, FRICATIVE, AFFRICATE, and LIQUID), six for place of articulation (BILABIAL, LABIO-DENTAL, DENTAL, ALVEOLAR, PALATAL, and VELAR), and one for vowel (VOWEL). This second classification allows a more detailed characterization of the initial phonemes where relevant (for the effects of the phonetic characteristics of the word-initial phoneme, see, e.g., Bates, Burani, D’Amico, & Barca, 2001; Treiman, Mullennix, Bijeljac-Babic, & Richmond-Welty, 1995).
  10. Mean word naming time (RT), see Barca et al. (2002).
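The neighborhood-size definition in item 4 (often called Coltheart's N) can be sketched in Python as a count of same-length words that differ from the target by exactly one substituted letter. This is only an illustration of the definition: the NSIZE values in the database come from Baldi & Traficante (2001) with adjustments, and the small lexicon below is a hypothetical example, not the reference lexicon used there.

```python
def is_neighbor(target, candidate):
    """True if candidate has the same length as target and differs by one letter."""
    if len(target) != len(candidate):
        return False
    # Count positions where the letters differ; all other letters keep
    # their identity and position.
    mismatches = sum(a != b for a, b in zip(target, candidate))
    return mismatches == 1

def neighborhood_size(target, lexicon):
    """Number of orthographic neighbors of target in lexicon (duplicates ignored)."""
    return sum(is_neighbor(target, word) for word in set(lexicon))

# Hypothetical toy lexicon for illustration only.
toy_lexicon = ["cane", "pane", "rane", "cene", "cani", "cade", "casa", "canto"]
print(neighborhood_size("cane", toy_lexicon))  # prints: 5
```

Here "pane", "rane", "cene", "cani" and "cade" each differ from "cane" in one position, while "casa" differs in two and "canto" has a different length, so neither counts.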


Raw values are reported for all variables except mean bigram frequency, which is transformed to its natural logarithm.
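A minimal Python sketch of a log-transformed mean bigram frequency follows. It rests on stated assumptions: only word-internal adjacent letter pairs are counted (no word-boundary bigrams), the natural logarithm is applied to the mean rather than to each bigram before averaging, and the bigram counts are hypothetical stand-ins for the ILC frequency count.

```python
import math

def bigrams(word):
    """Adjacent letter pairs inside the word (assumption: no boundary markers)."""
    return [word[i:i + 2] for i in range(len(word) - 1)]

def log_mean_bigram_frequency(word, bigram_freq):
    """Natural log of the mean frequency of the word's bigrams.

    Bigrams missing from bigram_freq count as 0; if the mean is 0,
    math.log raises ValueError.
    """
    pairs = bigrams(word)
    if not pairs:
        raise ValueError("word must have at least two letters")
    mean_freq = sum(bigram_freq.get(p, 0) for p in pairs) / len(pairs)
    return math.log(mean_freq)

# Hypothetical bigram counts, for illustration only.
counts = {"ca": 900, "an": 1200, "ne": 600}
print(f"{log_mean_bigram_frequency('cane', counts):.3f}")  # prints: 6.802
```

For "cane" the bigrams are "ca", "an" and "ne", whose mean count is 900, so the value is ln(900). Averaging the logged counts instead would give a different number, so it matters which order is used when comparing with the BIGR column.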

A more detailed treatment of these variables and of their role in lexical processing, with particular reference to reading aloud, can be found in Barca et al. (2002), which also reports several statistics on the database.

This lexical database is useful for research on lexical processing and provides a tool for the study, diagnosis, and rehabilitation of lexical disorders in Italian patients.

For further information, please contact either Dr. Laura Barca, e-mail:, or Dr. Cristina Burani, e-mail:



