Nouns slow down speech across structurally and culturally diverse languages

Authored by pnas.org and submitted by the_phet

Significance When we speak, we unconsciously pronounce some words more slowly than others and sometimes pause. Such slowdown effects provide key evidence for human cognitive processes, reflecting increased planning load in speech production. Here, we study naturalistic speech from linguistically and culturally diverse populations from around the world. We show a robust tendency for slower speech before nouns as compared with verbs. Even though verbs may be more complex than nouns, nouns thus appear to require more planning, probably due to the new information they usually represent. This finding points to strong universals in how humans process language and manage referential information when communicating linguistically.

Abstract By force of nature, every bit of spoken language is produced at a particular speed. However, this speed is not constant—speakers regularly speed up and slow down. Variation in speech rate is influenced by a complex combination of factors, including the frequency and predictability of words, their information status, and their position within an utterance. Here, we use speech rate as an index of word-planning effort and focus on the time window during which speakers prepare the production of words from the two major lexical classes, nouns and verbs. We show that, when naturalistic speech is sampled from languages all over the world, there is a robust cross-linguistic tendency for slower speech before nouns compared with verbs, both in terms of slower articulation and more pauses. We attribute this slowdown effect to the increased amount of planning that nouns require compared with verbs. Unlike verbs, nouns can typically only be used when they represent new or unexpected information; otherwise, they have to be replaced by pronouns or be omitted. These conditions on noun use appear to outweigh potential advantages stemming from differences in internal complexity between nouns and verbs. Our findings suggest that, beneath the staggering diversity of grammatical structures and cultural settings, there are robust universals of language processing that are intimately tied to how speakers manage referential information when they communicate with one another.

Human language in its most widespread form (i.e., in spontaneously spoken interactions) is locked in one-dimensional time. This was recognized by the founding father of modern linguistics, Ferdinand de Saussure, as one of the two fundamental principles of the linguistic sign, the other one being its arbitrary nature (1, 2). An unresolved question is which aspects of local variation in speech rate are universal (3, 4), which vary across languages and cultures (5), and which vary across individuals (6). For example, marking the end of utterances by slowing down speech is cross-linguistically common, but its implementation is language-specific (7). Good candidates for truly universal temporal features are the relatively fast pronunciations of frequent, and thus predictable, words (8) and second mentions of words (9). This speedup is argued to result from automated articulation (4) and has been suggested to contribute to efficient communication by spreading information more evenly across the speech signal (10, 11). Frequency effects also explain why function words, such as articles, prepositions, and pronouns, are pronounced faster than the less frequently occurring content words, such as nouns and verbs (12).

An aspect of speech rate that has received less attention is the local speech rate during the planning, rather than the actual pronunciation, of words. Speed variation before the articulatory onset of a word can provide key evidence for cognitive processes. For example, speakers have been found to slow down their speech rate before complex, infrequent, or novel words (13, 14), a finding that is consistent with the slowdown in lexical access speed that such words trigger in picture naming and related tasks (15–17). Here, we investigate speech rate in word-planning windows in naturalistic speech from nine languages to assess differences in the two major word classes usually found in languages: nouns and verbs. To our knowledge, the relative speedup or slowdown of speech preceding nouns versus verbs has never been directly studied. Related measures like response times in picture-naming experiments suggest that nouns require less planning time than verbs (18, 19). This is attributed to increased planning costs of verbs because of their relative grammatical and semantic complexity and their links with other elements in the clause, for example, subjects and objects. While it is unclear to what extent the planning demands of a word leave traces in the speed of its own articulation (20), these findings are potentially in conflict with studies suggesting slower rates for nouns than verbs in English noun/verb homophones (such as a fly vs. to fly) (21).

A factor that has been neglected in this research is how referential information is managed in connected, interactive speech. In running speech, the choice between referring expressions (e.g., between a noun like the teacher and a pronoun like she) is subject to complex, multidimensional decision procedures which involve various internal and audience-oriented processing mechanisms (22–25) and are shaped both by general pragmatic principles (26, 27) and by language-specific and cultural factors (22, 28, 29). What emerges as a cross-linguistically stable pattern, however, is that the use of nouns typically signals the newness of a referent (e.g., a new person or object introduced into the discourse), a new temporal or local setting, the need to disambiguate between referents, or a shift in discourse topic or perspective (30). In all other contexts, pronouns (I saw the teacher, he [the teacher] was tired) or gaps (The teacher came in and [the teacher] sat down) are highly preferred (31, 32). Verbs are fundamentally different in this regard: Even if the same actions or states are referred to repeatedly, a verb is typically still necessary to form a complete sentence. In line with this, languages do not generally have “pro-verbs” to systematically replace verbs as pronouns do for nouns. While the generic nature of some verbs (e.g., to do) occasionally brings them close to such a function, this is usually confined to highly constrained syntactic contexts (as in Susan drank wine and so did Mary). Similarly, verbs can occasionally be gapped in some languages (Susan drank wine and Mary beer), but this is again subject to special syntactic constraints. In general, the use of verbs is thus the default option, regardless of the information status of the actions or states referred to, while the use of nouns is a marked option that is felicitous only in contexts of information novelty, disambiguation needs, or topic and perspective shifts.
Given these additional constraints on the use of nouns, their use should correlate with a higher planning cost, slowing down speech before the noun.

Here, we aim to settle not only the question of the direction of the effect of subsequent noun versus verb use on speech rate, but also its universality. For this we use time-aligned corpora of naturalistic speech from multimedia language documentations (33). To ensure linguistic and cultural diversity, we chose a set of such corpora from languages spoken in the Amazonian rainforest (Bora and Baure), Mexico (Texistepec), the North American Midwest (Hoocąk), Siberia (Even), the Himalayas (Chintang), and the Kalahari Desert (Nǁng) (Fig. 1). These seven corpora were compiled during on-site fieldwork over the past 25 years and were transcribed, translated, and annotated with word class tags by experts on the languages in collaboration with native speakers. They document naturalistic speech of various genres, including narratives, descriptive texts, and conversations, that were recorded in their original, interactive settings, such as the recording of a Bora myth illustrated in Fig. 2. While the genres covered by the corpora are diverse, all data are comparable in that they document speech which is spontaneously produced, not read out or memorized, even if texts stem from local oral traditions. We additionally used relevant sections of published corpora of spoken Dutch and English, which likewise document naturalistic spoken language annotated for word class by experts.

Fig. 1. Location of the nine languages and size of the corpora studied here. For detailed information, see SI Appendix, Table S1.

Fig. 2. Bora utterance illustrating slow articulation and presence of a pause before a noun compared with fast articulation and no pause before a verb. The example translates as “After you bit my father, he died” and is taken from a Bora mythological narrative, available online at https://hdl.handle.net/1839/00-0000-0000-000C-DFBE-1. (a) Waveform of audio signal; (b) time-aligned transcription of words; (c) word-by-word translation; (d) word class N = noun vs. V = verb vs. X = other; (e) position of word within utterance from 0 = start to 1 = end; (f) z-normalized word length calculated as SDs from mean word length in the language; (g) preword context windows for the noun llihíyoúvuke “my father” and the verb ds j veébe “he died,” adjusted in size to word boundaries close to 500 ms before onset of target words (preword window for hdóneri “after biting” not shown here); (h) length of preword context windows; (i) articulation rate of words (excluding pauses) within preword context windows; and (j) presence vs. absence of pauses within preword context windows. Procedures for time-aligning transcriptions and for determining position, word length, and context window size are described in Materials and Methods.
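
The two control variables in panels (e) and (f) can be illustrated with a minimal sketch. This is not the authors' code: the function names are hypothetical, the use of the population standard deviation is an assumption, and position is approximated here by word index within the utterance, whereas the actual procedure is the one described in Materials and Methods.

```python
from statistics import mean, pstdev


def z_normalize_lengths(lengths):
    """Express each word length as standard deviations from the mean length.

    Assumption: the population SD over all words of one language is used.
    """
    mu = mean(lengths)
    sigma = pstdev(lengths)
    if sigma == 0:
        # All words have the same length, so no word deviates from the mean.
        return [0.0 for _ in lengths]
    return [(n - mu) / sigma for n in lengths]


def relative_positions(n_words):
    """Map word indices to a scale from 0 (utterance start) to 1 (utterance end).

    Assumption: position is index-based; a time-based definition is equally
    conceivable.
    """
    if n_words == 1:
        return [0.0]
    return [i / (n_words - 1) for i in range(n_words)]


# Hypothetical word lengths (in characters) for a three-word utterance.
print(z_normalize_lengths([2, 4, 6]))
print(relative_positions(3))
```

Normalizing word length per language keeps languages with very short words and languages with very long words on a comparable scale.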

To assess the effects of subsequent noun versus verb use on speech rate, we used the word-class category of the lexical root contained in a word, as identified by language-specific criteria, even though individual words may be nominalized or verbalized (in our data, this occurs in less than 5% of nouns and verbs). This captures more closely the distinction between “object words” and “action words,” which is known to be more relevant to language processing than the syntactic surface categories of words (18, 34). We investigated speedup versus slowdown effects of nouns versus verbs in time windows of ∼500 ms preceding their onset (Materials and Methods and Fig. 2). This window size was set following picture- and action-naming studies that have shown that planning a single content word takes around 600 ms (35). Slowing down speech can have two independent effects (36), which we investigated in two separate studies: (i) slower articulation of words, measured as phonological segments (approximated by orthographic characters) per second (37) for all words within the time window preceding a noun or verb, and (ii) higher probability of pauses within such windows, as indicated by the presence of at least one interval ≥150 ms without articulation or with articulation of fillers only (such as English uhm) (Materials and Methods). We analyzed both measures with generalized linear mixed-effects models with the word class (noun vs. verb) of the target word as the main predictor of interest. We controlled for potential slowdown at the end of utterances by including the target word’s position within the utterance, as well as the target word’s length. Our models furthermore took into account random effects caused by idiosyncrasies of individual speakers, recording sessions, and individual word forms. 
Inclusion of word forms takes care of the expected speedup associated with frequent and predictable items, since frequency and predictability are properties of individual word forms (38, 39) (Materials and Methods). Modeling the entire dataset revealed a significant interaction between language and the effect of word class, and we therefore fitted individual but comparable models to each language separately.
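
The two slowdown measures described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the `Word` class and all function names are hypothetical, the window start is simply snapped to the preceding word onset closest to 500 ms before the target, and orthographic characters stand in for phonological segments as in the text.

```python
from dataclasses import dataclass


@dataclass
class Word:
    form: str                # orthographic form, a proxy for segments
    start: float             # onset in seconds
    end: float               # offset in seconds
    is_filler: bool = False  # e.g. English "uhm"


def preword_window(words, target_idx, target_len=0.5):
    """Return the words preceding the target, starting at the word onset
    closest to target_len seconds before the target's onset (assumption)."""
    preceding = words[:target_idx]
    if not preceding:
        return []
    ideal_start = words[target_idx].start - target_len
    first = min(range(len(preceding)),
                key=lambda i: abs(preceding[i].start - ideal_start))
    return preceding[first:]


def articulation_rate(window):
    """Characters per second over non-filler words, excluding silent gaps."""
    spoken = [w for w in window if not w.is_filler]
    duration = sum(w.end - w.start for w in spoken)
    if duration <= 0:
        return None  # nothing articulated in the window
    return sum(len(w.form) for w in spoken) / duration


def has_pause(window, target_onset, min_pause=0.150):
    """True if some interval of at least min_pause seconds contains no
    articulation, or only fillers, before the target's onset."""
    if not window:
        return False
    cursor = window[0].start
    for w in window:
        if w.is_filler:
            continue  # fillers count as part of the pause
        if w.start - cursor >= min_pause:
            return True
        cursor = w.end
    return target_onset - cursor >= min_pause


# Hypothetical utterance: a 300 ms silence precedes the target word "man".
utterance = [Word("the", 0.0, 0.2), Word("old", 0.2, 0.5), Word("man", 0.8, 1.1)]
window = preword_window(utterance, 2)
print(articulation_rate(window), has_pause(window, utterance[2].start))
```

Keeping the two measures in separate functions mirrors their treatment as two separate studies: articulation rate ignores silences, while pause detection ignores how fast the surrounding words are spoken.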

Results and Discussion Results are summarized in the effect displays in Fig. 3, showing that all nine languages exhibit a significant slowdown before nouns compared with verbs with respect to at least one of our two ways of measuring slowdown. Only one language (English) exhibits a significant slowdown before verbs, and only when measured in terms of pause probability (see SI Appendix, Supplementary Text for details). The overall tendency for slowdown before nouns is striking because the culturally and linguistically vastly diverse populations in our sample display remarkable differences in many respects; for example, in overall speed and the range of variation (Fig. 3 and SI Appendix, Table S6). For instance, Hoocąk speakers articulate more slowly and pause more often in the context of both nouns and verbs than Dutch speakers do. Language or culture-specific facts may also mask the observed effect in individual studies for individual languages. For instance, Nǁng words are so short (on average 4.61 segments per word) that there is little room for differences in articulation rate within words. We have presently no explanation for the exceptional behavior of English regarding pauses, except for speculating that English noun planning might be “easier” because the gap option (as opposed to the pronoun option) is far less common than in the other languages, reducing choice efforts. Another possibility is that our English corpus is based on telephone rather than face-to-face interactions, but evidence so far suggests that speakers are not strongly influenced by the visual presence of listeners in reference production (9, 23). Whatever the reason, this result highlights the need for a diverse sample, such as that represented here, including languages other than English, which has been found to be exceptional in other studies also (40).

Fig. 3. Speech rate in contexts before nouns versus verbs. The effect displays show a cross-linguistic tendency for slower articulation before nouns (A) and a higher probability of pauses before nouns (B). The effect of word class (nouns vs. verbs) is plotted according to (generalized) linear mixed-effects models, with 95% confidence intervals based on these models. Both studies are based on models that are consistent across the individual languages, controlling for word position and word length as fixed factors and including random intercepts for speaker, text, and word type. The models for articulation speed included an additional interaction between word class and position, but A shows the overall effects of word class, averaging over positions, to simplify the visual representation (Materials and Methods and SI Appendix, Supplementary Text). Levels of statistical significance are indicated as *P < 0.05; **P < 0.01; ***P < 0.001; and n.s. (not significant), P > 0.05.

The overall results, based on models with data from all nine languages taken together, show that, across our diverse sample, the slowdown effect before nouns prevails: Regarding articulation, the effect is small but robust, causing around 3.5% slower articulation rate before nouns than before verbs, despite strong variation overall and a few exceptions found in specific utterance positions in individual languages (see SI Appendix, Supplementary Text for details). Regarding pauses, across all nine languages, the probability of pauses before nouns is about 60% greater than before verbs, and, in the majority of languages, the odds of pauses before nouns are about twice as high as before verbs (see SI Appendix, Supplementary Text for details). Compared with other factors, the effect of word class is also surprisingly strong: In statistical models of all our data taken together, this effect is about two times stronger than the effect of a target word’s length and more than eight times stronger than the effect of its position within the utterance (SI Appendix, Tables S9, S20, S33, and S44).
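
The pause results are reported both as a relative increase in probability and as a ratio of odds, and these two measures are not interchangeable. The sketch below shows how they relate. The pause probabilities in it are hypothetical values chosen only for illustration and are not taken from the paper or its SI Appendix; the function names are likewise assumptions.

```python
def odds(p):
    """Convert a probability into odds, p / (1 - p)."""
    return p / (1 - p)


def relative_increase(p_noun, p_verb):
    """Relative increase in pause probability before nouns vs. verbs."""
    return p_noun / p_verb - 1


def odds_ratio(p_noun, p_verb):
    """Odds of a pause before nouns divided by the odds before verbs."""
    return odds(p_noun) / odds(p_verb)


# Hypothetical pause probabilities, NOT values reported in the study.
p_verb, p_noun = 0.25, 0.40

print(round(relative_increase(p_noun, p_verb), 2))  # 0.6, i.e. 60% greater
print(round(odds_ratio(p_noun, p_verb), 2))         # 2.0, i.e. twice the odds
```

With these illustrative numbers, a probability that is 60% greater corresponds to odds that are twice as high; the gap between the two measures widens as the baseline probability grows.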

Conclusion Our results from naturalistic speech contradict experimental studies showing faster planning of nouns (18, 19) and thus suggest that the effect of referential information management overrides potential effects of higher processing costs of verbs. As such, these results resonate with earlier findings of cross-linguistic parallels in the timing of turn taking (5, 41) and point to strong universals of language processing that are grounded in how humans manage information. But our present findings indicate that speech rate variation is universally constrained also at a fine-grained level, within turns and depending on which kinds of content words are used: Pragmatic principles of noun use and the slowdown associated with new information converge to create a uniform pattern of speech rate variation across diverse languages and cultures. Our finding has several implications. First, models of language processing need to more systematically incorporate aspects of information management in interactive speech (41–43). Second, while speech rate in corpora is mostly studied in terms of the articulation of a word, speech rate variation before words of different types is a measure with great potential to gain insights into the mechanisms of language production. Third, naturalistic corpus studies on widely diverse languages allow detection of signals that do not suffer from the sampling bias in much of current theorizing about language and speech (33, 44). Most such work is still largely based on educated speakers of a small number of mostly Western European languages, and it remains unclear whether findings generalize beyond this (40, 45, 46). Finally, by revealing patterns linked to specific word classes, our finding opens avenues for explaining how grammars are shaped through the long-term effects of fast pronunciation, such as phonological reduction (47) and the emergence of grammatical markers (4).
In particular, slower speech and more pauses before nouns entail a lower likelihood of contraction of independent words. This explains the fact that cross-linguistically fewer function words become fused as prefixes to nouns than as prefixes to verbs, a fact so far little understood (48).

Acknowledgments We thank all native speakers that provided data and all assistants that helped annotate the data. We acknowledge the comments of Damián E. Blasi, Sebastian Sauppe, Volker Dellwo, and Sabine Stoll. The research of F.S. and J.S. was supported by grants from the Volkswagen Foundation’s Dokumentation Bedrohter Sprachen (DoBeS) program (80 110, 83 522, 86 292, and 89 550) and the Max Planck Institute for Evolutionary Anthropology; the research of S.W. was supported by a European Research Council (ERC) Advanced Grant [MesAndLin(g)k, Grant 295918, Principal Investigator W. Adelaar] and by a subsidy of the Russian Government to support the Programme of Competitive Development of Kazan Federal University. B.P. is grateful to Laboratoires d’Excellence (LABEX) Advanced Studies on Language Complexity (ASLAN) (ANR-10-LABX-0081) of the Université de Lyon for its financial support within the program “Investissements d’Avenir” (ANR-11-IDEX-0007) of the French government operated by the National Research Agency (ANR). The research of B.B. was supported by Swiss National Science Foundation Grant CRSII1_160739 (“Linguistic Morphology in Time and Space”).

Footnotes Author contributions: F.S., J.S., and B.B. designed research; J.S. performed research; F.S., J.S., S.D., I.H., B.P., S.W., A.W.-M., and B.B. analyzed data; F.S., N.H.d.J., and B.B. wrote the paper with input from all other authors; and J.S. produced the SI Appendix with input from all other authors.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Data deposition: The complete datasets used in this study are available at https://figshare.com/s/085b09d7d82b5501df4e.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1800708115/-/DCSupplemental.

willis936 on May 16th, 2018 at 12:10 UTC »

Are there not many, many, many more nouns than verbs? And honestly nouns are more useful than verbs in conveying complex ideas. It would make sense that being able to grab the correct noun would be harder than the correct verb.

yes_its_him on May 16th, 2018 at 12:08 UTC »

There are many more nouns than verbs, so the choice of what noun to use would seem to require a greater act of decision-making.

Edit: this is particularly the case for something like proper nouns, whereas there are no "proper verbs." Coming up with the correct person's name or geographic location is probably a more complex task than remembering the verb for "to run."

Alimbiquated on May 16th, 2018 at 10:02 UTC »

You can lose nouns completely with a stroke in the right part of your brain. Also when you get older you start swapping out one noun for another. It is a lot less effort.

I wonder how it's related to the complicated noun classification schemes in some languages.