Figure 1: Relationships between text types, conventional genres, and textual genres.
Figure 2: Proportion of paragraphs containing direct speech, travelogues versus novels.
Figure 3: Number of words per page for a sample of 100 pages.
Figure 4: Number of words for the full texts of 129 works carrying the label “novela”.
Figure 5: Number of pages and words for the bibliographic entries of 252 works carrying the label “novela”.
Figure 6: Number of words for 381 works carrying the label “novela”.
Figure 7: Number of words for 65 works carrying the label “novela corta”.
Figure 8: Works by source. Left: candidates, right: entries in the bibliography.
Figure 9: Inclusion and reasons for exclusion of works.
Figure 10: Kinds of subgenres in the context of a discursive model.
Figure 11: Sources by institution.
Figure 12: Sources by file type and institution.
Figure 13: Sources by type of edition and type of institution.
Figure 14: Distribution of spelling errors without exception words.
Figure 15: Distribution of spelling errors without exception words (logarithmic scale).
Figure 16: Top 30 spelling errors.
Figure 17: Number of error tokens and types covered by exception lists.
Figure 18: Distribution of spelling errors with exception words.
Figure 19: Distribution of error tokens and types for the corpus files (absolute).
Figure 20: Distribution of error tokens and types for the corpus files (relative).
Figure 21: Distribution of error tokens and types for the corpus files (by type of source edition).
Figure 22: Distribution of error tokens and types for the corpus files (by source file type).
Figure 23: Distribution of error tokens and types for the corpus files (by source institution).
Figure 24: Death years of authors.
Figure 25: Years of the novels’ first publications.
Figure 26: Publication years of basis editions.
Figure 27: Copyright statuses of the novels in the corpus.
Figure 28: Characterization of the direct speech annotated in the corpus.
Figure 29: Pages with direct speech from “Libro extraño” by Francisco Sicardi, with initial speech signs (left page) and without speech signs (right page).
Figure 30: Scores for direct speech recognition (gold standard versus regular expression approach).
Figure 31: F1 scores for direct speech recognition by kind of edition.
Figure 32: F1 scores for direct speech recognition by type of speech sign.
Figure 33: Verb forms with enclitic pronouns in the novels of the corpus.
Figure 34: FreeLing POS of verb forms with enclitic pronouns.
Figure 35: FreeLing POS of verb forms with enclitic pronouns in the texts of the corpus.
Figure 36: Proportions of zero values in MFW feature sets.
Figure 37: Distribution of zero values in MFW100.
Figure 38: Distribution of zero values in MFW5000.
Figure 39: Variances of the 1000 MFW (absolute values).
Figure 40: Variances of the 1000 MFW (tf-scores).
Figure 41: Variances of the 1000 MFW (tf-idf-scores).
Figure 42: Variances of the 1000 MFW (z-scores).
Figure 43: Mean coherence of the topic models with different parameter settings.
Figure 44: Example topics.
Figure 45: Frequency of rank 1 for different values of n_neighbors (KNN).
Figure 46: Frequency of rank 1 for different values of weights (KNN).
Figure 47: Frequency of rank 1 for different values of metric (KNN).
Figure 48: Frequency of rank 1 for different values of C (SVM).
Figure 49: Frequency of rank 1 for different values of max_features (RF).
Figure 50: Classification workflow.
Figure 51: Primary thematic subgenres in the corpus.
Figure 52: Classification results for topic feature sets (SVM, varying number of topics, and optimization intervals).
Figure 53: Feature weights (topics) for historical versus sentimental novels.
Figure 54: Most distinctive topics for historical versus sentimental novels.
Figure 55: Topics “v_d-instante-corazón” and “tía-do-aire”.
Figure 56: Feature weights (topics) for novels of customs versus historical novels.
Figure 57: Topics “mesa-puerta-sala” and “boca-cabeza-perro”.
Figure 58: Feature weights (topics) for novels of customs versus sentimental novels.
Figure 59: Predictions for novela histórica versus other novels (topics).
Figure 60: Top topics for novela histórica versus other novels in the novel “La cruz y la espada”.
Figure 61: Top topics for novela histórica versus other novels in the novel “Las gentes que son así”.
Figure 62: Top topics for novela histórica versus other novels in the novel “Los bandidos de Río Frío”.
Figure 63: Top topics for novela histórica versus other novels in the novel “Los esposos”.
Figure 64: Top topics for novela histórica versus other novels in the novel “Vía Crucis”.
Figure 65: Top topics for novela histórica versus other novels in the novel “Las ranas pidiendo rey”.
Figure 66: Predictions for novela sentimental versus other novels (topics).
Figure 67: Predictions for novela de costumbres versus other novels (topics).
Figure 68: Classification results for MFW feature sets (RF, varying number of MFW and normalization technique).
Figure 69: Classification results for word n-gram feature sets (RF, varying number of MFW, grams, and normalization technique).
Figure 70: Classification results for classic character n-gram feature sets (RF, varying number of MFW, grams, and normalization technique).
Figure 71: Classification results for “word” character n-gram feature sets (RF, varying number of MFW, grams, and normalization technique).
Figure 72: Classification results for “affix-punct” character n-gram feature sets (RF, varying number of MFW, grams, and normalization technique).
Figure 73: Primary literary currents in the corpus.
Figure 74: Classification results for topic feature sets (SVM, varying number of topics and optimization intervals).
Figure 75: Classification results for MFW feature sets (SVM, varying number of MFW and normalization technique).
Figure 76: Classification results for word n-gram feature sets (SVM, varying number of MFW, grams, and normalization technique).
Figure 77: Classification results for classic character n-gram feature sets (SVM, varying number of MFW, grams, and normalization technique).
Figure 78: Classification results for “word” character n-gram feature sets (SVM, varying number of MFW, grams, and normalization technique).
Figure 79: Classification results for “affix-punct” character n-gram feature sets (SVM, varying number of MFW, grams, and normalization technique).
Figure 80: Feature weights (MFW) for realist versus romantic novels.
Figure 81: Feature weights (MFW) for naturalistic versus realist novels.
Figure 82: Predictions for novela romántica versus other novels (MFW).
Figure 83: Predictions for novela realista versus other novels (MFW).
Figure 84: Predictions for novela naturalista versus other novels (MFW).
Figure 85: Subcorpus for the family resemblance analysis.
Figure 86: Examples of topics for the family resemblance analysis.
Figure 87: Network of historical novels based on topics (HIST).
Figure 88: Overview of cluster metadata in the network HIST.
Figure 89: Clusters by year in the network HIST.
Figure 90: Topic scores for cluster 3 in the network HIST.
Figure 91: Top distinctive topics in the clusters of the network HIST.
Figure 92: Clusters by year in the network SENT.
Figure 93: Clusters by subgenre in the combined network.
Figure 94: Number of works per author.
Figure 95: Number of editions per author.
Figure 96: Authors by country.
Figure 97: Authors by nationality.
Figure 98: Authors by country of birth.
Figure 99: Authors by country of death.
Figure 100: Author gender.
Figure 101: Knowledge of the authors’ life dates.
Figure 102: Births and deaths of authors by decade.
Figure 103: Authors alive per year.
Figure 104: Number of active authors per year.
Figure 105: Author ages when publishing novels.
Figure 106: Authors’ age at death.
Figure 107: Number of works per year in Bib-ACMé and Conha19.
Figure 108: Works by decade in Bib-ACMé and Conha19.
Figure 109: Works before and after 1880.
Figure 110: Works by decade and country.
Figure 111: Works by country in Bib-ACMé and Conha19.
Figure 112: Publication countries of first editions.
Figure 113: High and low prestige novels by country.
Figure 114: High and low prestige novels by decade.
Figure 115: High and low prestige novels before and in or after 1880.
Figure 116: Narrative perspective by country.
Figure 117: Narrative perspective by decade.
Figure 118: Narrative perspective before and in or after 1880.
Figure 119: Continent and country of the setting.
Figure 120: Continent of the setting by country.
Figure 121: Continent of the setting per decade.
Figure 122: Continent of the setting before and in or after 1880.
Figure 123: Time periods of the setting relative to the authors’ birth year and publication year.
Figure 124: Time periods of the setting by country.
Figure 125: Time period of the setting per decade.
Figure 126: Time period of the setting before and in or after 1880.
Figure 127: Length of the novels in the corpus.
Figure 128: Length of the novels by country.
Figure 129: Length of the novels per decade.
Figure 130: Number of editions per work in Bib-ACMé and Conha19.
Figure 131: Editions per year in Bib-ACMé and Conha19.
Figure 132: Editions per decade in Bib-ACMé and Conha19.
Figure 133: Editions before and in or after 1880.
Figure 134: Editions by country in Bib-ACMé and Conha19.
Figure 135: Editions by place of publication in Bib-ACMé and Conha19.
Figure 136: Works with the label “novela” by decade.
Figure 137: Top 20 most frequent explicit subgenre labels in the bibliography.
Figure 138: Top 20 most frequent explicit subgenre labels in the corpus.
Figure 139: Works with an “identity label” by decade.
Figure 140: Top 20 most frequent subgenre signals in the bibliography.
Figure 141: Top 20 most frequent subgenre signals in the corpus.
Figure 142: Top 20 most frequent literary historical subgenre labels in the bibliography.
Figure 143: Top 20 most frequent literary historical subgenre labels in the corpus.
Figure 144: Number of different subgenre labels on discursive levels (in Bib-ACMé).
Figure 145: Overall number of subgenre labels on discursive levels (in Bib-ACMé).
Figure 146: Thematic subgenre labels in Bib-ACMé and Conha19.
Figure 147: Sources of thematic subgenres in Bib-ACMé.
Figure 148: Number of thematic labels per work.
Figure 149: Primary thematic subgenres of the works.
Figure 150: Subgenre labels related to literary currents in Bib-ACMé and Conha19.
Figure 151: Sources of subgenre labels related to literary currents in Bib-ACMé.
Figure 152: Publication years of works by literary current in Bib-ACMé.
Figure 153: Subgenre labels related to the mode of representation in Bib-ACMé and Conha19.
Figure 154: Sources of labels related to the mode of representation in Bib-ACMé.
Figure 155: Subgenre labels related to the mode of reality in Bib-ACMé and Conha19.
Figure 156: Sources of subgenre labels related to the mode of reality in Bib-ACMé.
Figure 157: Subgenres related to the linguistic, geographical, and socio-cultural identity.
Figure 158: Sources of identity subgenre labels in Bib-ACMé.
Figure 159: Constellations of identity groups in Conha19.
Figure 160: Subgenre labels related to medial aspects in Bib-ACMé and Conha19.
Figure 161: Sources of the subgenre labels related to medial aspects in Bib-ACMé.
Figure 162: Subgenre labels related to the attitude in Bib-ACMé and Conha19.
Figure 163: Sources of subgenre labels related to the attitude in Bib-ACMé.
Figure 164: Subgenre labels related to the intention in Bib-ACMé and Conha19.
Figure 165: Sources of subgenre labels related to the intention in Bib-ACMé.
Figure 166: Number of works per subgenre label.
Figure 167: Primary thematic subgenres in Bib-ACMé and Conha19.
Figure 168: Primary thematic subgenre labels in Bib-ACMé and Conha19 by country.
Figure 169: Primary thematic subgenre labels in Bib-ACMé per decade.
Figure 170: Primary thematic subgenre labels in Conha19 per decade.
Figure 171: Primary thematic subgenres in Bib-ACMé before and in or after 1880.
Figure 172: Primary thematic subgenres in Conha19 before and in or after 1880.
Figure 173: Primary thematic subgenre labels in Conha19 by prestige.
Figure 174: Primary thematic subgenres in Conha19 by narrative perspective.
Figure 175: Primary thematic subgenres in Conha19 by continent of the setting.
Figure 176: Primary thematic subgenres in Conha19 by time period of the setting.
Figure 177: Work lengths in tokens by primary thematic subgenre in Conha19.
Figure 178: Primary subgenres related to literary currents in Bib-ACMé and Conha19.
Figure 179: Primary subgenre labels related to literary currents in Bib-ACMé and Conha19 by country.
Figure 180: Primary subgenre labels related to literary currents in Bib-ACMé by decade.
Figure 181: Primary subgenre labels related to literary currents in Conha19 by decade.
Figure 182: Primary subgenres related to literary currents in Bib-ACMé before and in or after 1880.
Figure 183: Primary subgenres related to literary currents in Conha19 before and in or after 1880.
Figure 184: Primary subgenre labels related to literary currents in Conha19 by prestige.
Figure 185: Primary subgenre labels related to literary currents in Conha19 by narrative perspective.
Figure 186: Primary subgenre labels related to literary currents in Conha19 by continent of the setting.
Figure 187: Primary subgenres related to literary currents in Conha19 by time period of the setting.
Figure 188: Work length in tokens by primary subgenres related to literary currents in Conha19.