Review Comment:
This paper provides a detailed, well-structured and original overview of different perspectives for the exploration and analysis of biography collections. The results are significant and deserve to be published, as they align well with other recent surveys of biography collections (especially the ODNB, see Warren 2018). The quality of writing is high and the language is clear. The introduction and the description of the transformation process from the NBF to a Linked Data service in sections one and two are concise and introduce the reader to the topic of the paper. In the main section of the paper (3. Analyzing and Visualizing the NBF), the reader is guided through seven perspectives for searching and exploring the data from the National Biography of Finland. The space devoted to the description of the different analytical perspectives is divided somewhat unevenly and the number of tables and figures is rather overwhelming. For the sake of the reader, I would suggest reducing the number of figures and tables in sections 3.1 and 3.4. The discussion (section 4) touches upon the issue of data literacy, which is very important.
The authors pursue two goals: (1) “to argue and show that using biographies as Linked Data opens up unprecedented new possibilities for the study by distant reading” (p. 2) and (2) to “present novel insights into the nature and contents of NBF” (p. 2). In the title of the paper and on p. 3, the authors point out that biographies can also be studied from a “historiographical perspective as an artifact reflecting its own time, the editorial values and biases in selecting the biographees, the authors’ perspectives, and also from a linguistic point of view.”
The paper convincingly argues that semantic technologies have the potential of opening up biography collections in novel ways. The authors show at length how exploratory statistical and network analysis can be conducted based on data from the National Biography of Finland. The exploratory analysis is very informative; in particular, I like the use of the PageRank measure very much. At the same time, the precise nature and added value of the historiographical perspective remain rather implicit throughout most of the text (although they are there). It is only in sections 3.6 and 3.7 that more profound text-analytical insights into the biography collections as the result of man-made selection processes are given. From a historian’s point of view, it would have been great to learn more about the differences between the vocabularies for male and female entries in the NBF (p. 25), about the different styles of authors delivering biographies to the NBF, or about the impact of (tacit?) editorial decisions. Finally, for a paper that engages with the historiographical perspective on biography collections, section 3.7 (Author analysis) is rather short. It would have been great to gain a better insight into the different writing and editorial strategies during the production of the biographies. The authors of the paper do hint at these differences on p. 25, when referring to later additions to the NBF (“Multifaceted Finland”), but the topic of ‘history writing’ is not pursued much further. The impression remains that the authors’ engagement with the ‘historiographical perspective’ could be given a more prominent position throughout the text, e.g. by pointing out how the analysis of NBF data contributes to understanding the process of biography writing. Perhaps a more direct comparison with the findings discussed in Warren (2018), which has clearly inspired the present paper, might be useful.
Besides these suggestions, I have some (very) minor remarks about the text, figures and tables, which I list below:
p. 2, left, r. 44: …1997 … when Finland celebrated her 90 years … => 80 years?
p. 2, right, r. 19: 13100 biographies by 980 scholars. Fig. 2 on p. 7 visualizes a subset of 6500 biographies written by 1000 authors. => is the latter figure about the number of authors correct?
p. 3, left, footnote 12: is redundant (see footnote 9)
p. 3, right, r. 32-40: It seems to me that these lines fit better at the end of section 1.2.
p. 4, right, r. 15: 6478 entries => I arrive at 6476 entries
p. 5, left, r. 37: CVS => CSV
p. 5, right, r. 16: on p. 4 it is stated that only data for men and women from the core NBF are used. These are 6197 entries. Here, a total of 6510 biographies is given. It would be good to explain where the different numbers come from.
p. 6, right, r. 12-17: It would be helpful to point out where this discussion could be found later in the text.
p. 7, left, r. 24: [? ] => a reference is missing
p. 8, left, r. 30: … from the start of the 20th century => from 1950?
p. 10, left, r. 37: I could not find the term vapaaherratar on fig. 8.
p. 11, right, r. 41-43 = p. 12, left, r. 47-48 => these lines appear to be repeated.
p. 13, left, r. 39-42: To me, it is a bit puzzling that the selection of biographies does not reflect the importance of agriculture until the 1960s, because ‘farmer’ or ‘farmer’s wife’ are listed consistently among the top vocations of parents in table 2. It would be helpful to explain this briefly in the text.
p. 28, right: I think [21] and [22] refer to the same title.