Review Comment:
I thank the authors for the systematic approach chosen to answer the reviewers' comments. I especially appreciate the authors' efforts in running Zhu et al.'s algorithm. My previous comments as to the originality and significance of the results remain. The quality of writing has been improved but I would still suggest that the authors read through their paper carefully. Please find my comments below:
Answers to answers
- R1C2: The formulation is still incorrect and should read "The evaluation of GenLink has shown that the algorithm delivers good results with F-measures above 95\% on different dense datasets such as sider-drugbank, LinkedMDB, restaurants [19]."
- R1C5: I'm afraid the meaning of M is still not clearly described. The paper reads "the subset M consisting of all pairs" and does not state what M is a subset of. Do you simply mean set? Please fix.
- R1C6: The text now reads "a relation ~R". Do you mean owl:sameAs? If yes, please say so. If not, please define ~R formally.
- R1C14: "removing (or loosening) the penalty has the potential to result with overfitted model and thus would not improve results" Do you mean "removing (or loosening) the penalty has the potential to result in an overfitted model. Thus, it might not improve the results of our approach"? Please check.
- R1C15d: While I agree with the potential for improvement as to the runtimes and would like said fact to be added to the discussion, I'm afraid I disagree with the authors w.r.t. reporting runtime results. It is only fair to state runtimes (and the hardware on which the runtimes were achieved) so that other researchers know what to expect when aiming to use or extend your approach. Hence, I must repeat my request that a table of the runtimes on the different datasets be added to the paper. It'd be especially helpful if the runtimes of the other algorithms were added as well.
- R1C21: I would argue that seeing the manual rule would actually help the reader get an idea of the complexity of the problem at hand. Hence, I'd still suggest that it be added.
- R1C28: The authors seem to mix up the LIMES framework and the LIMES algorithm. The paper describing the framework is (Ngonga Ngomo, 2012) and I guess the authors mean the framework when they talk about LIMES.
Algorithms
- Equation 6. Do you mean $|\cup_{i=1}^{|G|} tp_i|$? You redefine $i$ at the top of your sum, making your equation incorrect. Please check.
- Equation 7. See Eq. 6.
- How do you know when to switch to a larger value of $c$? Any insights?
- Again, you use * and \times to mean multiplication (see Eq. 5 and the fitness_group equation). This was already pointed out in my last review and a correction was claimed. I would be thankful if the authors could check their manuscript anew for such inconsistencies.
Experiments
- Table 4: Please add the variable names to the labels (e.g., $\alpha$, $\beta$)
- The authors train on 66% of the data and evaluate on 33%. Why do they not use the standard protocol of n-fold cross-validation? Otherwise, it could be that their results are just an artifact of the particular split chosen (which is unlikely given the significance of their results, but should still be checked). Please clarify or switch to an n-fold cross-validation. Is there a reason for this 3-fold-like split instead of the commonly used 10-fold cross-validation?
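To make the request concrete, here is a minimal sketch of the kind of protocol I have in mind; the data, model, and scoring function are placeholders, not the authors' actual pipeline:

```python
import random

def k_fold_indices(n, k, seed=42):
    """Shuffle indices 0..n-1 and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(data, k, train_and_score):
    """Average the test score over k train/test splits.

    train_and_score is a placeholder for "learn a linkage rule on the
    training pairs, then return its F-measure on the test pairs".
    """
    folds = k_fold_indices(len(data), k)
    scores = []
    for i in range(k):
        test = [data[j] for j in folds[i]]
        train = [data[j] for f in folds[:i] + folds[i + 1:] for j in f]
        scores.append(train_and_score(train, test))
    return sum(scores) / k
```

Reporting the average (and standard deviation) over all k folds, rather than a single 66/33 split, would rule out split artifacts.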
- Statistical test. The authors do not describe what they test exactly. Do you compare the average F-measures achieved over 10 runs, or do you compare single runs?
- " Correspondingly, the GenLinkSA algorithm gives comparable results in F-measure compared to FEBRL, mostly due to the jump in precision". It seems to me that your approach is commonly better than FEBRL. Please check.
Typos
"Additionally, find" => Additionally, we find [Do you mean we compute. You don't really need to find U once you have M]
Please mind the punctuation after your equations.
Footnotes should be placed after punctuation marks.