Review Comment:
This paper deals with automated building regulation compliance checking using BIM models and semantic web technologies. This is a well-known subject of research in the area of architecture, engineering and construction (AEC). A formal language is proposed, namely Structured Natural Language (SNL), for expressing building regulations in a human-friendly AND formal manner. Each SNL expression can be transformed into SPARQL queries. By executing these SPARQL queries against BIM models, building regulations can be automatically checked. This approach is evaluated using the BIM model of the Z15 Tower in Beijing and 5 rule libraries.
(1) originality
The concept of automating compliance checking of building regulations is not new, and neither is the usage of semantic web technologies for that purpose. The extent to which it is implemented here is broader than usual. In fact, seeing all parts of the usual process (model preparation, rule extraction, rule checking, reporting) implemented, including a real-world use case, is really great. Unfortunately, the application of semantic web technologies is fairly limited in the proposed approach (in contrast to earlier proposals in the literature). As a result, this proposal does not go beyond the state of the art for the semantic web domain (which is the core area of this journal). Namely, apart from Jena and SPARQL, not many components of the semantic web stack are being used. It comes as a big surprise that a paper on a semantic-web-based approach does not make any use of the rule languages or reasoning engines available in this domain. Moreover, if there is an explicit formal ontology defined, it is not really used in the process. Most of the proposed approach consists of straightforward programming, combined with executing SPARQL queries. This approach could probably just as well be implemented with a regular schemaless graph database; such databases are considered to have user-friendly query languages. That probably even removes the need for an SNL language. Why use semantic web technologies in the suggested approach: what did you gain?
(2) significance of the results
As indicated, the completeness of the approach, covering all parts of the compliance checking of building regulations with BIM models, is very significant and much appreciated. The test case with a real building is also great. Evaluating and improving the performance of the approach is much appreciated as well. For the semantic web research domain, this test probably has little significance (see above). The last remaining aspect is the search for a language that can formalise building regulations, yet remains highly user-friendly. I personally do not believe in the possibility of creating such a language (formal and user-friendly are like fire and water), nor in the real need for such a language. End users in the industry, especially AEC, need GUIs, not a query language. User-friendliness depends on the interface, which is not formal anyway. So, a mapping needs to be made from interface to computer language. Actually, this is probably where the SNL language resides: a non-formal language that can be used in the interface to accommodate end users. Existing semantic rule languages and query languages (SPARQL) remain the formal languages needed to achieve an automated rule-checking process. This is where results are sought in the semantic web community. To be significant enough for this domain, a far better connection needs to be made with the diverse ontology, query, and rule languages in the semantic web domain.
(3) quality of writing
The structure of the paper is great. The style of writing is good. Spelling is okay.
The references are poorly formatted. Please use a consistent format, as requested by the journal, to structure references.
Detailed comments:
-----------------
1. Introduction
- "In these scenarios, generic and transboundary model representation are more welcome than the building specific representation like the standardization through Industry Foundation Classes(IFC) [6]." => Maybe I misinterpret, but this makes little sense. In order to "cross boundaries" of models, there is a strong need for defining "models with defined boundaries" in the first place, ideally standardised between a large enough collection of end users. The IFC data model is one of those bounded domain models. If it does not exist, you cannot cross its boundaries. The semantic web domain has several other such domain models defined as OWL ontologies that can be combined when going 'transboundary'. Standardisation and going beyond model boundaries go hand in hand.
- "Semantic web technologies have great advantages on ..., and 'logical basics'" => What are those 'logical basics' advantages? Don't you mean 'logical basis'? I would not call semantic web technologies basic in terms of logic.
- "The other challenge is how to effectively input checking rules which can be understood by both human beings and machines." => Well, common query languages and rule languages can be understood by human beings and machines. So why develop yet another language? The problem is likely that domain experts are the ones who do not understand these languages. But they will never understand a formal or structured language unless they learn it. So, proposing yet another structured language makes little sense to me: they would have to learn this one as well. As the SNL is not that formal, a solution might be to propose it as a non-formal end-user language for encoding rules?
- Throughout the paper, and also in the introduction, you indicate that rule checking is performed by formulating SPARQL queries against OWL models. It seems to me that you should be referring to RDF graphs here, not OWL models. OWL is typically reserved to point to OWL ontologies, not to the actual individuals or instances. Throughout the paper, there is no real usage of an OWL ontology, as far as I can see, so I suggest using the term RDF graph throughout the paper.
2. Proposed method
- You suggest to build rule libraries in the SNL language. That seems quite 'off'. Would it not be a better choice to build libraries of the more formal SPARQL queries? The SNL representation is just an interface to the end user; it can probably be generated from the stored SPARQL queries? In addition, nobody is capable of parsing SNL rules; SPARQL queries can easily be exchanged and run in different systems.
- "It is then transformed into an OWL model" => you surely mean "RDF graph" here. It is only containing instances / individuals.
- "The feature of our research framework is to handle real-world scale [should be 'size'] BIM models. Therefore, we take SPARQL queries as the basis of [the] checking engine, rather than reasoning engines. Although taking [an] exist[ing] reasoning engine is an off-the-shelf solution, it can only process BIM models in very limited scale [size], by our experience." => This is nonsense. Then you must have had a bad experience. All solutions in the semantic web deploy reasoning engines, on vast amounts of data, even if only to perform OWL reasoning (domain-range-subClassOf-...). Jena, which you use, includes reasoning engines. It can handle large-size models just fine. As with the querying, however, it is recommended to have a decent strategy (prepare the reasoning as you prepare the querying, allow only a certain level of expressiveness, indexing, ...) to make it perform even better. Perhaps you can take a look at the recent article at http://www.sciencedirect.com/science/article/pii/S1474034617301945? There are reasons to choose a query language over a rule language, but "able to handle large size models" is not one of them.
3. The SNL Language
- You distinguish 'declarative' and 'conditional' sentences in the SNL language. The declarative sentence is basically defined as a rule with a universal quantifier ('every' / forAll). What about sentences with existential quantifiers ('there is' / forSome)? Or, well, a bunch of other formal structures available in a formal rule language... Furthermore, the conditional sentences that you mention are typically considered to be part of a declarative language. So, this terminology and classification are odd.
- "The SNL supports two kinds of data types: string and digital". I don't know what you mean by 'digital', but I guess that you refer to numbers (integers, floats, doubles)? What about booleans, dates, and other commonly used data types?
- NOT is also a logical operator. It might make sense to place this in the list with AND and OR.
- The SNL rules are not that formal. As a result, they can be read by a human, but not so much by a machine. Take "Every Bedroom Has Window" as an example. How will a machine automatically know how to translate this into a SPARQL query? This will always require a human programmer to manually rewrite it as a SPARQL query. This programmer will likely prefer to just have the original building regulation text to start from. For instance, the word after 'Every' could be 'Space', in which case no 'Bedroom' regex filter on the name of the 'Space' is needed. A machine does not automatically know that a 'Bedroom' is a kind of 'Space'. It would need an ontology or schema for that. So, this is very far from formal. As another example, what does '10' mean? Is that centimeters, meters, inches, ...? Any semantic web developer would use a combination of ontology, instances, rules and queries in such a case.
- "Furthermore, the SNL language still keeps the accurate and strict semantics, so computers can process them too." => This is obviously not the case. Yes, the computer can parse the text and recognize some symbols ('Every') if the programmer implements the required programming code. But that is far away from a set of rules that can be input by any designer and be automatically converted into SPARQL queries.
- The rule on the top of page 3 is somewhat odd, in the sense that it concludes (THEN-part) that 'a Space has an Outlet', whereas it actually means to say that 'a Space should have an outlet'. This is not common. It makes some sense when the engine looks for the negatives (as explained in one of the later Sections), but it seems odd in this part of the paper.
- "Furthermore, since the SMC template language relies on choosing from the BIM models for the concept and relation description, it [cannot] be used independently as a formal description of building codes". => I agree with this sentence, but this sentence also marks a flaw / weakness in the SNL (see previous point). If the user is not supported in picking from a list of available terms, the SNL rule can contain anything, making it non-formal, and very expensive to reformat into checkable SPARQL queries (see above).
- I am missing a reference here to the work around NLP and building regulations, as documented in http://www.sciencedirect.com/science/article/pii/S0926580516301819.
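To make the "Every Bedroom Has Window" remark above concrete, the following is a minimal sketch (not the authors' implementation) of what an SNL-to-SPARQL translation minimally needs. All names in it are hypothetical assumptions for illustration: the `ex:` prefix, the IRIs `ex:Space`, `ex:Window`, `ex:name` and `ex:boundaryElement`, and the two mapping tables. The point is that the translation only works for terms that someone has explicitly mapped beforehand, which is exactly the role an ontology or schema would normally play.

```python
import re

# Hypothetical concept mapping: SNL term -> (class IRI, optional name pattern).
# "Bedroom" is assumed to be a Space whose name matches "bedroom".
CONCEPTS = {
    "Space": ("ex:Space", None),
    "Bedroom": ("ex:Space", "bedroom"),
    "Window": ("ex:Window", None),
}
# Hypothetical relation mapping: SNL verb -> property IRI.
RELATIONS = {"Has": "ex:boundaryElement"}

SENTENCE = re.compile(r"^Every (\w+) (\w+) (\w+)$")


def concept_pattern(var, term):
    """Return the SPARQL patterns that bind ?var to instances of an SNL term."""
    if term not in CONCEPTS:
        raise KeyError(f"no concept mapping for SNL term {term!r}")
    cls, name_regex = CONCEPTS[term]
    lines = [f"?{var} a {cls} ."]
    if name_regex is not None:
        lines.append(f"?{var} ex:name ?{var}Name .")
        lines.append(f'FILTER regex(?{var}Name, "{name_regex}", "i")')
    return lines


def snl_to_sparql(sentence):
    """Translate 'Every <Subject> <Relation> <Object>' into a query for violators."""
    match = SENTENCE.match(sentence.strip())
    if match is None:
        raise ValueError(f"unsupported SNL sentence: {sentence!r}")
    subject, relation, obj = match.groups()
    if relation not in RELATIONS:
        raise KeyError(f"no relation mapping for SNL term {relation!r}")
    prop = RELATIONS[relation]
    where = concept_pattern("s", subject)
    # Violators are subjects for which no matching object exists.
    inner = [f"?s {prop} ?o ."] + concept_pattern("o", obj)
    where.append("FILTER NOT EXISTS { " + " ".join(inner) + " }")
    body = "\n  ".join(where)
    return f"PREFIX ex: <http://example.org/bim#>\nSELECT ?s WHERE {{\n  {body}\n}}"


if __name__ == "__main__":
    print(snl_to_sparql("Every Bedroom Has Window"))
```

In this sketch, replacing 'Bedroom' with 'Space' drops the regex filter, and an unmapped term such as 'Kitchen' raises an error instead of producing a query. Units (the '10' question above) are not handled at all.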
4. Semantic extraction of BIM models
- It is really great to see you consider the extraction of subsets of a BIM model before starting the code compliance checking!! It is a pity to see that it is all implemented in code without considering any semantic web technologies. Do you think that would be a viable alternative, for example to maintain the link with a standard ontology like ifcOWL? Cfr. "Finally, the extracted sub-model M is transformed into the [RDF graph] [using the] Jena API[no -S]."
- "Considering the checking efficiency, we reorganize the structure of the [RDF graph], rather than keeping it another representation of IFC files like the work in the literature" => I can only strongly support, encourage and appreciate moving away from the standardised language for gaining performance in specific tasks, like rule checking. This is also the entire idea behind the SimpleBIM procedure as documented in https://biblio.ugent.be/publication/8041826 (please read this). In that case, however, a link is maintained to the actual ifcOWL standard. Would it not be better to take the same route and publish your model both in compliance with an ifcOWL standard and in a simplified version of that? In any case, it is recommended to maintain clear reference to 'a' (standard) ontology, so that people and machines outside your network can also interpret your data.
- "a layer of search" -> please reformulate this into something more meaningful. Perhaps: "It is clear that the lower graph contains fewer statements and can thus be processed far more efficiently than the top one." Please also see the SimpleBIM article above in this context; it is the exact same approach, while maintaining reference to the standard.
5. Checking engine and optimization strategies
- "The SPARQL queries are generated by a[n] SNL syntax-directed style and the transformation is a structural procedure" => This part of the process (SNL to SPARQL) is hardly presented, yet it has a tremendous impact on the scalability of this approach. Anyone implementing the SNL language will need this conversion from the user-oriented SNL to a formal language (e.g. SPARQL) to be as efficient, inexpensive and scalable as possible. Can you please indicate in more detail how this works? And why would one not directly implement rules in SPARQL?
- Most, if not all, automated code compliance initiatives using semantic web technologies that rely on SPARQL, take strong advantage of the CONSTRUCT feature to encode the IF-THEN rules. The IF-THEN structure clearly maps well with those CONSTRUCT queries. This contrasts a lot with the SPARQL queries presented in this paper, which consist only of SELECT statements and FILTERs. This looks like an odd choice in the context of rule-checking. This choice also has not really been evaluated or considered? Can you please explain in the paper?
- Similarly, have you considered the use of SPIN?
- "We made the mappings automatically by adopting related domain knowledge." -> This seems to indicate that this step requires both a domain specialist and a programmer. How else can one 'adopt domain knowledge'? Is that true? Please specify in the paper.
- "Through the seamless and automatic transformation" -> I am far from convinced by this statement. Yes, it is seamless and automatic as soon as a programmer has implemented the conversion, but they will likely need to do this again and again (so, not that seamless and automatic after all). Please provide more evidence here.
- The analysis of the query performance is great, and I would like to encourage you to refer to the conclusions made in http://www.sciencedirect.com/science/article/pii/S1474034617301945. They are in line with what you have.
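As an illustration of the CONSTRUCT remark above, here is a minimal sketch of how an IF-THEN rule maps onto a SPARQL CONSTRUCT query: the IF-part becomes the WHERE clause and the THEN-part becomes the CONSTRUCT template. The helper `rule_to_construct`, the `ex:` prefix and the terms `ex:Space`, `ex:hasOutlet`, `ex:violates` and `ex:OutletRule` are hypothetical names chosen for this example only; they are not taken from the paper.

```python
def rule_to_construct(if_patterns, then_templates, prefixes):
    """Build a CONSTRUCT query: WHERE holds the IF-part, the template the THEN-part."""
    prefix_lines = [f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items()]
    template = "\n".join(f"  {t}" for t in then_templates)
    where = "\n".join(f"  {p}" for p in if_patterns)
    return "\n".join(
        prefix_lines + ["CONSTRUCT {", template, "}", "WHERE {", where, "}"]
    )


# Hypothetical rule: IF a space has no outlet THEN it violates the outlet rule.
query = rule_to_construct(
    if_patterns=[
        "?space a ex:Space .",
        "FILTER NOT EXISTS { ?space ex:hasOutlet ?outlet . }",
    ],
    then_templates=["?space ex:violates ex:OutletRule ."],
    prefixes={"ex": "http://example.org/bim#"},
)

if __name__ == "__main__":
    print(query)
```

Unlike a SELECT query, the result of such a query is itself an RDF graph of violations, which can be stored, queried again, or combined with the results of other rules.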
6. Applications
- The tool at http://sts.thss.tsinghua.edu.cn:8079/bimchecker/ was not available when I tested it. I hope that this web page contains datasets and not just a running program. It would be valuable to see the actual data (e.g. the IFC2X3 ontology used).
Textual remarks:
-----------------
1. Introduction:
- "Solibri Model Checker(SMC)" should be "Solibri Model Checker (SMC)" => please add a space before every '(' in the paper. There are several such typos throughout the entire paper.
- "generic and transboundary model representation are" -> "generic and transboundary model representation[s] are"
- replace "scale" with "size"
- "They have various relationships and [are] intertwined"
- "The exist[ing] regulations are represented"
- "the variety of regulations request[s] a flexible"
- "template-based rule representations [seldom] meet these requests"
3. The SNL Language
- "However, it is not [easily understood] by human beings"
- "These methods [are less flexible and less extensible,] since the rules to be"
- "The SNL language[no -S] is organised by sentences"
- "to express [that] the rules should be obeyed in any case" / or better: "to express [which] rules should be obeyed in any case"
- "to denote [that] the requests should be obeyed" / or better "to denote [which] requests should be obeyed"
- "digital" -> "numerical"?
- "should be bigger or equal[no -S] to 10 m²"
- "It is formalized by requesting the value range of the data property 'area' of the entity LivingRoom." => this is unclear, please reformulate.
- "[easily understood]"
- "This SNL sentence[no -S] denotes a request"
4. Semantic extraction of BIM models
- "[Real-world] BIM models are often large in [size]" (first sentence)
- "It can automatically extract the [] entities, attributes and relations"
- "scale" -> "size"
- "Three BIM modes" -> "Three BIM models"
- "2th" -> "2nd"
- "only a small part of the model [is] related."
- "extraction of Z15-F011-MEP get[s]"
- "the the" -> "the"
5. Checking engine and optimization strategies
- "We first transform a[n] SNL sentence"
- "to facilitate user description, and implementation-independent" -> "to facilitate the description of rules by domain experts, and to facilitate the description of implementation-independent rules".
- "related problematic component[no -S] set C."
- "by a[n] SNL syntax-directed style"
- "the spaces which are bedrooms, [but do not have a window (boundaryElement)]."
- "there are two key mappings, the concept [mapping] and the relation [mapping]"
- "The generated SPARQL queries [] work well in terms of"
- "cost much searching time [in our experience]"
- "scale" -> "size"
- "the order [] of the triples affect[s]"
- "and [place] the entities with more branches [first in] the query"
- "can be represented [in] different forms, [...], [but with] different time cost[s]."
- "For example, [T]he form [] 'FILTER [...] ?X' [can be] optimized as"
- "It can first avoid this part to be executed very late since the FILTER parts are executed [] last in Jena" => please reformulate
6. Applications
- "[in] Java 8"
- "[in] C#"
- Please remove section header 6.1. If there is no section 6.2, then it makes no sense to make a section 6.1.
- "We illustrate[no -S] some of the checking results []."
- "36 items which cover[no -S]"
- "We found [that] 5 items [] failed"
- please add a space before every '(' in the paper.
- "their insulation thickness[es]"
- "the tool found [that] 101 pipe fittings"
- "to [] a pipe identified as"
7. Conclusion
- "studied [in] recent years"
- "are still left" -> reformulate
- "to automatic[] rule checking of BIM models"
- "are eas[il]y read and understood by human beings"
- "can be given [effectively and fast]"
- "applied it []to"
- "and appl[y] [the suggested approach] to other practical [] buildings"