Semantic web based rule checking of real-world scale BIM models: a pragmatic method

Tracking #: 1621-2833

Authors: 
Hehua Zhang
Wenqi Zhao
Jianqiao Gu
Han Liu
Ming Gu

Responsible editor: 
Guest Editors ST Built Environment 2017

Submission type: 
Full Paper
Abstract: 
Rule checking is important to assure the integrity, correctness and usability of Building Information Models (BIMs) in Architecture, Engineering and Construction (AEC) projects. Semantic web based rule checking of BIM models has been widely accepted and studied in recent years. This technology has noteworthy advantages in interoperability, extensibility and its logical basis. However, some gaps remain before it is practical. One challenge is the efficiency problem in processing large-scale BIM models. The other is how to effectively input checking rules that can be understood by both human beings and machines. In this paper, we propose a pragmatic method to check real-world scale BIM models. In our framework, BIM models are transformed into a well-defined OWL model. Rules are formalized in a structured natural language (SNL) designed specifically to describe building regulations. The checking engine is based on SPARQL queries over OWL models. We propose a rule-based model extraction method and optimization strategies for SPARQL statements, which effectively improve time efficiency and make large-scale applications tractable. A prototype has been implemented and applied to BIM models of a real-world building project. We found non-trivial problems in a fully automatic way, which helped to improve the quality of the BIM models and verified the usability of our method.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By Chi Zhang submitted on 16/May/2017
Suggestion:
Major Revision
Review Comment:

This paper addresses two important issues related to semantic rule checking in the AEC industry. The first one is effective input of rules, and the second one is efficiently processing real-world scale building models. To address the first issue, this research has developed a structured natural language (SNL), which can be converted to SPARQL to process building models. The second issue is addressed in two steps. The first step is to extract model subsets according to concepts defined in the rule library and then transform them into simplified OWL models. The second step is to optimise SPARQL queries according to some domain-specific features.

First of all, I want to congratulate the authors on this development work, which seems to have promising results on real-world building models. The general structure of the paper is fine. However, there are a number of major issues that should be addressed before I would recommend this article for publication. One major issue is that the literature study is not extensive and neglects a rich body of knowledge that has been accumulated within the building and construction research community as well as in the Semantic Web field. Another major issue is that the article covers several issues without showing sufficient details and contributions in each part. The development work seems case-specific and very implementation-focused, and lacks a clear description of more generalized methods. More detailed comments are as follows:

1. One of the announced contributions of this article is the Structured Natural Language (SNL) described in section 3. However, converting a controlled natural language to computable rules is not a novel idea, hence it is important to see how expressive and how intuitive the language is. The introduction in section 3 needs to be elaborated and many questions need to be answered. How expressive is this language? Which forms and expressions in SPARQL does it support? What percentage of rules can be represented by it? The language as presented seems simple, and many useful constructs in rules, e.g. quantifiers, cardinality and aggregation operators, do not exist. There is much existing research and development on converting (controlled) natural languages to computable rules and queries in the building industry as well as in the Semantic Web field, all of which is ignored by this article, for example RASE from Hjelseth and Jiansong Zhang’s research in the BIM field. There are also existing controlled natural languages like SBVR and SQUALL that either have been standardised or already have conversion patterns to SPARQL. Why not reuse them? The authors should prove the novelty of their approach.
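
To illustrate the kind of missing construct I mean: a rule such as "every apartment must have at least two exit doors" needs counting/aggregation, which is straightforward in SPARQL but does not seem expressible in the presented SNL. A minimal sketch, with hypothetical class and property names (ex:Apartment and ex:hasExitDoor are my own illustration, not the paper's):

  # Report apartments with fewer than two exit doors (illustrative names only).
  PREFIX ex: <http://example.org/building#>
  SELECT ?apartment (COUNT(?door) AS ?exitCount)
  WHERE {
    ?apartment a ex:Apartment .
    OPTIONAL { ?apartment ex:hasExitDoor ?door . }
  }
  GROUP BY ?apartment
  HAVING (COUNT(?door) < 2)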

2. In section 4, an approach to extract IFC subsets according to the rule library is introduced. However, some steps of the extraction procedure are not clear and need additional explanation.

2.1. It seems related entities and attributes are derived from the rule library. How are all the terminologies and concepts used in the rule library structured and how are they mapped to IFC constructs?

2.2. In the second paragraph, “For the rules related to geometric computation, we need to extract the information such as…”, how do you judge whether a rule is related to geometric computation?

2.3. “For any r belonging to the RI, we extract the attribute set that belongs to the set A and referencing r (for example IFCRELDEFINESBYPROPERTIES) to get AI”. This needs additional explanation. There are many different IFC constructs associated with IfcRelationships. Property sets, types, materials and classifications all have different structures, and they cannot all be derived by simply extracting attributes from IfcRelationships.
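
For illustration, even retrieving a single named property value in an ifcOWL-style graph requires a chain through the relationship, the property set and the property itself. The sketch below uses an example namespace and simplified property names (the real ifcOWL identifiers are longer and version-dependent), but the shape of the traversal is the point:

  # Indicative traversal from an element to a named property value.
  # Namespace and property names are simplified placeholders, not real ifcOWL URIs.
  PREFIX ifc: <http://example.org/ifcowl#>
  SELECT ?element ?propName ?value
  WHERE {
    ?rel a ifc:IfcRelDefinesByProperties ;
         ifc:relatedObjects ?element ;
         ifc:relatingPropertyDefinition ?pset .
    ?pset ifc:hasProperties ?prop .
    ?prop a ifc:IfcPropertySingleValue ;
          ifc:name ?propName ;
          ifc:nominalValue ?value .
  }

Types, materials and classifications each need a different chain of this kind, which is why extracting attributes from IfcRelationships alone is not enough.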

2.4. Are the extracted IFC sub models still valid IFC models?

3. In section 4, the extracted IFC sub-models are then transformed into simplified OWL models. Again, some existing research that systematically simplifies ifcOWL graphs, such as the IfcWoD ontology and the SimpleBIM ontology, is ignored by this research. There are also questions that need to be answered. How many relationships have been shortened? How different are the OWL graphs before and after the simplification process? How modular and flexible is this simplification process; can the same program be reused for other rule cases?

4. In section 5, three examples are presented to show how rules represented in the SNL language are transformed into SPARQL queries. It seems the transformation algorithm is very case-specific. IFC can represent information in various ways; for example, the concept “Bedroom” can be represented by many different attributes and relationships. The authors should explain how they handle this problem. There are also some minor issues or mistakes in this part.

In the first example, the “Has” relationship between a space and a window is automatically recognized and converted to “hasBoundaryElement” as explained in this section, but in the third example, the “Has” relationship between a space and a window is converted to “isContaining”.
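
For clarity, the two conversions correspond to different graph patterns. Assuming the property names quoted above and an example prefix and Window class of my own, the difference is roughly:

  # Translation used in the first example (sketch only):
  PREFIX bim: <http://example.org/bim#>
  SELECT ?space ?window
  WHERE {
    ?space bim:hasBoundaryElement ?window .
    ?window a bim:Window .
  }

  # Translation used in the third example (sketch only):
  PREFIX bim: <http://example.org/bim#>
  SELECT ?space ?window
  WHERE {
    ?space bim:isContaining ?window .
    ?window a bim:Window .
  }

The paper should state which pattern the “Has” relation maps to and why, since the two queries will in general return different result sets.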

Some of the relation terms are confusing e.g. the “hasSubType” relationship between a building and a building storey.

5. In section 5, optimisation strategies have been introduced and significant performance improvements have been shown. However, the query optimisation methods are only described at a strategic level, and it is hard for other researchers to reproduce the results. Many questions need to be answered:

5.1 What is the query environment for the RDF data (e.g. some triple store)? Are OWL reasoners used, or is the data treated as plain RDF?

5.2. “In our method, we cluster the triples by the querying entities, and make the entities with more branches in front of the query.” This needs to be elaborated much more, describing a clear algorithm or giving examples (a sketch of the kind of before/after pair I would expect is given after this list).

5.3 In Table 2, the model sizes and all the query examples used should be provided (in an appendix, for example).

5.4 At least one concrete query example should be presented to show how to optimise the query. It is recommended to use one of the query examples in Table 2.
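
As an illustration of what such an example could look like (triple patterns and names below are hypothetical, not taken from Table 2), the optimisation I understand from the text is to move selective patterns forward and to turn equality FILTERs into direct triple matches:

  # Unoptimised sketch: a broad pattern comes first and the FILTER runs late.
  PREFIX bim: <http://example.org/bim#>
  SELECT ?space
  WHERE {
    ?space bim:hasProperty ?p .
    ?space a bim:Space .
    ?space bim:name ?name .
    FILTER (?name = "Bedroom")
  }

  # Optimised sketch: the most selective pattern comes first and the FILTER
  # becomes a direct triple match, so far fewer bindings are carried along.
  PREFIX bim: <http://example.org/bim#>
  SELECT ?space
  WHERE {
    ?space bim:name "Bedroom" .
    ?space a bim:Space .
    ?space bim:hasProperty ?p .
  }

Showing a real pair like this, together with the measured times, would make the optimisation reproducible.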

6. Format and some minor issues:

6.1 In all the SPARQL query listings, it is recommended that classes start with upper case and properties start with lower case.

6.2 The size of the models should be clearly provided. The size of all the RDF data should be provided in triples. For example, in section 5, “The added size of the 8 models are 713,372KB”: is this in Revit format, in IFC, or in RDF? Please provide the IFC model size and the transformed RDF model size. The same applies to “The size of the original BIM model (Z15-F011MEP) is 93,596KB” in section 6.1.

6.3 There are some English issues throughout the paper, often with singular and plural forms. Another check would help.

Review #2
Anonymous submitted on 22/May/2017
Suggestion:
Major Revision
Review Comment:

In this paper the authors propose a framework for regulatory compliance checking using a structured natural language format. Overall I found the paper very interesting to read - but I have some concerns that should be addressed prior to publication:

- To me the introduction lacked a clear research goal: what does your approach seek to improve upon relative to other approaches? This needs to be set in the context of the work already done, not just a statement of your approach and its performance.

- I was unclear about how the drafting of the SNL is actually performed. The authors point to the increased flexibility of this approach, but I am unsure whether this flexibility is for those drafting the regulations. Who do the authors envisage drafting the regulations? Regulation authors? Has any consultation been performed to see if this is acceptable to them?
- Do the authors have any reference to back up their assertion that a BIM is too big to be used in its totality? Modern triple stores for ifcOWL etc. are efficient, so I would like to see results to back up this claim.
- One point that I do not think is explained well enough in the paper is how the SNL is mapped to SPARQL, i.e. how do you know Bedroom = IfcSpace (with name=Bedroom)? This must be an automated process to make checking feasible at scale, so how is it done? (A sketch of the kind of mapping I assume is meant is given after this list.)
- How is the optimisation of SPARQL queries done? Automatically or semi-automatically? The paper lacked detail on this element.
- I also found myself wanting to see more experimental data, e.g. by how much does your optimization reduce query times? Seeing the speed before optimisation and the speed after optimisation is, to me, essential to determine whether this approach has any merit.
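
The mapping I assume is meant for "Bedroom" would look roughly like the following (the prefix, class and property names are my own illustration, not the paper's):

  # Hypothetical expansion of the SNL concept 'Bedroom' into a SPARQL pattern.
  PREFIX bim: <http://example.org/bim#>
  SELECT ?space
  WHERE {
    ?space a bim:IfcSpace ;
           bim:name ?name .
    FILTER regex(?name, "Bedroom", "i")
  }

Whether such expansions are stored in a dictionary, derived from an ontology, or hard-coded in the converter is exactly what the paper should make explicit.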

These issues should be clarified in the paper prior to acceptance.

Review #3
By Pieter Pauwels submitted on 11/Jun/2017
Suggestion:
Major Revision
Review Comment:

This paper deals with automated building regulation compliance checking using BIM models and semantic web technologies. This is a well-known subject of research in the area of architecture, engineering and construction (AEC). A formal language is proposed, namely Structured Natural Language (SNL), for expressing building regulations in a human-friendly AND formal manner. Each SNL expression can be transformed into SPARQL queries. By executing these SPARQL queries against BIM models, building regulations can be automatically checked. This approach is evaluated using the BIM model of the Z15 Tower in Beijing and 5 rule libraries.

(1) originality
The concept of automating compliance checking of building regulations is not new, neither is the usage of semantic web technologies for that purpose. The extent to which it is implemented is broader than usual. In fact, seeing all parts of the usual process (model preparation, rule extraction, rule checking, reporting) implemented, including a real-world use case, is really great. Unfortunately, the application of semantic web technologies is fairly limited in the proposed approach (in contrast to previously made proposals in the literature). As a result, this proposal does not go beyond the state of the art for the semantic web domain (which is the core area of this journal). Namely, apart from the usage of Jena and the usage of SPARQL, not many components of the semantic web stack are being used. It comes as a big surprise that a paper about semantic web based rule checking does not make any use of the rule languages or reasoning engines available in this domain. Even more, if there is an explicit formal ontology defined, it is not really used in the process. Most of the proposed approach consists of straightforward programming, combined with executing SPARQL queries. Probably, this approach could just as well be done with a regular schemaless graph database, which is considered to have a user-friendly query language. So, that probably even defeats the need for an SNL language. Why use semantic web technologies in the suggested approach: what did you gain?

(2) significance of the results
As indicated, the completeness of the approach, covering all parts of the compliance checking of building regulations with BIM models, is very significant and much appreciated. The test case with a real building is also great. Evaluating and improving the performance of the approach is so much appreciated as well!! For the semantic web research domain, this test has probably little significance (see above). The last remaining aspect is the search for a language that can formalise building regulations, yet remains highly user-friendly. I personally do not believe in the possibility of creating such a language (formal and user-friendly are like fire and water), nor in the real need for such a language. End users in the industry, especially AEC, need GUIs, not a query language. User-friendliness depends on the interface, which is not formal anyway. So, a mapping needs to be made from the interface to a computer language. Actually, this is probably where the SNL language resides: a non-formal language that can be used in the interface to accommodate end users. Existing semantic rule languages and query languages (SPARQL) remain the formal languages needed to achieve an automated rule-checking process. This is where results are sought in a semantic web community. To be significant enough for this domain, a far better connection needs to be made with the diverse ontology, query and rule languages in the semantic web domain.

(3) quality of writing
The structure of the paper is great. The style of writing is good. Spelling is okay.
The references are poorly formatted. Please use a consistent format, as requested by the journal, to structure references.

Detailed comments:
-----------------
1. Introduction
- "In these scenarios, generic and transboundary model representation are more welcome than the building specific representation like the standardization through Industry Foundation Classes(IFC) [6]." => Maybe I misinterpret, but this makes little sense. In order to "cross boundaries" of models, there is a strong need for defining "models with defined boundaries" in the first place, ideally standardised between a large enough collection of end users. The IFC data model is one of those bounded domain models. If it does not exist, you cannot cross its boundaries. The semantic web domain has several other such domain models defined as OWL ontologies that can be combined when going 'transboundary'. Standardisation and going beyond model boundaries go hand in hand.
- "Semantic web technologies have great advantages on ..., and 'logical basics'" => What are those 'logical basics' advantages? Don't you mean 'logical basis'? I wouldn't call semantic web technologies to be basic in terms of logic.
- "The other challenge is how to effectively input checking rules which can be understood by both human beings and machines." => Well, common query languages and rule languages can be understood by human beings and machines. So why develop yet another language? The problem is likely that domain experts are the ones that do not understand the language. But, well, they will likely never randomly understand a formal or structured language unless they learn it. So, proposing yet another structured language makes little sense to me. They would have to learn this as well. As the SNL is not that formal, it might be a solution to suggest is as a non-formal end user language to encode rules?
- Throughout the paper, and also in the introduction, you indicate that rule checking is performed by formulating SPARQL queries against OWL models. It seems to me that you should be referring to RDF graphs here, not OWL models. OWL is typically reserved to point to OWL ontologies, not to the actual individuals or instances. Throughout the paper, there is no real usage of an OWL ontology, as far as I can see, so I suggest keeping the term RDF graph throughout the paper.

2. Proposed method
- You suggest to build rule libraries in the SNL language. That seems quite 'off'. Would it not be a better choice to build libraries of the more formal SPARQL queries? The SNL representation is just an interface to the end user; it can probably be generated from the stored SPARQL queries? In addition, nobody is capable of parsing SNL rules; SPARQL queries can easily be exchanged and run in different systems.
- "It is then transformed into an OWL model" => you surely mean "RDF graph" here. It is only containing instances / individuals.
- "The feature of our research framework is to handle real-world scale [should be 'size'] BIM models. Therefore, we take SPARQL queries as the basis of [the] checking engine, rather than reasoning engines. Although taking [an] exist[ing] reasoning engine is an off-the-shelf solution, it can only process BIM models in very limited scale [size], by our experience." => This is nonsense. Then you must have a bad experience. All solutions in the semantic web deploy reasoning engines, on vast amounts of data, even if it were only to perform OWL reasoning (domain-range-subClassOf-...). Jena, which you use, includes reasoning engines. It can handle large-size models just fine. As with the querying, however, it is recommended to have a decent strategy (prepare the reasoning as you prepare the querying, allow only a certain level of expressiveness, indexing, ...) to make it perform even better. Perhaps you can take a look at the recent article in http://www.sciencedirect.com/science/article/pii/S1474034617301945? There are reasons to choose for a query language and not a rule language, but "able to handle large size models" is not one of them.

3. The SNL Language
- You distinguish 'declarative' and 'conditional' sentences in the SNL language. The declarative sentence is basically defined as a rule with a universal quantifier ('every' / forAll). What about sentences with existential quantifiers ('there is' / forSome)? Or, well, a bunch of other formal structures available in a formal rule language... Furthermore, the conditional sentences that you mention are typically considered to be part of a declarative language. So, this terminology and classification is odd.
- "The SNL supports two kinds of data types: string and digital". I don't know what you mean with 'digital', but I guess that you refer to numbers (integers, floats, doubles)? What about booleans, dates, and other data types commonly used everywhere?
- NOT is also a logical operator. It might make sense to place this in the list with AND and OR.
- The SNL rules are not that formal. As a result, they can be read by a human, but not so much by a machine. As an example, "Every Bedroom Has Window". How will a machine automatically know how to translate this into a SPARQL query? This will always require a human programmer to manually rewrite this into a SPARQL query. This programmer will likely prefer to just have the original building regulation text to start from. As an example, the word after 'Every' could be 'Space', in which case there does not need to happen any 'Bedroom' regex filter on the name of the 'Space'. A machine does not automatically know that a 'Bedroom' is a kind of 'space'. It would need an ontology or schema for that. So, this is very far away from formal. As another example, what does '10' mean? Is that centimeters, meters, inches, ...? Any semantic web developer would use a combination of ontology, instances, rules and queries in such a case.
- "Furthermore, the SNL language still keeps the accurate and strict semantics, so computers can process them too." => This is obviously not the case. Yes, the computer can parse the text and recognize some symbols ('Every') if the programmer implements the required programming code. But that is far away from a set of rules that can be input by any designer and be automatically converted into SPARQL queries.
- The rule at the top of page 3 is somewhat odd, in the sense that it concludes (THEN-part) that 'a Space has an Outlet', whereas it actually means to say that 'a Space should have an Outlet'. This is not common. It makes some sense when the engine looks for the negatives (as explained in one of the later sections), but it seems odd in this part of the paper.
- "Furthermore, since the SMC template language relies on choosing from the BIM models for the concept and relation description, it [cannot] be used independently as a formal description of building codes". => I agree with this sentence, but this sentence also marks a flaw / weakness in the SNL (see previous point). If the user is not supported in picking from a list of available terms, the SNL rule can contain anything, making it non-formal, and very expensive to reformat into checkable SPARQL queries (see above).
- I am missing a reference here to the work around NLP and building regulations, as documented in http://www.sciencedirect.com/science/article/pii/S0926580516301819.

4. Semantic extraction of BIM models
- It is really great to see you consider the extraction of subsets of a BIM model before starting the code compliance checking!! It is a pity to see that it is all implemented in code without considering any semantic web technologies. Do you think that would be a viable alternative, for example to maintain the link with a standard ontology like ifcOWL? Cfr. "Finally, the extracted sub-model M is transformed into the [RDF graph] [using the] Jena API[no -S]."
- "Considering the checking efficiency, we reorganize the structure of the [RDF graph], rather than keeping it another representation of IFC files like the work in the literature" => I can only strongly support, encourage and appreciate moving away from the standardised language for gaining performance in specific tasks, like rule checking. This is also the entire idea behind the SimpleBIM procedure as documented in https://biblio.ugent.be/publication/8041826 (please read this). In that case, however, a link is maintained to the actual ifcOWL standard. Would it not be better to take the same route and publish your model both in compliance with an ifcOWL standard and in a simplified version of that? In any case, it is recommended to maintain clear reference to 'a' (standard) ontology, so that people and machines outside your network can also interpret your data.
- "a layer of search" -> please reformulate this into something more meaningful. Perhaps: "It is clear that the lower graph contains less statements and can thus be processed far more efficiently than the top one." Please also see the SimpleBIM article above in this context; it is the exact same approach, while maintaining reference to the standard.

5. Checking engine and optimization strategies
- "The SPARQL queries are generated by a[n] SNL syntax-directed style and the transformation is a structural procedure" => This part of the process (SNL to SPARQL) is hardly presented, yet it has a tremendous impact on the scalability of this approach. Anyone implementing the SNL language will need this conversion from the user-oriented SNL to a formal language (e.g. SPARL) to be as efficient, inexpensive and scalable as possible. Can you please indicate in more detail how this works? And why one would not directly implement rules in SPARQL?
- Most, if not all, automated code compliance initiatives using semantic web technologies that rely on SPARQL take strong advantage of the CONSTRUCT feature to encode the IF-THEN rules. The IF-THEN structure clearly maps well onto those CONSTRUCT queries. This contrasts a lot with the SPARQL queries presented in this paper, which consist only of SELECT statements and FILTERs. This looks like an odd choice in the context of rule checking, and it does not seem to have been evaluated or considered. Can you please explain this in the paper? (A sketch of what I mean is given after this list.)
- Similarly, have you considered the use of SPIN?
- "We made the mappings automatically by adopting related domain knowledge." -> This seems to indicate that this step requires both a domain specialist and a programmer. How else can one 'adopt domain knowledge'? Is that true? Please specify in the paper.
- "Through the seamless and automatic transformation" -> I am far from convinced from this statement. Yes, it is seamless and automatic as soon as a programmer has implemented the conversion, but he will likely need to do this again and again (so, not that seamless and automatic after all). Please provide more evidence here.
- The analysis of the query performance is great, and I would like to encourage you to refer to the conclusions made in http://www.sciencedirect.com/science/article/pii/S1474034617301945. They are in line with what you have.
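
To illustrate the CONSTRUCT point: a rule such as "every bedroom must have a window" maps naturally onto a query that materialises the violations. The names below are my own illustration, not taken from the paper:

  # Hypothetical CONSTRUCT encoding of an IF-THEN rule: emit a 'violates' triple
  # for every bedroom space that has no window among its boundary elements.
  PREFIX bim:   <http://example.org/bim#>
  PREFIX check: <http://example.org/check#>
  CONSTRUCT { ?space check:violates check:BedroomMustHaveWindow . }
  WHERE {
    ?space a bim:IfcSpace ;
           bim:name ?name .
    FILTER regex(?name, "Bedroom", "i")
    FILTER NOT EXISTS {
      ?space bim:hasBoundaryElement ?window .
      ?window a bim:Window .
    }
  }

The constructed triples can then directly feed the reporting step, which is one reason this pattern is so common in rule-checking work.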

6. Applications
- The tool at http://sts.thss.tsinghua.edu.cn:8079/bimchecker/ was not available when I tested it. I hope that this web page contains datasets and not just a running program. It would be valuable to see the actual data (e.g. the IFC2X3 ontology used).

Textual remarks:
-----------------
1. Introduction:
- "Solibri Model Checker(SMC)" should be "Solibri Model Checker (SMC)" => please add a space before every '(' in the paper. There are several such typo's throughout the entire paper.
- "generic and transboundary model representation are" -> "generic and transboundary model representation[s] are"
- replace "scale" with "size"
- "They have various relationships and [are] intertwined"
- "The exist[ing] regulations are represented"
- "the variety of regulations request[s] a flexible"
- "template-based rule representations [seldom] meet these requests"

3. The SNL Language
- "However, it is not [easily understood] by human beings"
- "These methods [are less flexible and less extensible,] since the rules to be"
- "The SNL language[no -S] is organised by sentences"
- "to express [that] the rules should be obeyed in any case" / or better: "to express [which] rules should be obeyed in any case"
- "to denote [that] the requests should be obeyed" / or better "to denote [which] requests should be obeyed"
- "digital" -> "numerical"?
- "should be bigger or equal[no -S] to 10 m²"
- "It is formalized by requesting the value range of the data property 'area' of the entity LivingRoom." => this is unclear, please reformulate.
- "[easily understood]"
- "This SNL sentence[no -S] denotes a request"

4. Semantic extraction of BIM models
- "[Real-world] BIM models are often large in [size]" (first sentence)
- "It can autoamtically extract the [] entities, attributes and relations"
- "scale" -> "size"
- "Three BIM modes" -> "Three BIM models"
- "2th" -> "2nd"
- "only a small part of the model [is] related."
- "extraction of Z15-F011-MEP get[s]"
- "the the" -> "the"

5. Checking engine and optimization strategies
- "We first transform a[n] SNL sentence"
- "to facilitate user description, and implementation-independent" -> "to facilitate the description of rules by domain experts, and to facilitate the decription of implementation-independent rules".
- "related problematic component[no -S] set C."
- "by a[n] SNL syntax-directed style"
- "the spaces which are bedrooms, [but do not have a window (boundaryElement)]."
- "there are two key mappings, the concept [mapping] and the relation [mapping]"
- "The generated SPARQL queries [] work well in terms of"
- "cost much searching time [in our experience]"
- "scale" -> "size"
- "the order [] of the triples affect[s]"
- "and [place] the entities with more branches [first in] the query"
- "can be represented [in] different forms, [...], [but with] different time cost[s]."
- "For example, [T]he form [] 'FILTER [...] ?X' [can be] optimized as"
- "It can first avoid this part to be executed very late since the FILTER parts are executed [] last in Jena" => please reformulate

6. Applications
- "[in] Java 8"
- "[in] C#"
- Please remove section header 6.1. If there is no section 6.2, then it makes no sense to make a section 6.1.
- "We illustrate[no -S] some of the checking results []."
- "36 items which cover[no -S]"
- "We found [that] 5 itmes [] failed"
- please add a space before every '(' in the paper.
- "their insulation thickness[es]"
- "the tool found [that] 101 pipe fittings"
- "to [] a pipe identified as"

7. Conclusion
- "studied [in] recent years"
- "are still left" -> reformulate
- "to automatic[] rule checking of BIM models"
- "are eas[il]y read and understood by human beings"
- "can be given [effectively and fast]"
- "applied it []to"
- "and appl[y] [the suggested approach] to other practical [] buildings"

Review #4
Anonymous submitted on 24/Jul/2017
Suggestion:
Major Revision
Review Comment:

The authors present a simplified query language for rule checking. The system proposed is simple and, as the authors suggest, a pragmatic one. Despite some mistakes in English, the writing is clear. However, originality is the main issue for me. As presented, one can see the whole research as an attempt to mask the complexity of SPARQL with a set of simplified terms. This is good, but as submitted it does not seem to be a well-presented original work. The authors did not review or compare their work against similar work. If they do, they should contrast their work with both BIM and non-BIM work. Other than the simplicity of the SNL, where is the contribution? Did the authors study or evaluate the validity of the proposed SNL in different situations? In other words, how can you show that this SNL is not only simple but also valuable and uniquely developed? Are additional rules/components needed, or is what is presented enough to cover all possible scenarios? The IFC/BIM extraction rules/method were not clear because of the poorly organized language. I encourage the authors to revise with a clearer description and a discussion of the added value of their methods and how they differ from existing systems. All in all, as written, the paper seems to showcase a programming effort rather than methodical research with appropriate questioning and evaluation.
This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing.