Semantic web based rule checking of real-world scale BIM models: a pragmatic method

Tracking #: 1704-2916

Hehua Zhang
Wenqi Zhao
Jianqiao Gu
Han Liu
Ming Gu

Responsible editor: 
Guest Editors ST Built Environment 2017

Submission type: 
Full Paper
Rule checking is important to assure the integrity, correctness and usability of Building Information Models (BIMs) in Architecture, Engineering and Construction (AEC) projects. Semantic web based rule checking of BIM models has been widely accepted and studied in recent years. This technology has noteworthy advantages in interoperability, extensibility and logical foundations. However, some gaps remain before it becomes practical. One challenge is the efficiency of processing large-scale BIM models. The other is how to effectively input checking rules that can be understood by both human beings and machines. In this paper, we propose a pragmatic method to check real-world scale BIM models. In our framework, BIM models are transformed into a well-defined OWL model. Rules are formalized in a structured natural language (SNL) designed specifically to describe building regulations. The checking engine is based on SPARQL queries on OWL models. We propose a rule-based model extraction method and optimization strategies for SPARQL statements, which effectively improve time efficiency and make large-scale applications feasible. A prototype has been implemented and applied to BIM models of a real-world building project. We found non-trivial problems in a fully automatic way, which helped to improve the quality of the BIM models and verified the usability of our method.

Reject (Two Strikes)

Solicited Reviews:
Review #1
By Tom Beach submitted on 28/Jul/2017
Major Revision
Review Comment:

I thank the authors for submitting their revised manuscript. In this review I will first examine my own comments and then the comments of the other reviewers:

My Comments:

- Lack of Clear Research Goal: I am still not clear about the overall research goal. To me, there are already many ways (at least in the academic literature) to automate regulatory compliance checking. What are the authors' improvements on these? Is the approach more adoptable by industry, does it perform better, is it easier to use? I miss a statement of the real novelty of the approach here. To me, the academic literature in this area has now advanced to a point where simply having a working system is not enough. This point also affects the conclusion: if your contribution is focused on the SNL, then I would like to see feedback from the experts using it, describing its improvement over other processes found in the literature. If the work is about the performance of the system only, then this must be made clear.

- Use of BIM Sub Models: I do not find the authors' response regarding the justification for requiring a "sub model" approach for BIM convincing. I had expected to see either additional references in the paper or some experimental data.

- Concept Mapping: It is appreciated that the authors explained their "concept mapping" approach, but I am unclear on how these mappings are generated. It seems that the authors currently assume a static set of mappings. To me this premise does not hold, as the naming of objects inside an IFC file cannot be guaranteed to be consistent from one project to another (or from one BIM tool to another).

- SPARQL Optimisation: The addition of the new section is appreciated, but it requires significant improvement in spelling and grammar.

- On a final point, the authors state that the SNL is compatible with 85% of regulations. How was this figure obtained?

I also have several comments on the authors' responses to the other reviewers' points.

- I would like to state that, to me, the SNL is still not defined well enough. Can the authors provide a further description, possibly a language definition (in EBNF or similar)?

- I would also like to see more detail on the justification for developing the SNL. To me, developing a new language must be strongly justified by explaining why existing languages cannot be used. I agree that SPARQL is too low level, but what about existing rule languages, e.g. SPIN, DRL, etc.?

All in all, the paper is a great improvement, but additional clarifications must still be made prior to publication.

Review #2
Anonymous submitted on 22/Sep/2017
Major Revision
Review Comment:

This manuscript is the second revision. In this revision, the authors have expanded their discussion of the state of the art and provided more details regarding the expressiveness of the SNL and the query optimization strategies. It is an interesting case of semantic rule checking on real-world building models, and I encourage the authors to continue this work. However, I do not recommend acceptance of this manuscript, since the authors have not addressed some major issues.

Originality: The idea of developing an SNL to write rules with high-level concepts and transform them into SPARQL queries is similar to some existing work, such as the work mentioned by the authors in section 3 and the following:

K.R. Bouzidi, B. Fies, C. Faron-Zucker, A. Zarli and N.L. Thanh (2012). Semantic web approach to ease regulation compliance checking in construction industry. Future Internet, pp. 830-851.

In my opinion, the paper does not show sufficient added value in this respect.
The originality of this work can be attributed to the extended application of semantic rule checking to real-world, large-scale building models, but the authors should present the advances of their approach (especially in the application of Semantic Web technologies), provide additional datasets about their cases, and offer an insightful discussion of the results, limitations and challenges.

Significance of the results: In section 7, the authors have shown that a real-world, large building model can be checked effectively using SPARQL queries, to demonstrate the applicability of this approach. In section 3, the authors have also stated that, regarding its expressiveness, 85% of the national building codes and more than 95% of the domain codes can be supported by the SNL. There are a few major issues with these results:

First of all, the applicability of this approach is demonstrated by checking a high-rise building model against a set of rules. In this process, however, the model extraction, model transformation and checking result generation have very limited relation to Semantic Web technologies and applications. The only part that is highly related is the SPARQL query process, which concerns the query optimization strategies described in section 6. Yet this part is presented without reference to existing research and development on, e.g., query rewriting and indexing techniques. What is the contribution of the optimization strategies in comparison with existing work?

Secondly, the statement about the SNL’s expressiveness is not convincing. The authors should provide additional datasets or a detailed description and discussion of it. In my opinion, this SNL is a user interface language rather than an executable layer, and how many rules it can represent depends on how many concepts are formalized and mapped to data concepts. Mapping high-level concepts to building data concepts is not trivial (a space named “Bedroom” is not always a bedroom); on the contrary, it is usually the most difficult and time-consuming part of developing a rule checking system. In this research, the mapping is realized by a configuration file, which is not specified and does not appear to be advanced with regard to Semantic Web applications.
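To make the point about name-based mapping concrete, here is a minimal sketch, assuming a hypothetical mapping format in which each high-level concept is mapped to an IFC entity type plus a test on the instance. The `Space`, `ConceptMapping` and `OccupancyType` names are illustrative assumptions, not the authors' configuration file, which the paper does not specify. A mapping that inspects only the name misses a space whose role is recorded elsewhere, and even a richer mapping misses a bedroom named in another language.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Space:
    """Hypothetical, simplified view of a space instance read from an IFC file."""
    guid: str
    name: str
    ifc_type: str = "IfcSpace"
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class ConceptMapping:
    """Hypothetical mapping entry: IFC entity type plus a test on the instance."""
    ifc_type: str
    predicate: Callable[[Space], bool]


def instances_of(mapping: ConceptMapping, elements: List[Space]) -> List[str]:
    """Return the GUIDs of the elements that the mapping classifies as the concept."""
    return [e.guid for e in elements
            if e.ifc_type == mapping.ifc_type and mapping.predicate(e)]


# Name-based mapping: a bedroom is any IfcSpace whose name contains "bedroom".
naive = ConceptMapping("IfcSpace", lambda s: "bedroom" in s.name.lower())

# Name-or-property mapping: also accept an (assumed) "OccupancyType" property.
richer = ConceptMapping(
    "IfcSpace",
    lambda s: "bedroom" in s.name.lower()
    or s.properties.get("OccupancyType", "").lower() == "bedroom",
)

spaces = [
    Space("g1", "Bedroom 101"),
    Space("g2", "Room 102", properties={"OccupancyType": "Bedroom"}),
    Space("g3", "卧室 103"),  # "bedroom" written in Chinese: missed by both mappings
    Space("g4", "Kitchen 104"),
]

print(instances_of(naive, spaces))   # ['g1']
print(instances_of(richer, spaces))  # ['g1', 'g2']
```

Every additional source of evidence has to be encoded by hand in such a mapping, which is why mapping tends to dominate the effort of building a rule checking system.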

Thirdly, the Revit building model is 93,596 KB. What is the size of the model in IFC? The two sizes should not be the same. The extracted OWL model has 5,531 entities and 402,364 attributes. Do you mean that the transformed RDF dataset has 402,364 triples? If so, it is not really a large building model, even if the original model is 93,596 KB. This part needs to be described clearly.

Overall, the paper is still a general description of the programming and implementation work for one case. The authors should describe the value of their approach and its improvement over the state of the art, and provide additional datasets to support their results; otherwise other researchers can hardly profit from this paper.

Quality of writing: The general structure of this manuscript is good. I suggest providing a detailed evaluation of the SNL, e.g. its expressiveness and usability, in section 7, since it is part of the results. I list a few detailed issues as follows:

Section 1:
Building information model (BIMs) -> Building Information Model (BIM) or Building Information Modeling (BIM).
“Representing design codes with rule description languages like SWRL [11], N3Logic [12], and then taking ontology reasoners like Jess [13] for checking are popular solutions.” Could you cite some work here?
“we propose a lightweighted method which rule checks big BIM models based domain knowledge on building codes and the feature of BIM models.” This sentence needs to be rephrased.

Section 3:
“logic based language” -> logic based languages
“…and SNL has no need to do re-logical and reorganization or other processing.” I am not convinced by this sentence. SNL rules are defined on informally defined concepts, and I do not think users without programming and data modelling experience can define ready-to-use, consistent SNL rules that require no additional adjustment. Concepts need to be mapped to specific building models to make rules executable.

“…writing SQUALL statement will use the concept of RDF”. What do you mean by “concept of RDF” here? Do you mean the RDF vocabulary?

“reference among different building codes or variety of conditions in an item”. The authors do not show examples for this statement.

The authors should provide a grammar for the SNL rather than describing it in text.

“Currently, SNL is able to cover…” I think the authors should provide additional datasets to prove this. By the way, what does “domain codes” mean here? There should be references for the building codes GB 50016-2014 and GB 50096-2011.

Section 4:
What is the difference between E and EI? Do you mean E is meta model level while EI is instances?
“…, we extract the attribute set that belongs to the set A and referencing r…”. I don’t quite understand this sentence.
The extraction process and algorithm are not clear enough, and I doubt the scalability of this approach. The authors need to present clearly, or provide additional datasets showing, how the rule library is structured, how different concepts are mapped to IFC elements (not just stating that it is based on a configuration file), and what the connections with Semantic Web technologies are.
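The following is a minimal sketch of rule-based sub-model extraction under the reading suggested above, that E denotes entity types (schema level) and EI their instances. It assumes each rule declares the entity types and attributes it references, and that the sub-model for a rule library is the union of these references. All names, the data layout and the two rules are hypothetical illustrations, not the authors' algorithm.

```python
from typing import Dict, List, Set, Tuple

# Instance level (EI): GUID -> (entity type, attribute values). Hypothetical layout.
Model = Dict[str, Tuple[str, Dict[str, object]]]
# What a rule references at the type level (E): entity type -> attribute names.
RuleRefs = Dict[str, Set[str]]


def merge_refs(rules: List[RuleRefs]) -> RuleRefs:
    """Union of the type/attribute references of several rules in a rule library."""
    merged: RuleRefs = {}
    for refs in rules:
        for etype, attrs in refs.items():
            merged.setdefault(etype, set()).update(attrs)
    return merged


def extract_submodel(model: Model, refs: RuleRefs) -> Model:
    """Keep only instances of referenced types, and only their referenced attributes."""
    sub: Model = {}
    for guid, (etype, attrs) in model.items():
        if etype in refs:
            sub[guid] = (etype, {a: v for a, v in attrs.items() if a in refs[etype]})
    return sub


# Simplified, hypothetical model data (real IFC stores areas in quantity sets).
model: Model = {
    "d1": ("IfcDoor", {"OverallWidth": 900, "OverallHeight": 2100, "Name": "D-01"}),
    "w1": ("IfcWindow", {"OverallWidth": 1200, "Name": "W-01"}),
    "s1": ("IfcSpace", {"Name": "Bedroom 101", "Area": 12.5}),
}

# Hypothetical rules: a minimum door width, and a minimum bedroom area.
door_rule: RuleRefs = {"IfcDoor": {"OverallWidth"}}
area_rule: RuleRefs = {"IfcSpace": {"Name", "Area"}}

print(extract_submodel(model, door_rule))
# {'d1': ('IfcDoor', {'OverallWidth': 900})}
print(extract_submodel(model, merge_refs([door_rule, area_rule])))
# {'d1': ('IfcDoor', {'OverallWidth': 900}), 's1': ('IfcSpace', {'Name': 'Bedroom 101', 'Area': 12.5})}
```

This sketch filters entities independently; a real extraction would also have to keep the relationship entities a rule traverses (for example space boundaries), which bears directly on the scalability question raised above.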

Section 5: The transformation process between SNL rules and SPARQL queries is also related to mapping between concepts. How many concepts have been mapped? Can they all be mapped with the configuration file?

Section 6: “… we made BIM domain specific optimization strategies…” Are the strategies really domain specific? They look applicable to any RDF graph.
The second strategy is not presented clearly. How many structures have you refactored? What pre-queries did you define? Could you give an example?
In general, what are the differences between your strategies and existing query rewriting methods?

Section 7: In section 6, the authors say that the pattern “FILTER EXISTS {}” is transformed into normal graph patterns, but the examples presented in Fig. 6 and Fig. 8 do not transform this pattern. I do not see why the query in Fig. 8 performs much better than the query in Fig. 6. What is the time difference between them? In my opinion, the performance in this specific example might be improved, but the improvement is limited. The pattern “FILTER EXISTS {?x0 ifc2x3:hasBoundaryElement ?x}” needs to be refactored and put into a proper position in the graph pattern. In this case, the problem is related to how the two trees are combined.
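Refactoring FILTER EXISTS into an ordinary graph pattern has one semantic subtlety worth making explicit: FILTER EXISTS only tests whether a match exists, whereas a plain triple pattern is joined and yields one solution per matching ?x. When ?x is not projected, the rewritten query therefore needs SELECT DISTINCT (or a consumer that tolerates duplicates) to stay equivalent. The sketch below simulates both patterns in plain Python over a tiny hypothetical triple set rather than a real SPARQL engine; the resource names are assumptions, with the property taken from the example above. It says nothing about which form a given engine executes faster.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
BOUNDARY = "ifc2x3:hasBoundaryElement"

# Tiny hypothetical graph: sp1 has two boundary elements, sp2 has none.
graph: Set[Triple] = {
    ("sp1", "rdf:type", "ifc2x3:IfcSpace"),
    ("sp2", "rdf:type", "ifc2x3:IfcSpace"),
    ("sp1", BOUNDARY, "wall1"),
    ("sp1", BOUNDARY, "wall2"),
}


def spaces(g: Set[Triple]) -> List[str]:
    """Solutions of the pattern  ?x0 rdf:type ifc2x3:IfcSpace ."""
    return sorted(s for s, p, o in g if p == "rdf:type" and o == "ifc2x3:IfcSpace")


def filter_exists(g: Set[Triple]) -> List[str]:
    """?x0 a IfcSpace . FILTER EXISTS { ?x0 hasBoundaryElement ?x }
    The filter only tests for a match, so each ?x0 appears at most once."""
    return [x0 for x0 in spaces(g)
            if any(s == x0 and p == BOUNDARY for s, p, _ in g)]


def plain_join(g: Set[Triple]) -> List[str]:
    """?x0 a IfcSpace . ?x0 hasBoundaryElement ?x   (projected onto ?x0)
    A join yields one solution per matching ?x, so ?x0 can repeat."""
    return [x0 for x0 in spaces(g)
            for s, p, _ in sorted(g) if s == x0 and p == BOUNDARY]


def distinct(rows: List[str]) -> List[str]:
    """SELECT DISTINCT: drop duplicate solutions, keeping first-seen order."""
    return list(dict.fromkeys(rows))


print(filter_exists(graph))         # ['sp1']
print(plain_join(graph))            # ['sp1', 'sp1']
print(distinct(plain_join(graph)))  # ['sp1']
```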
In the last paragraph of this section, the authors state that execution time depends on how complex the rules are. I would appreciate seeing additional datasets about the experiment (e.g. building models, RDF datasets, query sets, SNL rules and the original building codes).