Review Comment:
The authors report on the iterative design of a mobile user interface for exploring a specific RDF-ized dataset. This dataset describes events that involved some type of political violence in the USA over the last 230 years. Events in this dataset are linked to named entities in DBpedia. The iterative design of this interface was informed by three user studies that were aimed at assessing its usability.
I was quite eager to read this paper. I agree that the community needs to get away from low-level representations of RDF graphs, whether they be node-link diagrams or some other representation, and start to explore other types of representations for linked data. One promising avenue is to explore familiar representations, and geo-localized and temporal data lend themselves quite well to this. The authors make a strong case for this direction of research in the introduction, which I really enjoyed reading. I had great expectations for the remainder of the paper, especially given that it seemed to be contributing some interesting user studies based on what I read in the abstract.
Unfortunately, I am afraid that the paper fails to deliver any tangible contribution beyond the introduction. First, there isn't much novelty in this work. Providing a panel with buttons and other widgets to filter a dataset of geolocalized/temporal events and displaying the result set on a map is not new. The software architecture is not new. If it is novel and there is a significant contribution there, the authors need to explain it in much more detail. Doing all this on a mobile device is not new either. The authors do not actually claim that their contribution lies in the novelty of this UI. Rather, the contribution would seem to lie in the iterative design & development process and the results from the user studies. This is perfectly fine, and such types of contributions are quite welcome, especially for this issue of SWJ. But if the contribution of a paper lies solely in its user studies, those have to be rock solid in terms of experiment design & analysis, and they have to yield some insightful results. To my disappointment, the paper fails on both fronts, as detailed below.
But before I delve into the detail of the issues I have with the studies, there is one high-level issue I want to discuss: the rationale for studying these interfaces on a tablet is not strong enough, and makes the point of this whole enterprise a bit unclear. The authors justify designing an interface that will run on tablets rather than desktop computers on the sole basis that "haptic controls [...] could facilitate a richer, more engaging experience". This is a bit far-fetched. Why would domain experts (social scientists) work on a tablet for this sort of analysis, given that for many tasks those devices have more limited capabilities than desktop computers (as acknowledged by the authors themselves)? If the authors think that there are true advantages in using a tablet, specific hypotheses have to be formulated that capture this particular aspect of the UI, and those hypotheses have to be tested.
Another issue is that Section 3 does not cover related work effectively. It attempts a very broad overview of many research areas, but mostly makes high-level statements about visualization, UI design and mobile HCI, failing to reference truly relevant work. This is not what is expected from the Related Work section of a research paper.
Detailed comments about the iterative design and user studies:
- It is not clear how the application requirements get derived from the survey of existing applications. Besides, I would argue that one does not derive requirements just by looking at existing applications. This is not a proper design method. One needs to look at actual user needs and technological limitations, just to mention the most obvious.
- Related to the above, involving "authors, colleagues and friends" in the prototyping process is pretty awkward. It is ok to have some informal first steps, but why didn't you involve representative members of the actual target audience in the design process? This would probably have made the whole effort and end-result much more meaningful.
- The adherence to iOS interface guidelines is anecdotal in this context.
- What led to the selection of these particular two interfaces (tag cloud and list) in the first prototype? This seems quite arbitrary. A related issue is that the flaws of the list UI discussed on p16 (selection of multiple values) are quite obvious. On mobile interfaces, lists that allow multiple selections will usually use tick marks to indicate selected items. Why didn't you do that in the first place? This would likely have addressed this issue. The more general point here is that section 5.3.4 isn't particularly insightful to the reader. This section is just reporting on a design choice that was obviously wrong in the first place. The reader does not learn anything new here.
- The experiment design is loosely described. Actually, the sections called "experimental design" do not describe the experimental design at all. You need to systematically report on the number of participants, how they were recruited, what characterizes them (what criteria were used to categorize them as experts or non-experts), and the apparatus (type of tablet, display resolution, desktop monitor resolution in experiment #3). This is very important, both to assess the validity of the experiment design and the subsequent analysis of results, and for the sake of replication of the studies' results.
- Empirical results are not reported appropriately. In section 5.3.3 the authors report on two more-or-less random t-tests (in an incomplete manner, actually), but this is the only place where we get some sort of statistical analysis. All other analyses are merely based on the comparison of mean values between conditions. This is unacceptable, as there is no way to assess the statistical differences of measures across conditions, making the whole reporting almost useless to the reader. Proper statistical analyses have to be performed on the data.
- The authors make some random observations about the data, sometimes making general claims that contradict what they have reported on. For instance, in the first experiment, they write that UI2 has performed better than UI1. Looking at Table 4, task completion times with UI1 are actually shorter for 2 out of the 3 tasks! Add to this that the mean task completion time over the three tasks is quite likely not significantly different between UI1 (272s) and UI2 (269s). For "familiar" users, UI1 was actually twice as fast overall. This is just the most obvious example of such problems in the paper. The interpretation of results is in several places not only misleading but plain wrong. Another instance of this issue is the first conclusion drawn in 5.4.7. Again, this is not acceptable for a research paper.
- There are several issues with the counterbalancing of conditions throughout the experiments (number of participants per condition, presentation order).
- Experiment 2: I fail to understand why the authors recruited people who participated in experiment 1 as novice users. What is the point? This would only contribute to reducing the differences with experts in an uncontrolled manner. Another question related to participants, as mentioned above, is what qualified some participants as "experts"? What were the selection criteria?
- The number of participants was somewhat low in all cases, which makes it unlikely that a proper analysis of the results would yield strong observations.
- Comparing quantitative results across experiments (as the authors do with #1 and #2) can only _suggest_ possible facts. These comparisons do not _show_ or _demonstrate_ anything, simply because the conditions (UI, tasks, etc.), and the participants, were different across those studies.
- The Map4RDF interface needs to be described. Again, this is both to help readers interpret the results, and for the sake of reproducibility of the experiment.
- I do not understand how users' preferences were collected in experiment #3 (mobile vs. desktop applications). Actually, I am not even sure I understand what this means. Does it mean that (A) they preferred mobile applications vs. desktop applications generally speaking? Or that (B) they preferred one or the other just for this particular application? If (A), this isn't meaningful, as it is unlikely that one would have a preference for mobile vs. desktop independently from the application and its context of use. If (B), then the authors are extracting an additional factor for driving their analysis _from the measures_. This is questionable in terms of methodology. Beyond that, the observations in that respect are more or less stating the obvious.
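To make the point about statistical reporting concrete: a proper test is not heavy-weight. The sketch below (with made-up sample data, since the paper does not report per-participant times; only the 272 s vs. 269 s means are taken from the paper) computes Welch's two-sample t-test, the standard choice when variances may differ across conditions:

```python
import math
import statistics

def welch_t_test(a, b):
    """Welch's two-sample t-test (unequal variances).

    Returns the t statistic and the Welch-Satterthwaite approximate
    degrees of freedom; a p-value is then obtained from a
    t-distribution with that many degrees of freedom.
    """
    m1, m2 = statistics.mean(a), statistics.mean(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)  # sample variances
    n1, n2 = len(a), len(b)
    se2 = v1 / n1 + v2 / n2  # squared standard error of the mean difference
    t = (m1 - m2) / math.sqrt(se2)
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df

# Hypothetical completion times (seconds), chosen only so the means match
# the 272 s vs. 269 s reported in the paper -- NOT the authors' actual data.
ui1 = [250, 280, 300, 260, 270]   # mean 272
ui2 = [240, 290, 310, 255, 250]   # mean 269
t, df = welch_t_test(ui1, ui2)
print(f"t = {t:.3f}, df = {df:.1f}")
```

With samples this small and a 3-second difference in means, |t| comes out far below any conventional significance threshold, which is exactly the point: reporting bare means hides this. An off-the-shelf equivalent is `scipy.stats.ttest_ind(a, b, equal_var=False)`, which also returns the p-value directly.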
In the end, I think that the community definitely needs to move in the direction suggested in the introduction of this paper. I strongly encourage the authors to keep working in this direction and resubmit their work once it has reached a higher level of maturity. But I can only recommend that before they resubmit, the authors read more literature about how to design, run, analyze and report on user studies. Example venues where such papers can be found include any SIGCHI journal and conference proceedings (CHI, UIST, CSCW, ToCHI) and other journals (IJHCS, etc.).
Finally, any future submission should be proofread. There are a lot of typos, grammar mistakes and missing words.
References to figures need to be fixed. Most of them are wrong.
Some positive aspects of the paper worth mentioning:
- Figures 9 and 10 are very nice and help the reader quickly understand the differences between two successive design iterations.
- I liked the fact that the authors provided a detailed rationale for each of the tasks (questions) performed by participants.