Context-Aware Composition of Agent Policies by Markov Decision Process Entity Embeddings and Agent Ensembles

Tracking #: 3403-4617

Nicole Merkle
Ralf Mikut

Responsible editor: 
Agnieszka Lawrynowicz

Submission type: 
Full Paper
Computational agents support humans in many areas of life and are therefore found in heterogeneous contexts. This means that agents operate in rapidly changing environments and can be confronted with huge state and action spaces. In order to perform services and carry out activities satisfactorily, i.e. in a goal-oriented manner, agents require prior knowledge and therefore have to develop and pursue context-dependent policies. The problem here is that prescribing policies in advance is limited and inflexible, especially in dynamically changing environments. Moreover, the context (i.e. the external and internal state) of an agent determines its choice of actions. Since the environments in which agents operate can be stochastic and complex in terms of the number of states and feasible actions, activities are usually modelled in a simplified way by Markov decision processes so that, for example, agents with reinforcement learning are able to learn policies, i.e. state-action pairs, that help to capture the context and act accordingly to optimally perform activities. However, training policies for all possible contexts using reinforcement learning is time-consuming and requires e.g. demonstration learning or simulation of environments. A requirement and challenge for agents is to learn strategies quickly and respond immediately in cross-context environments and applications, e.g., the Internet, service robotics, cyber-physical systems, or medicine. In this work, we propose a novel simulation-based approach that enables a) the representation of heterogeneous contexts through knowledge graphs and entity embeddings and b) the context-aware composition of policies on demand by ensembles of agents running in parallel. The evaluation we performed on the "Virtual Home" dataset shows that agents that need to seamlessly switch between different contexts, e.g. in a home environment, can request on-the-fly composed policies that lead to the successful completion of context-appropriate activities without having to learn these policies in lengthy training steps and episodes, in contrast to agents that apply reinforcement learning. The presented approach enables both context-aware and cross-context applicability of untrained computational agents. Furthermore, the source code of the approach as well as the generated data, i.e. the trained embeddings and the semantic representation of domestic activities, is open source and openly accessible on GitHub and Figshare.

Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 04/May/2023
Major Revision
Review Comment:

(1) originality
As far as I can tell the work is relatively novel. The idea to use a combination of knowledge graphs and word embedding to teach agents state-action plans seems to me a novel direction.
However, I question whether the simulation format really does move away from what the authors argue is less "demonstration from the environment". It is simply a different format of demonstrations, which is rather common in training household agent behavior.

The ontology part in section 5 seems to make sense to me, and using this kind of format is state-of-the-art.

(2) significance of the results
I found the paper quite ambitious in what has been attempted. It is an interesting topic that can have a substantial impact as the technologies for household agents become more available.
The results of the evaluation strike me as (maybe even "too") impressive, although the comparison to only one RL algorithm raises two issues. 1. It needs better motivation as to why this algorithm alone was picked (is it the best one?).
2. While I understand that the comparison is based on performance, if I understand the authors correctly, it is an unfair comparison. The RL agent is just one agent, whereas the approach here is to use an ensemble of agents to train appropriate state-action behavior. This needs to be better discussed and explained as part of the contribution.

(3) quality of writing
The paper could use a bit of polish. While the English is good (to a non-native speaker's eyes), some sections read a bit too much like a student's essay and less like a scientific contribution. This is particularly apparent in sections 3 and 4, which need to be greatly improved for publication in the journal. Further, the language contains several statements that generalize too much or lack references. There is also the problem that the introduction of methods and concepts often comes too late. One example: unless I missed it, it is not clear until section 5.4 that the ontology is written in SPARQL (and in the complementary data files there is a .json file).

Another issue with the writing is that it is not entirely clear how the work is motivated. The paper quickly discusses technical details without much positioning of the work into application areas or explaining what the purpose of the work really is.
This is very clear in section 2, which describes the system flow in some detail but does not really give me any understanding of what the whole thing is supposed to do. Since the authors are using household tasks as the basis for their investigation, I would have liked to see real examples of what their agents actually "do", so to speak.

This is also relevant to the result and the evaluation of the paper.

Exactly how the technical components of the MDP, the ontology and the embedding DNN are fitted together is also not entirely clear to me. While I suppose it is possible to represent MDPs as ontologies and learn the probabilities from embeddings, I do not see the direct benefit nor how it has been done here. I assume this is what is explained in sections 5.2 and 5.3; however, they (especially 5.2) are a bit too general to be truly understood within the paper's setting. Perhaps all of this can be better explained by making section 2 more explicit and by adding an actual example from the household database.

The conclusion also needs a paragraph that summarizes the actual system in appropriate detail.

Issues to fix:
- p.3.r.13-17: I don't think this is obvious or a direct consequence of your proposal. Tone it down.
- Fig. 6 needs to be improved. I don't understand what I am looking at.
- Why is the format of fig 7 restricted to the RL agent? Don't you need a comparison to your system as well?

Minor things to fix:
- consistency in using Section vs. Sec.
- Fig. 1 is very hard to follow. Reduce the arrows and the size of things; it is not clear how information flows nor in which order I am to look at it (despite the numbering).
- p1.r:29: I do not understand why "medicine" is thrown in there. What is referred to: Medical research? medical application? healthcare environments?
- p2.r28: Fix the "-" in the hypothesis. I would try to keep the hypothesis stand-alone without explanations within it.
- p3.r.43-45: I am not sure if this is supposed to be here.
- alg. 2: I think there is one xS too many?
- the order of Fig 3 and 4. Right now, fig. 4 is referenced before 3.
- p14.r.1-2: superfluous use of etc. and e.g. - pick one.
- p15.r.1: experiments missing references.
- remove the several instances of "clearly" from the results section.
- p26.r.46: what kind of capacity? computer power?
- p26.r.50: further more -> Furthermore,

(4) whether the provided data artifacts are complete.
A. As far as I can tell, the complementary data and code look correct and well organised.
B. It looks like it should be possible to use the code to re-run the experiments (I didn't try).
C. The files are locally uploaded on SWJ site and on Figshare (link in the README file).

Final comment:
Overall I think the work has a lot of potential and would be a good fit for the journal. However, the authors should take the time to improve the presentation of their work so that its potential impact and influence on the reader will be higher.
Ultimately it was a very interesting read!

Review #2
Anonymous submitted on 23/Jul/2023
Minor Revision
Review Comment:

This paper proposes an approach to learning context-dependent policies for autonomous agents which enables the representation of heterogeneous contexts through knowledge graphs and entity embeddings and on-demand composition of policies by ensembles of agents running in parallel. This approach is validated in simulation using the Virtual Home environment. The main novelty lies in proposing an approach that, although based upon Markov Decision Processes (MDP), avoids time-consuming reinforcement learning. This approach leverages entity embeddings, trained from task-specific datasets and an MDP knowledge graph.
The paper is written and structured reasonably well, with a few minor issues, like defining the same acronym several times (e.g. MDP) or referring to algorithms that are presented a couple of pages later. The simulations show that the proposed approach was able to find feasible policies for all tested Virtual Home tasks with significantly fewer search steps than the DQNN algorithm. However, using only the DQNN as a baseline and using only one environment/context do not allow this work to show the advertised advantages of the new approach, particularly its ability to switch contexts quickly.
The described research seems to be reproducible, as the ZIP provided with the paper contains the code and a readme, while the stable link offers software for installation and testing, although only for WinOS.