We have written a paper, now on arXiv (https://arxiv.org/abs/2208.09554), that describes how we are integrating large language models into Interactive Task Learning (ITL), where agents learn new tasks through interactive instruction.

The paper outlines the roles of human instruction, search, and language models in ITL. Instruction is provided to the agent in natural language (“the goal is that the carton of milk is in the fridge”). Search lets the agent look for sequences of actions that achieve a goal (for example, working out how to move a carton of milk from the table into the fridge).
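To make the search component concrete, here is a minimal sketch of goal-directed search over symbolic states. The state representation, action names, and breadth-first strategy are our own illustration, not the paper's actual implementation.

```python
from collections import deque

def plan(initial_state, goal_test, actions):
    """Breadth-first search for an action sequence that reaches a goal.

    initial_state: a hashable state (here, a frozenset of facts).
    goal_test:     predicate over states.
    actions:       list of (name, precondition, effect) tuples, where
                   precondition and effect are callables over states.
    """
    frontier = deque([(initial_state, [])])
    visited = {initial_state}
    while frontier:
        state, path = frontier.popleft()
        if goal_test(state):
            return path
        for name, applicable, effect in actions:
            if applicable(state):
                nxt = effect(state)
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, path + [name]))
    return None  # no action sequence achieves the goal


# Toy kitchen domain (illustrative facts, not the paper's representation):
init = frozenset({"milk-on-table", "fridge-closed"})
actions = [
    ("pick-up-milk",
     lambda s: "milk-on-table" in s,
     lambda s: (s - {"milk-on-table"}) | {"holding-milk"}),
    ("open-fridge",
     lambda s: "fridge-closed" in s,
     lambda s: (s - {"fridge-closed"}) | {"fridge-open"}),
    ("put-milk-in-fridge",
     lambda s: "holding-milk" in s and "fridge-open" in s,
     lambda s: (s - {"holding-milk"}) | {"milk-in-fridge"}),
]

print(plan(init, lambda s: "milk-in-fridge" in s, actions))
# -> ['pick-up-milk', 'open-fridge', 'put-milk-in-fridge']
```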

The primary innovation we discuss is that the agent can now use the language model to generate both candidate goals and actions during task learning. For instance, the agent can ask the language model to recommend a goal when the “milk is sitting on the table” and it has been told to tidy the kitchen. Human feedback then often shifts from constructing goals for the agent to simply answering yes/no questions (Agent: “Is the goal that the carton of milk is in the fridge? Yes or no?”). In an experiment described in the paper, we show that the agent can use the language model to significantly reduce the complexity of human instruction needed for learning, while still learning the task correctly from a single learning experience (“one shot”).
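The sketch below illustrates this goal-proposal loop. The function names (propose_goal, ask_llm, confirm) and the prompt format are hypothetical stand-ins for whatever model interface is available; this is a sketch of the interaction pattern, not the paper's actual implementation.

```python
def propose_goal(task, observation, ask_llm, confirm):
    """Have a language model propose a candidate goal, then have the
    human verify it with a yes/no answer.

    ask_llm and confirm are placeholders: ask_llm wraps whatever LLM
    completion call is available; confirm gathers the human's response.
    """
    prompt = (
        f"Task: {task}\n"
        f"Observation: {observation}\n"
        "What should the goal be?"
    )
    candidate = ask_llm(prompt)
    if confirm(f"Is the goal that {candidate}? Yes or no?"):
        return candidate  # accepted: the agent adopts this goal
    return None  # rejected: fall back to asking the human for the goal


# Toy usage with stubs standing in for the real model and instructor:
goal = propose_goal(
    task="tidy the kitchen",
    observation="the milk is sitting on the table",
    ask_llm=lambda p: "the carton of milk is in the fridge",
    confirm=lambda q: input(q + " ").strip().lower().startswith("y"),
)
```

The design point is the shift in the human's burden: instead of authoring a goal description from scratch, the instructor verifies a candidate, falling back to full instruction only when the model's proposal is rejected.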