We focus on extending realtime search algorithms for autonomous agents and for a multiagent world . Though realtime search provides an attractive framework for resource-bounded problem solving, the behavior of the problem solver is not rational enough for autonomous agents: the problem solver tends to perform superfluous actions before attaining the goal; and the problem solver cannot utilize and improve previous experiments. Other problems are that though the algorithms interleave planning and execution, they cannot be directly applied to a multiagent world; the problem solver cannot adapt to the dynamically changing goals; and the problem solver cannot cooperatively solve problems with other problem solvers.
The capability of learning is one of the salient features of realtime search. The major impediment is, however, the instability of the solution quality during convergence. Other problems are that (1) they try to find all optimal solutions even after obtaining fairly good solutions, and (2) they tend to move towards unexplored areas thus failing to balance exploration and exploitation.
This chapter proposes and analyzes two new realtime search algorithms to stabilize the convergence process. $B&E(B-search ( weighted realtime search) allows suboptimal solutions with $B&E(B error to reduce the total amount of learning performed . $B&D(B-search ( realtime search with upper bounds) utilizes the upper bounds of estimated costs, which become available after the problem is solved once. Guided by the upper bounds, $B&D(B-search can better control the tradeoff between exploration and exploitation.
Figure 16: Sample Tracks of Moving Target Search
We consider the case of heuristic search where the goal may change during the course of the search. For example, the goal may be a target that actively avoids the problem solver. A moving target search algorithm (MTS) is thus presented to solve this problem [1,2,3]. We prove that if the average speed of the target is slower than that of the problem solver, the problem solver is guaranteed to eventually reach the target in a connected problem space.
The original MTS algorithm was constructed using the fewest operations necessary to guarantee its completeness, and hence is not very efficient. To improve its efficiency, ideas are introduced from the area of resource-bounded planning, including (1) commitment to goals, and (2) deliberation for selecting plans. Experimental results demonstrate that the improved MTS is 10 to 20 times more efficient than the original MTS in uncertain situations.
In this algorithm, two problem solvers, starting from the initial and the goal states, physically move toward each other [4,5]. To evaluate the RTBS performance, two kinds of algorithms are proposed and are compared to realtime unidirectional search. One is called centralized RTBS where a supervisor always selects the best action from all possible moves of the two problem solvers. The other is called decoupled RTBS where no supervisor exists and the two problem solvers independently select their next moves.
Experiments on mazes and n-puzzles show that (1) in clear situations decoupled RTBS performs better, while in uncertain situations, centralized RTBS becomes more efficient, and that (2) RTBS is more efficient than realtime unidirectional search for 15- and 24-puzzles but not for randomly generated mazes. It will be shown that the selection of the problem solving organization equals the selection of the problem space, which determines the baseline of the organizational efficiency; once a difficult problem space is selected, local coordination among the problem solvers rarely overcomes the deficit.
We introduce a new problem solving paradigm, where multiple agents cooperatively achieve a shared goal. In this paradigm, during a distributed problem solving process, the reorganization of multiple agents is triggered when the agents observe undesired performance degradation. This reflective feedback between distributed problem solving and organization self-design characterizes organizational problem solving, and enables agents to maintain their efficiency in a dynamically changing environment. To understand the paradigm and to evaluate its various implementations, a simple research testbed called the tower building problem is given.
Figure 17: LRTA* and $B&E&D(B-search
The above research results have been published from Kluwer Academic Publisher .