  • Essay / CACM - 546

    Equations (7) and (8) give the CACM-RL and CACM-RL* valuations per action. Equation (8) takes the transition probability into account, based on the superimposed shaded regions in Figure 6. As Table II shows, the most rewarded action in CACM-RL is a2, whereas in CACM-RL* it is a1: although a2 transitions faster, a1 obtains a greater reward. There are several methods to evaluate the transition probability of each (state, action) pair. One approach is to distribute a uniform sample in each cell along every dimension, but the Monte Carlo method is more appropriate for systems with a large number of dimensions, since it saves resources and computational time at the expense of a reduction in precision. In fact, evaluating 100 points per dimension in a 4D problem requires 100^4 (that is, 10^8) sample evaluations per state-action pair. CACM-RL* is implemented by extending the reward functions of CACM-RL, as they appear in the algorithm described in Table I, with an analysis that goes beyond the center of the cell. Lines 11 and 22 represent the rewa...
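
The two sampling strategies for estimating transition probabilities can be compared with a minimal Python sketch. It is only an illustration under stated assumptions, not the algorithm from Table I: the helper names (`grid_samples`, `monte_carlo_samples`, `transition_probabilities`), the unit cell size, and the toy dynamics `toy_step` are all hypothetical.

```python
import random
from collections import Counter
from itertools import product


def cell_index(point, cell_size):
    """Map a continuous point to the integer index of the cell containing it."""
    return tuple(int(x // cell_size) for x in point)


def grid_samples(cell, cell_size, points_per_dim):
    """Uniform sampling: evenly spaced points along every dimension of the cell."""
    axes = [
        [(c + (k + 0.5) / points_per_dim) * cell_size for k in range(points_per_dim)]
        for c in cell
    ]
    return product(*axes)  # points_per_dim ** len(cell) points in total


def monte_carlo_samples(cell, cell_size, n_samples, rng):
    """Monte Carlo sampling: n_samples random points inside the cell."""
    for _ in range(n_samples):
        yield tuple((c + rng.random()) * cell_size for c in cell)


def transition_probabilities(samples, step, action, cell_size):
    """Estimate P(destination cell | cell, action) as the fraction of samples landing there."""
    counts = Counter(cell_index(step(p, action), cell_size) for p in samples)
    total = sum(counts.values())
    return {dest: n / total for dest, n in counts.items()}


def toy_step(point, action):
    """Hypothetical dynamics (an assumption): shift every coordinate by the action value."""
    return tuple(x + action for x in point)


def grid_cost(points_per_dim, n_dims):
    """Number of sample evaluations a uniform grid needs per state-action pair."""
    return points_per_dim ** n_dims


cell, size = (0, 0), 1.0
grid_probs = transition_probabilities(grid_samples(cell, size, 4), toy_step, 0.25, size)
mc_probs = transition_probabilities(
    monte_carlo_samples(cell, size, 2000, random.Random(0)), toy_step, 0.25, size
)
print(grid_probs)
print(mc_probs)
print(grid_cost(100, 4))
```

The grid cost grows exponentially with the number of dimensions, while the Monte Carlo sample count is chosen freely, which is the trade-off between precision and computational time described above.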