Multi-objective reinforcement learning framework for unknown stochastic & uncertain environments

Pinder, JM 2016, Multi-objective reinforcement learning framework for unknown stochastic & uncertain environments, PhD thesis, University of Salford.


This dissertation addresses how agents handle uncertainty while learning in stochastic environments, using Multi-Objective Reinforcement Learning (MORL). Most previous investigations into multi-objective reinforcement learning have proposed algorithms that address learning performance but have neglected the uncertainty present in stochastic environments. The principal motivation of this research is the realisation that many risky and uncertain real-world decision-making problems involve multiple long-term objectives. This dissertation proposes a novel modification to the single-objective GPFRL algorithm (Hinojosa et al., 2008) in which a linear scalarisation methodology provides a way to automatically find an optimal policy for multiple objectives under different kinds of uncertainty. The proposed Generalised Probabilistic Fuzzy Multi-Objective Reinforcement Learning (GPFMORL) algorithm is further enhanced by the introduction of prospect theory to guarantee convergence by means of risk evaluation. The simulated grid world was increased in complexity by specifying a further two complementary and conflicting objectives and by introducing uncertainty in the form of stochastic cross winds. Results obtained from the GPFMORL grid world simulations were compared against two more classical multi-objective algorithms, MOQ and MOSARSA, and showed both stronger and much faster convergence. Experiments performed on an actual quad-copter drone demonstrated that the proposed algorithm and developed framework are both feasible and promising for the control of Artificially Intelligent (AI) Unmanned Aerial Vehicles (UAVs) in a variety of real-world multi-objective applications, such as autonomous landing and delivery or search and rescue.
Furthermore, the observed results of this work showed that the GPFMORL method can find its major real-world application in the uncalibrated control of non-linear, multiple-input, multiple-output systems, especially in multi-objective situations with high uncertainty. Proposed novel case-study research prototypes include Controlled Environment Agriculture, in which the proposed “Automated Solar Powered Environmental Controller” (ASPEC) optimises hydroponic crop growth. Finally, the “Robotic Dementia Medication Administration System” (RDMAS) attempts to optimise liquid medication dispensing by intelligently scheduling doses for times of the day when the patient is more likely to remember to take their medication, based on previously learned knowledge and experience.
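
As a rough illustration of the linear scalarisation idea named in the abstract, the sketch below collapses a vector of per-objective action values into one score using a weighted sum, then picks the highest-scoring action. It is not the GPFMORL algorithm from the thesis. The function names, the two-objective example values, and the weights are all hypothetical and chosen only for illustration.

```python
def scalarise(q_vector, weights):
    """Weighted sum of per-objective values (linear scalarisation).

    q_vector: one estimated value per objective for a single action.
    weights:  one weight per objective (hypothetical; often chosen
              non-negative and summing to 1).
    """
    if len(q_vector) != len(weights):
        raise ValueError("q_vector and weights must have the same length")
    return sum(w * q for w, q in zip(weights, q_vector))


def greedy_action(q_vectors, weights):
    """Index of the action whose scalarised value is largest."""
    scores = [scalarise(q, weights) for q in q_vectors]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical values for three actions and two objectives,
# e.g. (progress towards goal, negative of risk incurred).
example_q = [
    [1.0, -4.0],  # action 0: fast but risky
    [0.5, -1.0],  # action 1: moderate on both objectives
    [0.0, 0.0],   # action 2: do nothing
]

balanced_choice = greedy_action(example_q, [0.7, 0.3])
progress_only_choice = greedy_action(example_q, [1.0, 0.0])
```

Changing the weights changes which action is preferred, which is how a weighted sum of this kind encodes a trade-off between conflicting objectives.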

Item Type: Thesis (PhD)
Contributors: Nefti-Meziani, S (Supervisor) and Theodoridis, T (Supervisor)
Schools: Schools > School of Computing, Science and Engineering
Funders: Engineering and Physical Sciences Research Council (EPSRC)
Depositing User: JM Pinder
Date Deposited: 08 Dec 2016 09:56
Last Modified: 21 Dec 2021 13:28
