dc.date.accessioned | 2023-01-28T16:41:54Z | |
dc.date.available | 2023-01-28T16:41:54Z | |
dc.date.created | 2023-01-19T11:55:03Z | |
dc.date.issued | 2022 | |
dc.identifier.citation | Meng, Li; Yazidi, Anis; Goodwin, Morten; Engelstad, Paal. Expert Q-learning: Deep Reinforcement Learning with Coarse State Values from Offline Expert Examples. Proceedings of the Northern Lights Deep Learning Workshop. 2022 | |
dc.identifier.uri | http://hdl.handle.net/10852/99388 | |
dc.description.abstract | In this article, we propose a novel algorithm for deep reinforcement learning named Expert Q-learning. Expert Q-learning is inspired by Dueling Q-learning and aims to incorporate semi-supervised learning into reinforcement learning by splitting Q-values into state values and action advantages. We require that an offline expert assesses the value of a state in a coarse manner using three discrete values. An expert network is designed in addition to the Q-network and is updated after each regular offline minibatch update whenever the expert example buffer is not empty. Using the board game Othello, we compare our algorithm with the baseline Q-learning algorithm, which is a combination of Double Q-learning and Dueling Q-learning. Our results show that Expert Q-learning is indeed useful and more resistant to overestimation bias. The baseline Q-learning algorithm exhibits unstable and suboptimal behavior in non-deterministic settings, whereas Expert Q-learning demonstrates more robust performance with higher scores, illustrating that our algorithm is indeed suitable for integrating state values from expert examples into Q-learning. | |
dc.language | EN | |
dc.publisher | Septentrio Academic Publishing | |
dc.rights | Attribution 4.0 International | |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | |
dc.title | Expert Q-learning: Deep Reinforcement Learning with Coarse State Values from Offline Expert Examples | |
dc.title.alternative | Expert Q-learning: Deep Reinforcement Learning with Coarse State Values from Offline Expert Examples | |
dc.type | Journal article | |
dc.creator.author | Meng, Li | |
dc.creator.author | Yazidi, Anis | |
dc.creator.author | Goodwin, Morten | |
dc.creator.author | Engelstad, Paal | |
cristin.unitcode | 185,15,30,30 | |
cristin.unitname | Seksjon for autonome systemer og sensorteknologier | |
cristin.ispublished | true | |
cristin.fulltext | original | |
cristin.qualitycode | 1 | |
dc.identifier.cristin | 2110224 | |
dc.identifier.bibliographiccitation | info:ofi/fmt:kev:mtx:ctx&ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.jtitle=Proceedings of the Northern Lights Deep Learning Workshop&rft.volume=&rft.spage=&rft.date=2022 | |
dc.identifier.jtitle | Proceedings of the Northern Lights Deep Learning Workshop | |
dc.identifier.volume | 3 | |
dc.identifier.pagecount | 9 | |
dc.identifier.doi | https://doi.org/10.7557/18.6237 | |
dc.type.document | Journal article | |
dc.type.peerreviewed | Peer reviewed | |
dc.source.issn | 2703-6928 | |
dc.type.version | PublishedVersion | |