dc.date.accessioned | 2023-01-28T16:41:54Z | |
dc.date.available | 2023-01-28T16:41:54Z | |
dc.date.created | 2023-01-19T11:55:03Z | |
dc.date.issued | 2022 | |
dc.identifier.citation | Meng, Li; Yazidi, Anis; Goodwin, Morten; Engelstad, Paal. Expert Q-learning: Deep Reinforcement Learning with Coarse State Values from Offline Expert Examples. Proceedings of the Northern Lights Deep Learning Workshop. 2022 | |
dc.identifier.uri | http://hdl.handle.net/10852/99388 | |
dc.description.abstract | In this article, we propose a novel algorithm for deep reinforcement learning named Expert Q-learning. Expert Q-learning is inspired by Dueling Q-learning and aims to incorporate semi-supervised learning into reinforcement learning by splitting Q-values into state values and action advantages. We require that an offline expert assesses the value of a state in a coarse manner using three discrete values. An expert network is designed in addition to the Q-network and is updated after each regular offline minibatch update whenever the expert example buffer is not empty. Using the board game Othello, we compare our algorithm with the baseline Q-learning algorithm, which is a combination of Double Q-learning and Dueling Q-learning. Our results show that Expert Q-learning is indeed useful and more resistant to overestimation bias. The baseline Q-learning algorithm exhibits unstable and suboptimal behavior in non-deterministic settings, whereas Expert Q-learning demonstrates more robust performance with higher scores, illustrating that our algorithm is indeed suitable for integrating state values from expert examples into Q-learning. | |
dc.language | EN | |
dc.publisher | Septentrio Academic Publishing | |
dc.rights | Attribution 4.0 International | |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | |
dc.title | Expert Q-learning: Deep Reinforcement Learning with Coarse State Values from Offline Expert Examples | |
dc.title.alternative | Expert Q-learning: Deep Reinforcement Learning with Coarse State Values from Offline Expert Examples | |
dc.type | Journal article | |
dc.creator.author | Meng, Li | |
dc.creator.author | Yazidi, Anis | |
dc.creator.author | Goodwin, Morten | |
dc.creator.author | Engelstad, Paal | |
cristin.unitcode | 185,15,30,30 | |
cristin.unitname | Seksjon for autonome systemer og sensorteknologier | |
cristin.ispublished | true | |
cristin.fulltext | original | |
cristin.qualitycode | 1 | |
dc.identifier.cristin | 2110224 | |
dc.identifier.bibliographiccitation | info:ofi/fmt:kev:mtx:ctx&ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.jtitle=Proceedings of the Northern Lights Deep Learning Workshop&rft.volume=&rft.spage=&rft.date=2022 | |
dc.identifier.jtitle | Proceedings of the Northern Lights Deep Learning Workshop | |
dc.identifier.volume | 3 | |
dc.identifier.pagecount | 9 | |
dc.identifier.doi | https://doi.org/10.7557/18.6237 | |
dc.type.document | Journal article | |
dc.type.peerreviewed | Peer reviewed | |
dc.source.issn | 2703-6928 | |
dc.type.version | PublishedVersion | |