Rangqin Mathematics (勷勤数学) · Expert Lecture
Title: Spike variational reinforcement learning of equilibrium mean-variance investment strategy
Speaker: Associate Professor Wang Ling (王玲) (Host: Yang Zhou 杨舟)
Central University of Finance and Economics
Time: November 11, 16:00-17:00
Venue: Conference Room, 2nd Floor, West Building, School of Mathematical Sciences
About the Speaker:
Associate Professor Wang Ling graduated from the Department of Statistics at The Chinese University of Hong Kong and is now with the School of Insurance at the Central University of Finance and Economics. Her research interests include actuarial mathematics, stochastic control, and machine learning. Her work has been published in international journals including the Journal of Risk and Insurance, European Journal of Operational Research, Insurance: Mathematics and Economics, and Scandinavian Actuarial Journal, and she is the principal investigator of a National Natural Science Foundation of China grant.
Abstract:
Most reinforcement learning (RL) algorithms are based on the dynamic programming principle (DPP). Related algorithms, such as Q-learning, typically evaluate the reward function during training. However, time inconsistency, which is prevalent in many finance problems such as the mean-variance (MV) investment problem, violates the DPP and thus poses challenges for continuous-time RL. To address time inconsistency, one approach seeks the Nash subgame perfect equilibrium among the current and all future selves of an investor. The task thus shifts from optimizing the reward function to finding the equilibrium. The spike variation technique is crucial for solving continuous-time stochastic control problems without assuming the DPP. In this paper, we propose an open-loop spike variational RL approach for equilibrium MV investment subject to a Shannon entropy regularizer. We provide sufficient conditions that characterize the exploratory policy. Under these conditions, we introduce the concept of spike variational RL and show that the corresponding Gaussian exploratory policy is unique. Our results feature explicit solutions to exploratory time-consistent MV investment problems for both constant and state-dependent risk aversion. Numerical experiments indicate that our RL algorithm performs comparably to existing methods in both the constant and state-dependent risk aversion cases.
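The Gaussian exploratory policy with a Shannon entropy regularizer mentioned above can be illustrated with a toy simulation. This is a minimal sketch, not the speaker's algorithm: the linear-feedback form of the policy mean, the link between the exploration variance and the temperature `lam`, and all parameter values and the wealth target are assumptions made for illustration only.

```python
import numpy as np

# Toy illustration (hypothetical parameters throughout): simulate wealth under
# a Gaussian exploratory policy for a one-asset mean-variance problem and
# accumulate the Shannon entropy of the policy along the path.
rng = np.random.default_rng(0)

T, N = 1.0, 252                 # horizon (years) and number of time steps
dt = T / N
mu, sigma, r = 0.08, 0.2, 0.02  # assumed drift, volatility, risk-free rate
lam = 0.1                       # entropy "temperature" weighting exploration

def gaussian_policy(x, target):
    """Assumed form: mean steers wealth toward a target; variance ~ lam."""
    mean = -(mu - r) / sigma**2 * (x - target)  # hypothetical linear feedback
    var = lam / sigma**2                        # exploration scale tied to lam
    return mean, var

x, target = 1.0, 1.4            # initial wealth and a hypothetical target
entropy = 0.0
for _ in range(N):
    m, v = gaussian_policy(x, target)
    u = rng.normal(m, np.sqrt(v))               # sampled dollar allocation
    x += r * x * dt + u * ((mu - r) * dt + sigma * np.sqrt(dt) * rng.normal())
    entropy += 0.5 * np.log(2 * np.pi * np.e * v) * dt  # entropy of N(m, v)

print("terminal wealth:", round(x, 3), "accumulated entropy:", round(entropy, 3))
```

A larger `lam` widens the sampled allocations (more exploration) and raises the entropy term; the talk's subject is how to learn such a policy as the time-consistent equilibrium rather than by dynamic programming.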
Faculty and students are welcome to attend and discuss!