

Title: Randomized Optimal Stopping Problem in Continuous Time and Reinforcement Learning Algorithm

Speaker: 董玉超 (Dong Yuchao), Associate Professor  (Host: 杨舟 (Yang Zhou))


Time: June 20, 10:00-11:00

Venue: Conference Room, West Building, School of Mathematical Sciences (数科院西楼会议室)



Abstract:

       In this paper, we study the optimal stopping problem in the so-called exploratory framework, in which the agent takes actions randomly, conditional on the current state, and an entropy regularization term is added to the reward functional. This transformation reduces the optimal stopping problem to a standard optimal control problem. For the American put option model, we derive the associated HJB equation and prove its solvability. Furthermore, we give a convergence rate for policy iteration and compare our solution with that of the classical American put option problem. Our results indicate a trade-off between convergence rate and bias in the choice of the temperature constant. Based on the theoretical analysis, we design a reinforcement learning algorithm and present numerical results for several models.
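
       To give a feel for the bias introduced by the temperature constant, the sketch below prices an American put on a binomial tree and, optionally, replaces the hard exercise decision with an entropy-regularized one. It is only a discrete-time analogue written for illustration, not the speaker's continuous-time formulation or algorithm. The function names, the per-step use of the temperature, and the market parameters are all assumptions. With stopping probability p, exercise value a, and continuation value b, maximizing p*a + (1-p)*b plus temperature times the Shannon entropy of p gives the log-sum-exp value used in `soft_max`.

```python
import math


def soft_max(a, b, temperature):
    """Entropy-regularized maximum of a and b (hypothetical helper).

    Equals temperature * log(exp(a/temperature) + exp(b/temperature)),
    computed in a numerically stable way. temperature <= 0 means hard max.
    """
    if temperature <= 0.0:
        return max(a, b)
    m = max(a, b)
    return m + temperature * math.log(
        math.exp((a - m) / temperature) + math.exp((b - m) / temperature)
    )


def american_put_binomial(s0, strike, rate, sigma, maturity, steps,
                          temperature=0.0):
    """American put value on a Cox-Ross-Rubinstein tree.

    temperature = 0 gives the classical price; temperature > 0 applies the
    entropy-regularized decision at every node before maturity (assumption:
    the temperature is applied per time step, not scaled by the step size).
    """
    dt = maturity / steps
    up = math.exp(sigma * math.sqrt(dt))
    down = 1.0 / up
    disc = math.exp(-rate * dt)
    q = (math.exp(rate * dt) - down) / (up - down)  # risk-neutral up probability

    # Payoffs at maturity; node j has j up-moves and (steps - j) down-moves.
    values = [max(strike - s0 * up**j * down**(steps - j), 0.0)
              for j in range(steps + 1)]

    # Backward induction over the tree.
    for n in range(steps - 1, -1, -1):
        new_values = []
        for j in range(n + 1):
            cont = disc * (q * values[j + 1] + (1.0 - q) * values[j])
            exercise = max(strike - s0 * up**j * down**(n - j), 0.0)
            new_values.append(soft_max(exercise, cont, temperature))
        values = new_values
    return values[0]


if __name__ == "__main__":
    # Illustrative parameters only, not taken from the talk.
    for lam in (0.0, 0.001, 0.01, 0.1):
        price = american_put_binomial(100.0, 100.0, 0.05, 0.2, 1.0, 50, lam)
        print(f"temperature={lam:<6} value={price:.4f}")
```

       In this toy setting the regularized value is never below the classical one, and the gap shrinks as the temperature decreases. This mirrors, in a simplified way, the bias side of the trade-off mentioned in the abstract.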