Rangqin Mathematics • Expert Lecture
Title: What Is Distributional Reinforcement Learning Learning? A Stochastic Control Perspective
Speaker: Associate Professor Pun Chi Seng (Host: Peng Xiaofei)
Nanyang Technological University, Singapore
Time: June 6, 10:20-11:20
Venue: Room 401, East Building, School of Mathematical Sciences
Speaker Biography:
Professor Pun Chi Seng is a tenured Associate Professor and Assistant Chair (in charge of Master's programmes) at the School of Physical and Mathematical Sciences, Nanyang Technological University (NTU), Singapore, where he also serves as Director of the MSc in Financial Technology programme. He founded this Master's programme in 2018 with the aim of cultivating fintech talent.
Professor Pun received his PhD in Statistics from The Chinese University of Hong Kong in 2016. His thesis earned multiple honors, including the 2016 Nicola Bruti Liberati Prize (for the best doctoral thesis in quantitative finance) and the CUHK Faculty of Science Outstanding PhD Thesis Award. His research on high-dimensional portfolio selection also won the Best Student Paper Award of the INFORMS Finance Section in 2015. His research interests span financial mathematics, big data analytics, and applications of artificial intelligence in finance. He has published numerous papers in top-tier journals, and his research is supported by grants from Singapore's Ministry of Education, the National Research Foundation of Singapore, the Quantum Engineering Programme, and NTU's data analytics research institute.
Abstract:
Distributional reinforcement learning (RL) has emerged as a powerful tool for modeling risk-sensitive sequential decisions, where leveraging distribution functions in place of scalar value functions allows for the flexible incorporation of risk measures. However, due to the inherent time inconsistency (TIC) in the use of numerous risk measures in sequential decision making, the nature of controls under distributional RL has remained a mystery. For its use in risk-sensitive problems in mathematical finance, this paper seeks to fill the research gap by building on the cumulative prospect theory (CPT)-based analysis of human gambling behavior and the emergence of three policy classes under TIC: precommitment, equilibrium, and dynamically optimal. We focus on the prevailing quantile-based distributional RL (QDRL) for CPT risk measures. Our theoretical results extend results from the risk-insensitive QDRL theory to CPT prediction, from which we derive a characterization of QDRL control as an approximate equilibrium of an intrapersonal game. We empirically demonstrate the efficacy of our CPT QDRL algorithm in approaching the equilibrium. Finally, by further exploring the economic interpretation of the three policy classes in their handling of TIC, we devise metrics and instances that drive interesting patterns of interaction among these policies, including when and how the equilibrium may be more desirable than the precommitment.
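For readers unfamiliar with the quantile-based distributional RL (QDRL) machinery the abstract builds on, the sketch below shows the standard risk-insensitive quantile-regression update on a minimal one-state problem: a fixed set of quantile atoms is driven toward the quantiles of the return distribution by stochastic pinball-loss gradient steps. This is a generic illustration under assumed settings (number of atoms, learning rate, return distribution), not the speaker's CPT-based algorithm.

```python
import numpy as np

# Minimal sketch of quantile-based distributional RL (QDRL):
# learn N quantile estimates of a return distribution by stochastic
# quantile-regression (pinball-loss) updates. One state, one action,
# so the "return" is just an i.i.d. sample; all constants are illustrative.

rng = np.random.default_rng(0)
N = 5                              # number of quantile atoms
taus = (np.arange(N) + 0.5) / N    # quantile midpoints tau_i = (2i+1)/(2N)
theta = np.zeros(N)                # current quantile estimates
lr = 0.05                          # learning rate

for _ in range(20_000):
    g = rng.normal(1.0, 2.0)       # sampled return G ~ Normal(mean=1, sd=2)
    # Negative pinball-loss gradient: push theta_i up with weight tau_i
    # when G >= theta_i, down with weight (1 - tau_i) otherwise.
    theta += lr * (taus - (g < theta).astype(float))

# theta_i now approximates the tau_i-quantile of Normal(1, 2^2),
# i.e. an increasing grid around the median 1.0.
print(np.round(theta, 2))
```

A CPT-style variant would reweight these quantiles with probability-distortion functions before acting, which is where the time inconsistency discussed in the talk enters.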
Faculty and students are welcome to attend and exchange ideas!