Academic Seminar: Fu Zhang (张伏)

Academic Seminar


Title: A Q-learning Algorithm for Discrete-time Linear-quadratic Control with Random Parameters of Unknown Distribution


Speaker: Associate Professor Fu Zhang (张伏)  (Host: Zhou Yang (杨舟))

                                   University of Shanghai for Science and Technology


Time: June 20, 09:00-10:00


Venue: Conference Room, 2nd Floor, West Building, School of Mathematical Sciences


About the speaker:

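        Fu Zhang (张伏) is an associate professor and specially appointed doctoral supervisor at the College of Science, University of Shanghai for Science and Technology. He received a B.S. in Information and Computing Science from China University of Mining and Technology, an M.S. in Pure Mathematics from Nanjing University, and a Ph.D. in Operations Research and Control from Fudan University. His main research areas are stochastic analysis, stochastic control, mathematical finance, and artificial intelligence. He has served as principal investigator of one General Program project and one Young Scientists Fund project of the National Natural Science Foundation of China. His papers have appeared in leading journals including SIAM J. Control Optim., Stochastic Process. Appl., Ann. Inst. Henri Poincaré Probab. Stat., and Discrete Contin. Dyn. Syst.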

Abstract:

       This talk studies an infinite-horizon optimal control problem for discrete-time linear systems with quadratic criteria, where the parameters of both the system and the criterion are random, independent, and identically distributed over time. A classical approach is to solve an algebraic Riccati equation that involves mathematical expectations and therefore requires certain statistical information about the parameters. We propose an iterative algorithm in the spirit of Q-learning for the situation where only one random sample of the parameters emerges at each time step. The first theorem proves the equivalence of three properties: the convergence of the learning sequence, the well-posedness of the control problem, and the solvability of the algebraic Riccati equation. The second theorem shows that the adaptive feedback control defined in terms of the learning sequence stabilizes the system as long as the control problem is well-posed. Numerical examples are presented to illustrate the results. This is joint work with Kai Du and Qingxin Meng.
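For readers unfamiliar with the setting, the following is a minimal illustrative sketch of a Q-learning-style iteration for this kind of problem. It is not the speaker's algorithm: the dynamics x_{t+1} = A_t x_t + B_t u_t, the stage cost x'Q_t x + u'R_t u, the 1/(k+1) step size, the function names (gain_and_value, q_learning_lq), and the scalar test parameters are all assumptions made for illustration. Each step uses a single parameter sample and averages it into a matrix H that represents the quadratic Q-function, so no expectations are computed explicitly.

```python
import numpy as np

def gain_and_value(H, n):
    """Return the greedy gain K (control u = -K x) and the value matrix P implied by H."""
    Hxx, Hxu, Huu = H[:n, :n], H[:n, n:], H[n:, n:]
    K = np.linalg.solve(Huu, Hxu.T)   # K = Huu^{-1} Hux (H is symmetric)
    P = Hxx - Hxu @ K                 # Schur complement of Huu in H
    return K, P

def q_learning_lq(sample, n, m, steps=5000):
    """Stochastic-approximation iteration on the Q-function matrix H.

    `sample` is a hypothetical callable returning one draw (A, B, Q, R)
    of the random parameters; their distribution is never used directly.
    """
    H = np.eye(n + m)
    for k in range(steps):
        A, B, Q, R = sample()
        _, P = gain_and_value(H, n)
        AB = np.hstack([A, B])
        cost = np.block([[Q, np.zeros((n, m))], [np.zeros((m, n)), R]])
        target = cost + AB.T @ P @ AB          # one-sample Bellman target
        H += (target - H) / (k + 1)            # assumed step size 1/(k+1)
    return H

# Scalar illustration with assumed parameters: A_t = 1 + 0.1 * noise, B = Q = R = 1.
rng = np.random.default_rng(0)

def sample():
    A = np.array([[1.0 + 0.1 * rng.standard_normal()]])
    one = np.array([[1.0]])
    return A, one, one, one

H = q_learning_lq(sample, n=1, m=1)
K, P = gain_and_value(H, 1)
print("learned gain:", K[0, 0], "learned value:", P[0, 0])
```

Under these assumptions, the learned P can be compared with the solution of the algebraic Riccati equation P = E[Q + A'PA] - E[A'PB] (E[R + B'PB])^{-1} E[B'PA], which for the scalar example reduces to 0.99 P^2 - 1.01 P - 1 = 0.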