City Metro Network Expansion with Reinforcement Learning

作者: Yu Wei , Minjia Mao , Xi Zhao , Jianhua Zou , Ping An

DOI: 10.1145/3394486.3403315

关键词: HeuristicsMobile phoneMarkov decision processField (computer science)Reinforcement learningFlow networkOperations researchProcess (engineering)Variance (accounting)Computer science

摘要: City metro network expansion, included in the transportation design, aims to design new lines based on existing network. Existing methods field of either (i) can hardly formulate this problem efficiently, (ii) depend expert guidance produce solutions, or (iii) appeal problem-specific heuristics which are difficult design. To address these limitations, we propose a reinforcement learning method for city expansion problem. In method, line as Markov decision process (MDP), characterizes sequential station selection. Then, train an actor-critic model next basis The actor is encoder-decoder with attention mechanism generate parameterized policy used select stations. critic estimates expected cumulative reward assist training by reducing variance. proposed does not require during since procedure only relies calculation tune better Also, it avoids difficulty designing formalizing Considering origin-destination (OD) trips and social equity, expand current Xi'an, China, real mobility information 24,770,715 mobile phone users whole city. results demonstrate advantages our compared approaches.

参考文章(27)
Volodymyr Mnih, Ioannis Antonoglou, Koray Kavukcuoglu, Daan Wierstra, Martin A. Riedmiller, Alex Graves, David Silver, Playing Atari with Deep Reinforcement Learning arXiv: Learning. ,(2013)
Gilbert Laporte, Juan A Mesa, Francisco A Ortega, Ignacio Sevillano, None, Maximizing Trip Coverage in the Location of a Single Rapid Transit Alignment Annals of Operations Research. ,vol. 136, pp. 49- 63 ,(2005) , 10.1007/S10479-005-2038-0
Zhongzhen Yang, Bin Yu, Chuntian Cheng, A Parallel Ant Colony Algorithm for Bus Network Optimization Computer-aided Civil and Infrastructure Engineering. ,vol. 22, pp. 44- 55 ,(2007) , 10.1111/J.1467-8667.2006.00469.X
Gilbert Laporte, Juan A Mesa, Francisco A Ortega, None, Optimization methods for the planning of rapid transit systems European Journal of Operational Research. ,vol. 122, pp. 1- 10 ,(2000) , 10.1016/S0377-2217(99)00016-8
Ivo Grondman, Lucian Busoniu, Gabriel A. D. Lopes, Robert Babuska, A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients systems man and cybernetics. ,vol. 42, pp. 1291- 1307 ,(2012) , 10.1109/TSMCC.2012.2218595
Giuseppe Bruno, Michel Gendreau, Gilbert Laporte, A heuristic for the location of a rapid transit line Computers & Operations Research. ,vol. 29, pp. 1- 12 ,(2002) , 10.1016/S0305-0548(00)00051-4
Sepp Hochreiter, Jürgen Schmidhuber, Long short-term memory Neural Computation. ,vol. 9, pp. 1735- 1780 ,(1997) , 10.1162/NECO.1997.9.8.1735
Gabriel Gutiérrez-Jarpa, Carlos Obreque, Gilbert Laporte, Vladimir Marianov, Rapid transit network design for optimal cost and origin-destination demand capture Computers & Operations Research. ,vol. 40, pp. 3000- 3009 ,(2013) , 10.1016/J.COR.2013.06.013
Hélène Dufourd, Michel Gendreau, Gilbert Laporte, LOCATING A TRANSIT LINE USING TABU SEARCH Location Science. ,vol. 4, pp. 1- 19 ,(1996) , 10.1016/S0966-8349(96)00008-3