Parallel Exact Inference on a CPU-GPGPU Heterogeneous System

Authors: Hyeran Jeon, Yinglong Xia, Viktor K. Prasanna

DOI: 10.1109/ICPP.2010.15

Abstract: Exact inference is a key problem in exploring probabilistic graphical models. The computational complexity of exact inference increases dramatically with the parameters of the graphical model. Achieving scalability over hundreds of threads remains a fundamental challenge. In this paper, we use a lightweight scheduler hosted by the CPU to allocate cliques in junction trees to the GPGPU at run time. The scheduler merges multiple small cliques or splits large cliques dynamically so as to maximize the utilization of the GPGPU resources. We implement node level primitives on the GPGPU to process the cliques assigned by the CPU. We propose a conflict free potential table organization and an efficient data layout for coalescing memory accesses. In addition, we develop a double buffering based asynchronous data transfer between the CPU and the GPGPU to overlap clique processing on the GPGPU with scheduling activities on the CPU. Our implementation achieved a 30X speedup compared with state-of-the-art multicore processors.
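
The merge-or-split idea in the abstract can be illustrated with a small sketch. The code below is not the paper's scheduler: the greedy policy, the `capacity` parameter (a hypothetical number of potential table entries one GPGPU task can cover), and the names `Task` and `pack_cliques` are all assumptions made for illustration. It also assumes the input lists only cliques that are already ready to process, so junction tree dependencies are ignored.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Task:
    # Each piece is (clique_id, start, stop): a half-open slice of that
    # clique's potential table entries. (Hypothetical representation.)
    pieces: List[Tuple[int, int, int]]

    @property
    def size(self) -> int:
        return sum(stop - start for _, start, stop in self.pieces)


def pack_cliques(table_sizes: List[int], capacity: int) -> List[Task]:
    """Greedy sketch: split large tables into capacity-sized chunks and
    batch the leftovers of small tables together, so that no task
    exceeds `capacity` entries."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")

    tasks: List[Task] = []
    current: List[Tuple[int, int, int]] = []  # batch of small pieces being merged
    current_size = 0

    for clique_id, size in enumerate(table_sizes):
        start = 0
        # Split: emit full-capacity chunks of a large potential table.
        while size - start >= capacity:
            tasks.append(Task([(clique_id, start, start + capacity)]))
            start += capacity

        remainder = size - start
        if remainder == 0:
            continue

        # Merge: add the leftover to the current batch, flushing first
        # if it would no longer fit.
        if current_size + remainder > capacity:
            tasks.append(Task(current))
            current, current_size = [], 0
        current.append((clique_id, start, size))
        current_size += remainder

    if current:
        tasks.append(Task(current))
    return tasks


if __name__ == "__main__":
    for task in pack_cliques([4, 4, 20, 3], capacity=8):
        print(task.size, task.pieces)
```

A real scheduler would presumably also weigh clique readiness and transfer cost; this sketch only shows how merging and splitting can keep task sizes close to a fixed budget.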
