Authors: Hyeran Jeon, Yinglong Xia, Viktor K. Prasanna
DOI: 10.1109/ICPP.2010.15
Keywords:
Abstract: Exact inference is a key problem in exploring probabilistic graphical models. The computational complexity of inference increases dramatically with the parameters of the graphical model. Achieving scalability over hundreds of threads remains a fundamental challenge. In this paper, we use a lightweight scheduler hosted by the CPU to allocate cliques in junction trees to the GPGPU at run time. The scheduler merges multiple small cliques or splits large cliques dynamically so as to maximize the utilization of GPGPU resources. We implement node-level primitives on the GPGPU to process the cliques assigned by the CPU. We propose a conflict-free potential table organization and an efficient data layout for coalescing memory accesses. In addition, we develop a double-buffering-based asynchronous data transfer between the CPU and the GPGPU to overlap clique processing with data transfer and scheduling activities. Our implementation achieved a 30X speedup compared with state-of-the-art multicore processors.