#494 HyGCN: A GCN Accelerator with Hybrid Architecture


  • Aasheesh Kolli
  • Adwait Jog
  • Chita Das
  • Daniel A. Jiménez
  • Guoyang Chen
  • Jishen Zhao
  • Jung Ho Ahn
  • Lei Liu
  • Lide Duan
  • Mahmut Taylan Kandemir
  • Minsoo Rhu
  • Tao Li
  • Xiaowei Jiang
  • Yiran Chen
  • Yuan Xie
  • Yunji Chen

  • Aamer Jaleel (Nvidia)
  • Anand Sivasubramaniam (PSU)
  • Chita Das (PSU)
  • Daniel Jiménez (TAMU)
  • Gabriel Loh (AMD)
  • Jishen Zhao (UCSD)
  • Jung Ho Ahn (SNU)
  • Mahmut Taylan Kandemir (PSU)
  • Norm Jouppi (Google)
  • Paolo Faraboschi (HP)
  • Rajeev Balasubramonian (Utah)
  • Samira Khan (University of Virginia)
  • Song Han (MIT)
  • Steve Keckler (NVIDIA)
  • Tim Sherwood (UCSB)
  • Yiran Chen (Duke)
  • Yunji Chen (ICT)
  • Guangyu Sun (PKU)
  • Ravishankar Iyer (Intel)
  • Karthik Swaminathan (IBM)
  • Onur Mutlu (ETH)
  • Minsoo Rhu (KAIST)
  • Onur Kayiran (AMD)
  • Sudhanva Gurumurthi (AMD)
  • Zidong Du (ICT)
  • All (UCSB)
  • All (ICT)
  • All (Alibaba)
  • Jaleel, Aamer (Nvidia)
  • Bryan Black (AMD)
  • Rajeev Balasubramonian (Utah)
  • Kevin Cao (ASU)
  • Chen, Ke (Samsung)
  • Chen, Tianshi (Chinese Academy of Sciences)
  • Chen, Yunji (Chinese Academy of Sciences)
  • Chen, Wenguang (Tsinghua University)
  • Chen, Xiaoming (Tsinghua University)
  • Chong, Fred (University of Chicago)
  • Krish Chakrabarty (Duke)
  • Yiran Chen (University of Pitt)
  • Chita Das (Penn State, PSU)
  • Robert Dick (Michigan)
  • Paolo Faraboschi (HP Labs)
  • Diana Franklin (University of Chicago)
  • Mary Jane Irwin (Penn State, PSU)
  • Ravishankar Iyer (Intel)
  • Daniel Jimenez (TAMU)
  • Norm Jouppi (Google)
  • Changkyu Kim (Intel)
  • Eren Kursun (IBM)
  • Jian Li (IBM)
  • Helen Li (Duke)
  • Gabe Loh (AMD)
  • Onur Mutlu (CMU)
  • Naveen Muralimanohar (HP)
  • Vijaykrishnan Narayanan (Penn State, PSU)
  • Anand Sivasubramaniam (Penn State, PSU)
  • Jack Sampson (Penn State, PSU)
  • Marilyn Wolf (Princeton, now GIT)
  • Jin Ouyang (Nvidia)
  • Jishen Zhao (UCSC)
  • Dimin Niu (Samsung)
  • Tao Zhang (Nvidia)
  • Mike Debole (IBM)
  • Xiaoxia Wu (Qualcomm)
  • Xiangyu Dong (Google)
  • Yibo Chen (Google)
  • Guangyu Sun (Peking University)
  • Balaji Vaidyanathan (Micron)
  • Ping Chi (Intel)
  • Jia Zhan (Uber)
  • Qiaosha Zou (Huawei)
  • Matt Poremba (AMD)

Accepted with Shepherd

[PDF] Submission (1.9MB) Aug 6, 2019, 4:48:42 PM AoE · 57bb23fd91195679dd8be9b528d3c82127133c951be2ced1c7a52220df8925e1

Inspired by the broad use of graph data and the powerful learning capability of neural networks, graph convolutional neural networks (GCNs) have been proposed to analyze graph datasets with neural networks. The convolutional layers dominate the execution time of GCNs through two primary execution phases: \emph{Aggregation} and \emph{Combination}. The former behaves like graph processing, while the latter acts more like neural networks. To identify the bottlenecks when executing GCNs, we conduct quantitative characterizations that evidence the inefficiency of conventional architectures. This inefficiency is caused by the distinct, even opposed, memory access and computation patterns of the two phases, as well as their serialized processing. To address these issues, we propose the concept of a GCN accelerator and implement it with a hybrid architecture. First, we build edge-centric and MVM (matrix-vector multiplication)-centric programming models for the dynamic and irregular \emph{Aggregation} phase and the static and compute-intensive \emph{Combination} phase, respectively, to keep the hardware transparent to programmers. Then, we design \emph{HyGCN} with two efficient processing engines to accelerate the two phases. In the \emph{Aggregation Engine}, besides the edge parallelism achieved by SIMD cores, we introduce an interval-shard graph partitioning to increase data reuse and a window sliding-shrinking method to decrease redundant accesses. In the \emph{Combination Engine}, we build multi-granular systolic arrays to perform MVMs, which can be used either independently for lower latency or jointly for lower energy. Finally, we further optimize the overall system by orchestrating the inter-engine pipeline and coordinating off-chip memory accesses. Through extensive evaluation, our work achieves significant improvements over a state-of-the-art software framework running on an Intel Xeon CPU. We also analyze the optimization techniques and design space to provide more insights for future research on GCN hardware.
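As an illustration of the two phases described in the abstract, the sketch below shows how one GCN layer splits into an edge-centric Aggregation loop with irregular memory accesses and an MVM-based Combination step with dense, regular compute. This is a minimal illustration only: the use of plain NumPy, a toy 4-node graph, sum aggregation, and a ReLU activation are all assumptions for readability, not HyGCN's programming models or implementation.

    # Minimal sketch of one GCN layer's two phases (illustrative, not HyGCN code).
    import numpy as np

    def gcn_layer(edges, features, weight):
        """edges: list of (src, dst) pairs; features: [num_nodes, in_dim];
        weight: [in_dim, out_dim]."""
        num_nodes, in_dim = features.shape

        # Aggregation phase: edge-centric, dynamic and irregular accesses
        # (each edge gathers its source node's feature into the destination).
        aggregated = np.zeros((num_nodes, in_dim))
        for src, dst in edges:
            aggregated[dst] += features[src]

        # Combination phase: static, compute-intensive matrix-vector
        # multiplications (one MVM per node, here batched as one matmul),
        # followed by a ReLU activation.
        return np.maximum(aggregated @ weight, 0.0)

    # Toy example: a 4-node chain graph with bidirectional edges.
    edges = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)]
    features = np.random.rand(4, 8)
    weight = np.random.rand(8, 4)
    print(gcn_layer(edges, features, weight).shape)  # (4, 4)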

M. Yan, L. Deng, X. Hu, L. Liang, Y. Feng, X. Ye, Z. Zhang, D. Fan, Y. Xie
  • Mingyu Yan (State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences) <yanmingyu@ict.ac.cn>
  • Lei Deng (University of California, Santa Barbara) <leideng@ucsb.edu>
  • Xing Hu (University of California, Santa Barbara) <xinghu@ucsb.edu>
  • Ling Liang (University of California, Santa Barbara) <lingliang@ucsb.edu>
  • Yujing Feng (Institute of Computing Technology, Chinese Academy of Sciences) <fengyujing@ict.ac.cn>
  • Xiaochun Ye (Chinese Academy of Sciences) <yexiaochun@ict.ac.cn>
  • Zhimin Zhang (State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences) <zzm@ict.ac.cn>
  • Dongrui Fan (Institute of Computing Technology, Chinese Academy of Sciences) <fandr@ict.ac.cn>
  • Yuan Xie (University of California, Santa Barbara) <yuanxie@ece.ucsb.edu>

Topics

  • Accelerators, domain-specific architectures

