TY - GEN
T1 - FPGA-accelerated samplesort for large data sets
AU - Chen, Han
AU - Madaminov, Sergey
AU - Ferdman, Michael
AU - Milder, Peter
N1 - Publisher Copyright:
© 2020 Association for Computing Machinery.
PY - 2020/2/23
Y1 - 2020/2/23
N2 - Sorting is a fundamental operation in many applications such as databases, search, and social networks. Although FPGAs have been shown very effective at sorting data sizes that fit on chip, systems that sort larger data sets by shuffling data on and off chip are bottlenecked by costly merge operations or data transfer time. We propose a new technique for sorting large data sets, which uses a variant of the samplesort algorithm on a server with a PCIeconnected FPGA. Samplesort avoids merging by randomly sampling values to determine how to partition data into non-overlapping buckets that can be independently sorted. The key to our design is a novel parallel multi-stage hardware partitioner, which is a scalable high-throughput solution that greatly accelerates the samplesort partitioning step. Using samplesort for FPGA-accelerated sorting provides several advantages over mergesort, while also presenting a number of new challenges that we address with cooperation between the FPGA and the software running on the host CPU. We prototype our design using Amazon Web Services FPGA instances, which pair a Xilinx Virtex UltraScale+ FPGA with a high-performance server. Our experiments demonstrate that our prototype system sorts 230 key-value records with a speed of 7.2 GB/s, limited only by the on-board DRAM capacity and available PCIe bandwidth. When sorting 230 records, our system exhibits a 37.4x speedup over the widely used GNU parallel sort on an 8-thread state-of-the-art CPU.
AB - Sorting is a fundamental operation in many applications such as databases, search, and social networks. Although FPGAs have been shown very effective at sorting data sizes that fit on chip, systems that sort larger data sets by shuffling data on and off chip are bottlenecked by costly merge operations or data transfer time. We propose a new technique for sorting large data sets, which uses a variant of the samplesort algorithm on a server with a PCIeconnected FPGA. Samplesort avoids merging by randomly sampling values to determine how to partition data into non-overlapping buckets that can be independently sorted. The key to our design is a novel parallel multi-stage hardware partitioner, which is a scalable high-throughput solution that greatly accelerates the samplesort partitioning step. Using samplesort for FPGA-accelerated sorting provides several advantages over mergesort, while also presenting a number of new challenges that we address with cooperation between the FPGA and the software running on the host CPU. We prototype our design using Amazon Web Services FPGA instances, which pair a Xilinx Virtex UltraScale+ FPGA with a high-performance server. Our experiments demonstrate that our prototype system sorts 230 key-value records with a speed of 7.2 GB/s, limited only by the on-board DRAM capacity and available PCIe bandwidth. When sorting 230 records, our system exhibits a 37.4x speedup over the widely used GNU parallel sort on an 8-thread state-of-the-art CPU.
KW - FPGA
KW - Partitioning
KW - Samplesort
KW - Sorting
UR - https://www.scopus.com/pages/publications/85082051559
U2 - 10.1145/3373087.3375304
DO - 10.1145/3373087.3375304
M3 - Conference contribution
AN - SCOPUS:85082051559
T3 - FPGA 2020 - 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
SP - 222
EP - 232
BT - FPGA 2020 - 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
PB - Association for Computing Machinery, Inc
T2 - 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, FPGA 2020
Y2 - 23 February 2020 through 25 February 2020
ER -