TY - GEN
T1 - Progressive clustering of big data with GPU acceleration and visualization
AU - Wang, Jun
AU - Papenhausen, Eric
AU - Wang, Bing
AU - Ha, Sungsoo
AU - Zelenyuk, Alla
AU - Mueller, Klaus
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/10/25
Y1 - 2017/10/25
N2 - Clustering has become an unavoidable step in big data analysis. It may be used to arrange data into a compact format, making operations on big data manageable. However, clustering of big data requires not only the capability of handling data with large volume and high dimensionality, but also the ability to process streaming data, all of which are less developed in most current algorithms. Furthermore, big data processing is seldom interactive, which stands in conflict with users who seek answers immediately. The best one can do is to process incrementally, such that partial and, hopefully, accurate results become available relatively quickly and are then progressively refined over time. We propose a clustering framework which uses Multi-Dimensional Scaling for layout and GPU acceleration to accomplish these goals. Our domain application is the clustering of mass spectral data of individual aerosol particles with 8 million data points of 450 dimensions each.
AB - Clustering has become an unavoidable step in big data analysis. It may be used to arrange data into a compact format, making operations on big data manageable. However, clustering of big data requires not only the capability of handling data with large volume and high dimensionality, but also the ability to process streaming data, all of which are less developed in most current algorithms. Furthermore, big data processing is seldom interactive, which stands in conflict with users who seek answers immediately. The best one can do is to process incrementally, such that partial and, hopefully, accurate results become available relatively quickly and are then progressively refined over time. We propose a clustering framework which uses Multi-Dimensional Scaling for layout and GPU acceleration to accomplish these goals. Our domain application is the clustering of mass spectral data of individual aerosol particles with 8 million data points of 450 dimensions each.
KW - big data
KW - clustering
KW - GPU
KW - visualization
UR - https://www.scopus.com/pages/publications/85040189804
U2 - 10.1109/NYSDS.2017.8085036
DO - 10.1109/NYSDS.2017.8085036
M3 - Conference contribution
AN - SCOPUS:85040189804
T3 - 2017 New York Scientific Data Summit, NYSDS 2017 - Proceedings
BT - 2017 New York Scientific Data Summit, NYSDS 2017 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2017 New York Scientific Data Summit, NYSDS 2017
Y2 - 6 August 2017 through 9 August 2017
ER -