A C++17 Data Stream Processing Parallel Library for Multicores and GPUs
Overview
WindFlow is a library for parallel data stream processing on shared-memory systems and NVIDIA GPUs (support for distributed systems is currently under development). Data stream processing is a popular computing paradigm supported by several existing open-source frameworks (e.g., Apache Storm, Apache Flink and Spark Streaming). These frameworks target distributed systems (e.g., clusters of server machines) and are based on the Java Virtual Machine (JVM), which eases the development of streaming applications on distributed architectures. However, as recognized by recent publications [1,2], they do not fully exploit the potential of scale-up architectures, such as single machines equipped with several multi-core CPUs and co-processors like GPUs and FPGAs.
WindFlow tries to fill this gap by providing an easy-to-use tool with the following distinguishing features:
it provides a clear API fully compliant with the recent C++17 standard;
as in most existing systems, applications are represented by data-flow graphs of interconnected operators. The library provides common streaming operators like map, filter, flatmap and many others. To instantiate an operator, the programmer provides the business logic code that performs the processing, as a lambda expression, a functor object or a plain function. Operators can therefore run arbitrary user-defined functions, not only predefined transformations (such as the relational algebra operators of traditional Data Stream Management Systems);
the library provides window-based operators with rich parallel semantics to accelerate continuous analytics on data streams;
operators can be composed using the MultiPipe programming construct, which provides a compositional interface to build applications (i.e., to create operators and to connect them).
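To make the operator model in the list above concrete, here is a minimal, self-contained C++17 sketch of map, filter and flatmap operators built from lambdas and chained into a linear pipeline. It does not use WindFlow, and it is not the WindFlow or MultiPipe API: the names `Emit`, `Operator`, `make_map`, `make_filter`, `make_flatmap`, `chain`, `run` and `run_pipeline_demo` are all hypothetical and chosen only for illustration. The sketch is also sequential, whereas the library runs operators in parallel.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical sketch, NOT the WindFlow API. An operator receives one input
// item plus an `emit` callback, and calls `emit` zero or more times.
template <typename Out>
using Emit = std::function<void(const Out&)>;

template <typename In, typename Out>
using Operator = std::function<void(const In&, const Emit<Out>&)>;

// map: exactly one output per input, computed by the user function f.
template <typename In, typename Out, typename F>
Operator<In, Out> make_map(F f) {
    return [f](const In& x, const Emit<Out>& emit) { emit(f(x)); };
}

// filter: forwards the input only when the user predicate holds.
template <typename T, typename P>
Operator<T, T> make_filter(P pred) {
    return [pred](const T& x, const Emit<T>& emit) {
        if (pred(x)) emit(x);
    };
}

// flatmap: the user function may emit any number of outputs per input.
template <typename In, typename Out, typename F>
Operator<In, Out> make_flatmap(F f) {
    return [f](const In& x, const Emit<Out>& emit) { f(x, emit); };
}

// Connects two operators: every item emitted by `first` is fed to `second`.
template <typename A, typename B, typename C>
Operator<A, C> chain(Operator<A, B> first, Operator<B, C> second) {
    assert(first && second);  // both operators must hold a callable
    return [first = std::move(first), second = std::move(second)](
               const A& x, const Emit<C>& emit) {
        first(x, [&second, &emit](const B& y) { second(y, emit); });
    };
}

// Drives a finite "stream" (a vector) through the operator into a sink.
template <typename In, typename Out>
std::vector<Out> run(const std::vector<In>& source, const Operator<In, Out>& op) {
    std::vector<Out> sink;
    for (const In& item : source) {
        op(item, [&sink](const Out& y) { sink.push_back(y); });
    }
    return sink;
}

// Usage: keep even numbers, square them, then emit each result twice.
std::vector<int> run_pipeline_demo() {
    auto evens = make_filter<int>([](int x) { return x % 2 == 0; });
    auto squared = make_map<int, int>([](int x) { return x * x; });
    auto twice = make_flatmap<int, int>([](const int& x, const auto& emit) {
        emit(x);
        emit(x);
    });
    auto pipeline = chain(chain(evens, squared), twice);
    return run(std::vector<int>{1, 2, 3, 4}, pipeline);
}
```

Modeling every operator as "one item in, any number of items out" lets map and filter be expressed as special cases of flatmap, which keeps the composition function to a single template.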
The library is built on top of FastFlow (version 3.0 or later) [3], a parallel programming framework under development since 2010 by the Parallel Programming Models (PPMs) group at the Department of Computer Science, University of Pisa, Italy. The framework allows the programmer to build concurrency graphs of execution entities that exchange data references through efficient lock-free single-producer single-consumer queues. Further details about FastFlow can be found on the project's web page. The figure below summarizes the software layers of the FastFlow ecosystem, with WindFlow built on top of FastFlow's building blocks, alongside the high-level HPC parallel patterns provided by the framework (e.g., parallel-for, map-reduce, stencil-reduce, divide-and-conquer and others common in High Performance Computing workloads).
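As an illustration of the lock-free single-producer single-consumer queues mentioned above, the following is a minimal C++17 sketch of a bounded ring buffer shared by exactly one producer thread and one consumer thread. It is a textbook design written for this page, not FastFlow's actual queue implementation; the names `SpscQueue`, `try_push`, `try_pop` and `spsc_sum_demo` are hypothetical. The demo passes pointers to the items rather than copies, mirroring the idea of exchanging data references.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>
#include <thread>
#include <vector>

// Illustrative bounded SPSC ring buffer; NOT FastFlow's implementation.
// Safe only with exactly one producer thread and one consumer thread.
template <typename T>
class SpscQueue {
public:
    // One slot is kept unused to tell "full" apart from "empty".
    explicit SpscQueue(std::size_t capacity) : buf_(capacity + 1) {
        assert(capacity > 0);
    }

    // Producer thread only. Returns false when the queue is full.
    bool try_push(const T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % buf_.size();
        if (next == head_.load(std::memory_order_acquire)) return false;
        buf_[tail] = value;
        // Release: the slot write above becomes visible before the new tail.
        tail_.store(next, std::memory_order_release);
        return true;
    }

    // Consumer thread only. Returns std::nullopt when the queue is empty.
    std::optional<T> try_pop() {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) return std::nullopt;
        T value = buf_[head];
        // Release: the slot read above completes before the slot is reused.
        head_.store((head + 1) % buf_.size(), std::memory_order_release);
        return value;
    }

private:
    std::vector<T> buf_;
    std::atomic<std::size_t> head_{0};  // next slot to read (consumer side)
    std::atomic<std::size_t> tail_{0};  // next slot to write (producer side)
};

// Usage: the producer sends pointers to the values 1..n and the consumer
// sums the pointed-to values.
long long spsc_sum_demo(int n) {
    std::vector<int> items(n);
    for (int i = 0; i < n; ++i) items[i] = i + 1;

    SpscQueue<const int*> queue(8);
    long long sum = 0;

    std::thread consumer([&] {
        for (int received = 0; received < n;) {
            if (auto item = queue.try_pop()) {
                sum += **item;
                ++received;
            } else {
                std::this_thread::yield();  // queue empty, let producer run
            }
        }
    });

    for (const int& item : items) {
        while (!queue.try_push(&item)) std::this_thread::yield();  // queue full
    }
    consumer.join();
    return sum;
}
```

Because each index is written by only one thread, plain atomic loads and stores with acquire/release ordering are sufficient, and no mutex or compare-and-swap loop is needed.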
How to Cite
The major scientific publications related to WindFlow are listed below:
G. Mencagli, M. Torquati, A. Cardaci, A. Fais, L. Rinaldi and M. Danelutto. WindFlow: High-Speed Continuous Stream Processing With Parallel Building Blocks. In IEEE Transactions on Parallel and Distributed Systems, vol. 32, no. 11, pp. 2748-2763, 1 Nov. 2021, doi: 10.1109/TPDS.2021.3073970
G. Mencagli, M. Torquati, D. Griebler, A. Fais, and M. Danelutto. General-purpose data stream processing on heterogeneous architectures with WindFlow. Journal of Parallel and Distributed Computing (JPDC). 2024, Elsevier. ISSN: 0743-7315, DOI: https://doi.org/10.1016/j.jpdc.2023.104782.
G. Mencagli, D. Griebler and M. Danelutto. Towards Parallel Data Stream Processing on System-on-Chip CPU+GPU Devices. In Proceedings of the 30th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), Valladolid, Spain, 2022. ISBN: 978-166546958-6, DOI: 10.1109/PDP55904.2022.00014
References
Additional useful references:
[1] Steffen Zeuch, Bonaventura Del Monte, Jeyhun Karimov, Clemens Lutz, Manuel Renz, Jonas Traub, Sebastian Breß, Tilmann Rabl, and Volker Markl. 2019. Analyzing efficient stream processing on modern hardware. Proc. VLDB Endow. 12, 5 (January 2019), 516-530. DOI: https://doi.org/10.14778/3303753.3303758
[2] S. Zhang, B. He, D. Dahlmeier, A. C. Zhou and T. Heinze. Revisiting the Design of Data Stream Processing Systems on Multi-Core Processors. 2017 IEEE 33rd International Conference on Data Engineering (ICDE), San Diego, CA, 2017, pp. 659-670
[3] Aldinucci, M., Danelutto, M., Kilpatrick, P. and Torquati, M. (2017). FastFlow: High-Level and Efficient Streaming on Multicore. In Programming multi-core and many-core computing systems (eds S. Pllana and F. Xhafa). doi:10.1002/9781119332015.ch13