A C++17 Data Stream Processing Parallel Library for Multicores and GPUs 


WindFlow is a library for parallel data stream processing on shared-memory systems and NVIDIA GPUs. Data stream processing is a popular computing paradigm supported by several open-source frameworks (e.g., Apache Storm, Apache Flink and Spark Streaming). These frameworks target distributed systems (e.g., clusters of server machines) and run on the Java Virtual Machine (JVM), which eases the development of streaming applications on distributed architectures. However, as recent publications have shown [1,2], they do not fully exploit the potential of scale-up architectures, such as single machines equipped with several multi-core CPUs and co-processors like GPUs and FPGAs.

The library presented on this web page tries to fill this gap by providing an easy-to-use tool with the following distinguishing features:

  • it provides a clear API fully compliant with the recent C++17 standard;
  • as in most existing systems, applications are represented by data-flow graphs of interconnected operators. To this end, the library provides common streaming operators like map, filter, flatmap and many others. To instantiate an operator, the programmer provides the business logic that does the processing, as a lambda expression, a functor object or a plain function. Operators can therefore run arbitrary user-defined functions and not only predefined tasks (such as the relational algebra operators of old-style Data Stream Management Systems);
  • the library provides rich parallel operators that are easy to use and allow the developer to design complex parallel solutions. Some of these operators target sliding-window computations, a common paradigm in stream processing applications;
  • operators can be composed using the MultiPipe programming construct, which provides a compositional interface to build applications (i.e., to create operators and to connect them).
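The ideas in the list above (operators built from user-provided functions, chained through a compositional interface, plus a sliding-window computation) can be illustrated with a small self-contained C++17 sketch. This is not the WindFlow API: every name here (Stage, make_map, make_filter, make_flatmap, Pipeline, sliding_sum, run_example) is hypothetical, the items are plain integers, and the whole thing runs sequentially on a finite input rather than in parallel on an unbounded stream.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical names throughout: this is NOT the WindFlow API.
// Every stage turns one input item into zero or more output items.
using Stage = std::function<std::vector<int>(int)>;

// Wrap a one-to-one user function as a stage (map).
template <typename F>
Stage make_map(F f) {
    return [f](int x) { return std::vector<int>{f(x)}; };
}

// Wrap a predicate as a stage that keeps or drops each item (filter).
template <typename P>
Stage make_filter(P pred) {
    return [pred](int x) {
        return pred(x) ? std::vector<int>{x} : std::vector<int>{};
    };
}

// Wrap a one-to-many user function as a stage (flatmap).
template <typename F>
Stage make_flatmap(F f) {
    return [f](int x) -> std::vector<int> { return f(x); };
}

// A linear chain of stages with a fluent add() method.
class Pipeline {
public:
    Pipeline& add(Stage s) {
        stages_.push_back(std::move(s));
        return *this;
    }

    // Apply the stages in order, one whole stage at a time (sequential).
    std::vector<int> run(const std::vector<int>& input) const {
        std::vector<int> current = input;
        for (const Stage& stage : stages_) {
            std::vector<int> next;
            for (int x : current) {
                for (int y : stage(x)) next.push_back(y);
            }
            current = std::move(next);
        }
        return current;
    }

private:
    std::vector<Stage> stages_;
};

// Count-based sliding window: sum of `win` consecutive items,
// with the window advancing by `slide` items each time.
std::vector<int> sliding_sum(const std::vector<int>& in,
                             std::size_t win, std::size_t slide) {
    std::vector<int> out;
    if (win == 0 || slide == 0) return out;  // avoid a non-advancing window
    for (std::size_t start = 0; start + win <= in.size(); start += slide) {
        int sum = 0;
        for (std::size_t i = start; i < start + win; ++i) sum += in[i];
        out.push_back(sum);
    }
    return out;
}

// Example composition: double each item, keep multiples of 4,
// then expand each survivor x into the pair x, x + 1.
std::vector<int> run_example() {
    Pipeline p;
    p.add(make_map([](int x) { return x * 2; }))
     .add(make_filter([](int x) { return x % 4 == 0; }))
     .add(make_flatmap([](int x) { return std::vector<int>{x, x + 1}; }));
    return p.run({1, 2, 3, 4});
}
```

With the input 1, 2, 3, 4, run_example() returns 4, 5, 8, 9, and sliding_sum({1, 2, 3, 4, 5}, 3, 1) returns 6, 9, 12. The business logic is passed in as lambdas, but functor objects or plain functions would work equally well with these wrappers.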

The library is built on top of FastFlow (version 3.x) [3], a parallel programming framework developed since 2010 by the Parallel Programming Models (PPMs) group at the Department of Computer Science, University of Pisa, Italy. The framework allows the programmer to build concurrency graphs of execution entities (nodes, each executed by a dedicated thread) that exchange data references through efficient lock-free single-producer single-consumer queues. Further details about FastFlow can be found on the project's web page. The figure below summarizes the software layers of the FastFlow ecosystem: WindFlow is built on top of FastFlow's building blocks, alongside the high-level parallel patterns provided by the framework (e.g., parallel-for, map-reduce, stencil-reduce, divide-and-conquer and others common in High Performance Computing workloads).

FastFlow's Ecosystem
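To give a feel for the lock-free single-producer single-consumer queues mentioned above, here is a minimal C++17 sketch of a bounded ring buffer that passes pointers between two threads. It is a generic textbook (Lamport-style) design written for illustration only and is not FastFlow's implementation; the names SpscQueue and sum_through_queue are hypothetical.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Generic bounded SPSC ring buffer; NOT FastFlow's code.
// Correct only with exactly one producer thread and one consumer thread.
// Items are passed by reference (as pointers); nullptr is reserved to
// signal "empty", so it must not be pushed as a payload.
template <typename T>
class SpscQueue {
public:
    // One slot is kept unused to distinguish "full" from "empty".
    explicit SpscQueue(std::size_t capacity)
        : buf_(capacity + 1), head_(0), tail_(0) {}

    // Producer side: returns false when the queue is full.
    bool push(T* item) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % buf_.size();
        if (next == head_.load(std::memory_order_acquire)) return false;
        buf_[tail] = item;
        // Release: the slot write becomes visible before the new tail.
        tail_.store(next, std::memory_order_release);
        return true;
    }

    // Consumer side: returns nullptr when the queue is empty.
    T* pop() {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) return nullptr;
        T* item = buf_[head];
        // Release: the slot read completes before the slot is reusable.
        head_.store((head + 1) % buf_.size(), std::memory_order_release);
        return item;
    }

private:
    std::vector<T*> buf_;
    std::atomic<std::size_t> head_;  // next slot to read (consumer-owned)
    std::atomic<std::size_t> tail_;  // next slot to write (producer-owned)
};

// Two "nodes", each on its own thread, exchanging references to the
// integers 1..n; the consumer adds them up.
long sum_through_queue(int n) {
    std::vector<int> data(n);
    for (int i = 0; i < n; ++i) data[i] = i + 1;

    SpscQueue<int> q(8);
    long sum = 0;

    std::thread producer([&] {
        for (int& x : data) {
            while (!q.push(&x)) std::this_thread::yield();  // queue full
        }
    });
    std::thread consumer([&] {
        for (int received = 0; received < n;) {
            if (int* p = q.pop()) {
                sum += *p;
                ++received;
            } else {
                std::this_thread::yield();  // queue empty
            }
        }
    });

    producer.join();
    consumer.join();
    return sum;
}
```

Calling sum_through_queue(100) returns 5050. Because each index is written by exactly one thread, no locks or compare-and-swap loops are needed; a pair of acquire/release operations per call is enough to publish the slot contents safely.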

Cite our work

The main paper describing the WindFlow API and its run-time system has been published in IEEE Transactions on Parallel and Distributed Systems:

  • G. Mencagli, M. Torquati, A. Cardaci, A. Fais, L. Rinaldi, and M. Danelutto. WindFlow: High-Speed Continuous Stream Processing with Parallel Building Blocks. IEEE Transactions on Parallel and Distributed Systems, 2021, IEEE. ISSN: 1045-9219, DOI: 10.1109/TPDS.2021.3073970


Additional useful references:

  1. Steffen Zeuch, Bonaventura Del Monte, Jeyhun Karimov, Clemens Lutz, Manuel Renz, Jonas Traub, Sebastian Breß, Tilmann Rabl, and Volker Markl. 2019. Analyzing efficient stream processing on modern hardware. Proc. VLDB Endow. 12, 5 (January 2019), 516-530. DOI: https://doi.org/10.14778/3303753.3303758
  2. S. Zhang, B. He, D. Dahlmeier, A. C. Zhou and T. Heinze. Revisiting the Design of Data Stream Processing Systems on Multi-Core Processors. 2017 IEEE 33rd International Conference on Data Engineering (ICDE), San Diego, CA, 2017, pp. 659-670.
  3. M. Aldinucci, M. Danelutto, P. Kilpatrick and M. Torquati. FastFlow: High-Level and Efficient Streaming on Multicore. In Programming Multi-core and Many-core Computing Systems (eds S. Pllana and F. Xhafa), 2017. DOI: 10.1002/9781119332015.ch13