We present and analyze a parallel implementation of a parallel-in-time collocation method based
on α-circulant preconditioned Richardson iterations. While many papers explore this family of
single-level, time-parallel “all-at-once” integrators from various perspectives,
performance results of actual parallel runs are still scarce. This leaves a critical
gap, because the efficiency and applicability of any parallel method heavily
rely on the actual parallel performance, with only limited guidance from
theoretical considerations. Further, challenges like selecting good parameters,
finding suitable communication strategies, and performing a fair comparison to
sequential time-stepping methods can be easily missed. In this paper, we first
extend the original idea of these fixed point iterative approaches based on
α-circulant preconditioners to high-order collocation methods, adding yet another level of
parallelization in time “across the method”. We derive an adaptive strategy to select a new
α-circulant preconditioner for each iteration during runtime for balancing convergence rates,
round-off errors, and inexactness of inner system solves for the individual time-steps.
After addressing these more theoretical challenges, we present an open-source space-
and time-parallel implementation and evaluate its performance for two different test
problems.