10.4122/1.1000000387
Naff, Richard
Richard
Naff
rlnaff@usgs.gov
Wilson, John
John
Wilson
johndw@usgs.gov
Naff, Richard
Richard
Naff
rlnaff@usgs.gov
A Comparison of Preconditioning Techniques for Parallelized PCG Solvers for the CCFD Problem
XVI International Conference on Computational Methods in Water Resources
2006
2006
Parallel algorithms for solving sparse symmetric matrix systems that might result
from the cell-centered finite difference (CCFD) scheme are compared. Parallelization
is based in partitioning the mass matrix such that each partition is controlled by a
separate process. These processes may then be distributed among a networked cluster
of processors using the standard Message-Passing Interface (MPI). MPI software allows
for multiple, simultaneous processes to coordinate and exchange information for some
central purpose. In this study, partitioning of the mass matrix is based in
decomposing the CCFD domain into non-overlapping subdomains; each subdomain
corresponds to a partition of the mass matrix. The subdomain partitions are numbered
alternately, using a red/black numbering scheme. Partitions are linked by the CCFD
coefficients corresponding to cell edges that coincide with subdomain boundaries
internal to the domain. A major portion of this work examines the best way to handle
connectivity information between partitions.
Algorithms considered in this study are based in the preconditioned conjugate
gradient scheme (PCG) and differ only in the preconditioning used. Parallelization
of the PCG solver entails running essentially identical conjugate-gradient loops on
separate processes for every subdomain partition. These loops must exchange
information globally to calculate inner products and locally for sharing connectivity
information between partitions. The classic incomplete Cholesky preconditioner with
zero fill (IC(0)) is used as the standard for comparison. Another preconditioning
scheme considered is based principally in an approximate block Gaussian (BG)
iterative solution to the problem. In both these preconditioners, arrays containing
connectivity information are passed between processes corresponding to adjacent
partitions. Because of this need to pass connectivity information, the incomplete
Cholesky preconditioner is limited, in practice, to a zero fill application. This
limitation can be partially alleviated by using a BG iteration as a preconditioner,
which requires approximate solves of individual matrix partitions. These approximate
solves can be carried out using any number of preconditioners, including IC(0);
however, BG iteration involves an additional level of approximation, giving the
result that BG iteration with approximate IC(0) solves is less efficient than using
simple IC(0) as the preconditioner in the parallel PCG algorithm. Preconditioners
based in the Jacobi scheme are also considered as they require no connectivity
information. These preconditioners represent a trade off between lower communication
costs at the expense of increased work to obtain convergence. We are presently
exploring variants of these methods for use as preconditioners in the parallelized
conjugate gradient scheme; we expect to report on the results of this work.