The fftMPI library computes multi-dimensional FFTs in parallel where the FFT grid is distributed across processors. Features and limitations of the library are as follows:
In the fftMPI context, a "tiling" of the 2d or 3d FFT grid is how it is distributed across processors. Imagine a N1 x N2 or N1 x N2 x N3 grid partitioned into P tiles, where P is the number of MPI tasks (processors). Each tile is a "rectangle" of grid points in 2d or "brick" of grid points in 3d. Each processor's tile can be any size or shape, including empty. The P individual tiles cannot overlap; their union is the entire grid. This means each point of the global FFT grid is owned by a unique processor.
A 2d FFT is performed as a set of N2 1d FFTs in the first dimension, followed by N1 1d FFTs in the 2nd dimension. A 3d FFT is performed as N2*N3 1d FFTs in the first dimension, then N1*N3 1d FFTs in the 2nd, then N1*N2 1d FFTs in the third dimension.
The FFT result can overwrite the input values (in-place FFT) or be written to new memory. However note that fftMPI also allocates additional memory internally to buffer MPI send and recv messages.
The 1d FFTs are not computed by fftMPI itself, but rather by calls each processor makes to an external library. As listed above, fftMPI has support for these libraries:
What fftMPI encodes is the parallel communication necessary to remap grid data between processors. This involves both sending/receiving data between processors and re-ordering data on-processor, between each stage of 1d FFT computations. This distributes the sets of 1d FFT computations across processors, and stores the data for individual 1d FFTs contiguously on each processor.