Preamble:
I have a large one-dimensional array and need to solve an evolution equation (a wave-like equation). I need to calculate an integral at each value of the array, store the resulting array of integrals, apply the integration to that array again, and so on (in simple words: apply the integral on a grid of values, store the new grid, apply the integration again, and so on).
I used MPI-IO to spread the work over the nodes: there is a shared .dat file on disk, each MPI process reads this file (as the source for the integration), performs the integration, and writes to the shared file again. The procedure repeats again and again. This worked fine: the time-consuming part was the integration, and the file reading/writing was negligible.
Current problem:
Now I have moved to a 1024-CPU (16 nodes × 64 cores) HPC cluster, and I'm facing the opposite problem: the calculation time is negligible compared to the read-write process!
I tried to reduce the number of MPI processes: I now use 16 MPI processes (one per node) plus 64 OpenMP threads to parallelize the computation inside each node (the per-node OpenMP part is sketched below).
Again, the reading and writing are the time-consuming part now.
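For reference, the OpenMP half of that hybrid setup looks roughly like this (a minimal sketch; `integrate_chunk` and the trivial loop body are placeholders for the real quadrature):

```c
#include <omp.h>

/* Per-node integration: one MPI rank per node calls this, and the
 * node's OpenMP threads (e.g. OMP_NUM_THREADS=64) split the loop. */
void integrate_chunk(const double *in, double *out, int n)
{
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        out[i] = 0.5 * in[i];   /* stand-in for the actual integral at point i */
}
```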
Question:
How should I modify the program in order to utilize the full power of the 1024 CPUs with minimal loss?
The important point is that I cannot move to the next step without completing the entire 1D array.
My thoughts:
Instead of reading and writing, I could ask rank 0 (the master rank) to send and receive the entire array to and from the nodes (MPI_Bcast). So, instead of each node doing I/O, only one node would do it.
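A minimal sketch of what I mean (array size, file name, and `integrate_step` are illustrative; it assumes N is divisible by the number of ranks):

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N (1 << 20)   /* grid size; assumed divisible by the number of ranks */

/* placeholder for the real integration: fills out[0..n) from the full grid */
static void integrate_step(const double *grid, double *out, int lo, int n)
{
    for (int i = 0; i < n; ++i)
        out[i] = 0.5 * grid[lo + i];   /* stand-in for the actual quadrature */
}

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double *grid = malloc(N * sizeof *grid);
    int chunk = N / size;
    double *out = malloc(chunk * sizeof *out);

    if (rank == 0) {                       /* only the master touches the disk */
        FILE *f = fopen("grid.dat", "rb");
        fread(grid, sizeof(double), N, f);
        fclose(f);
    }

    for (int step = 0; step < 100; ++step) {
        /* everyone receives the full current grid from rank 0 ... */
        MPI_Bcast(grid, N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        /* ... integrates its own chunk ... */
        integrate_step(grid, out, rank * chunk, chunk);

        /* ... and rank 0 collects the pieces into the next grid */
        MPI_Gather(out, chunk, MPI_DOUBLE,
                   grid, chunk, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    }

    if (rank == 0) {                       /* final write, once */
        FILE *f = fopen("grid.dat", "wb");
        fwrite(grid, sizeof(double), N, f);
        fclose(f);
    }

    free(out);
    free(grid);
    MPI_Finalize();
    return 0;
}
```

(MPI_Allgather would combine the gather and the broadcast into a single call.)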
Thanks in advance!
I would start here and here. Fortran code for the second site is here, and C code here.
The idea is: don't give the entire array to each processor. Give each processor only the piece it works on, with enough overlap between processors that they can handle their mutual boundaries.
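For example, for a stencil that needs one neighbouring point on each side, the halo exchange could look like this (a minimal sketch; NLOC, the step count, and the update formula are illustrative):

```c
#include <mpi.h>
#include <stdlib.h>

#define NLOC 1024          /* interior points owned by each rank */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* u[0] and u[NLOC+1] are ghost cells holding the neighbours' boundary values */
    double *u = calloc(NLOC + 2, sizeof *u);

    int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    for (int step = 0; step < 100; ++step) {
        /* send my first interior point left, receive my right ghost */
        MPI_Sendrecv(&u[1],        1, MPI_DOUBLE, left,  0,
                     &u[NLOC + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        /* send my last interior point right, receive my left ghost */
        MPI_Sendrecv(&u[NLOC],     1, MPI_DOUBLE, right, 1,
                     &u[0],        1, MPI_DOUBLE, left,  1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        /* update interior points; the ghosts supply the boundary data,
           so no file I/O is needed between steps */
        for (int i = 1; i <= NLOC; ++i)
            u[i] = 0.25 * (u[i - 1] + 2.0 * u[i] + u[i + 1]); /* stand-in update */
    }

    free(u);
    MPI_Finalize();
    return 0;
}
```

MPI_PROC_NULL turns the sends and receives at the physical boundaries into no-ops, so the edge ranks need no special-casing.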
Also, you are right to save the computation to disk every so often, and MPI-IO is good for that; I think that is the way to go. The codes in the links would let you run without reading the file every time. And, for my money, writing out the data every single step is overkill: checkpoint periodically instead, as in the sketch below.
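A sketch of such a periodic checkpoint with collective MPI-IO (the file name, interval, and local size are illustrative):

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define NLOC 1024          /* points owned by each rank */
#define CHECK_EVERY 50     /* write to disk every 50 steps, not every step */

/* every rank writes its own slice of the global array, collectively */
static void checkpoint(const double *u, int rank, int step)
{
    char name[64];
    snprintf(name, sizeof name, "grid_step_%06d.dat", step);

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, name,
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_Offset off = (MPI_Offset)rank * NLOC * sizeof(double);
    MPI_File_write_at_all(fh, off, u, NLOC, MPI_DOUBLE, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);
}

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double *u = calloc(NLOC, sizeof *u);
    for (int step = 0; step < 500; ++step) {
        /* ... halo exchange and local update go here ... */
        if (step % CHECK_EVERY == 0)
            checkpoint(u, rank, step);
    }

    free(u);
    MPI_Finalize();
    return 0;
}
```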