Except that on a LOT of systems these days this is the only way to run the
test cases, since interactive parallel jobs are either not allowed or are
limited to a time too short for the tests to complete. On some, but not
all, systems one can request an interactive job through the queuing system
and conceivably run the tests that way, but of course the same issues arise
over what the 'mpirun' command looks like.
This is the bit I do not think will work, and I do not think it can be made
anywhere near general enough. Note that a number of systems, for example
CRAY, specify both the size of the job (in cores) and the number of cores
per node to use. Would it not make more sense to have several parallel test
targets?

E.g. (better names are probably possible)

Tests that can run on any number of cpus.

Tests that only run on 2 cpus.

Tests that only run on 4 cpus.

etc. Of course this doesn't deal with the fact that a lot of the test cases
will run on at most 4 cpus, say, but will happily run on 1, 2, 3 or 4 cpus.
The current approach in some test cases, which is to skip the test if the
number of cpus is not compatible, is a bad idea I think, since it just
results in these test cases never being run.
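
To make the idea of per-size targets concrete, here is a rough sketch of how
tests could be grouped by the cpu counts they accept. Everything in it is an
assumption for illustration: the test names, the "test-parallel-N" target
names and the way each test declares its allowed sizes are all hypothetical
and not taken from the existing test suite.

```python
# Hypothetical sketch: test names and target names are made up for
# illustration and do not come from the real test suite.
from collections import defaultdict

# Each test declares the cpu counts it can run on; None means "any number".
TESTS = {
    "ex_any": None,
    "ex_two_only": {2},
    "ex_four_only": {4},
    "ex_up_to_four": {1, 2, 3, 4},
}


def build_targets(tests, sizes):
    """Map each job size to the list of tests that can run at that size."""
    targets = defaultdict(list)
    for name, allowed in sorted(tests.items()):
        for n in sizes:
            if allowed is None or n in allowed:
                targets[n].append(name)
    return dict(targets)


def never_run(tests, sizes):
    """Return the tests that no target covers (they would never be run)."""
    wanted = set(sizes)
    return sorted(
        name
        for name, allowed in tests.items()
        if allowed is not None and not (allowed & wanted)
    )


if __name__ == "__main__":
    sizes = [1, 2, 4]
    for n, names in build_targets(TESTS, sizes).items():
        print(f"test-parallel-{n}: {' '.join(names)}")
    print("never run:", never_run(TESTS, sizes))
```

With something like this, a test that accepts 1 to 4 cpus shows up in every
matching target rather than being skipped, and never_run() reports any test
that the chosen set of targets would leave out entirely.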

