Re: [AMBER-Developers] Quick and CUDA in AmberTools

From: Ross Walker <>
Date: Tue, 23 Feb 2016 11:28:56 -0800

>>>> One thing to note is that pmemd.cuda from AMBER 16 onwards is going to need a minimum of NVCC v7.5 so it would probably be best if we just restrict the whole of AMBER + AmberTools to require a minimum of NVCC 7.5 for simplicity.
>>> OK. I guess you were worried about cuda-5.0... I am currently writing tests
>>> and I'm checking that I can get the same results for cuda > 5. I don't
>>> mind going directly to 7.5, that will save me some work.
>> Cool - so if I change the configure script to test for 7.5 or later, and otherwise quit and request that one upgrades NVCC, there would be no objections, yes?
> That should be OK for me, although I haven't tested 7.5 yet.
> Btw, my compilation time problem also comes from the fact that ./configure
> -cuda builds a config.h that requires nvcc to compile for many
> architectures (5 gencode if I count correctly). Reducing this number
> could also help since nvcc recompiles the same code for each architecture
> (and it's not parallel!). Maybe through configure?

Yeah, this is something we should think about, but I don't know of an easy solution that keeps things simple from a user perspective. I.e. I don't think we want to go down the path of having different executables for different GPUs, or of requiring people to specify which GPU they have when they configure, since just explaining to someone which options they need would be a mission in itself. Even I have not been able to rationalize things in my own head - e.g. sm_53 is for M60 while sm_52 is for Titan-X, M40 etc., but I can't see any real difference in the feature set between M60 and M40, other than that they give M60 the magical 'marketing' acronym 'GRID' to imply in some mythical way that it is designed for use in the cloud / grid.

There are also many examples in the wild of mixed-architecture clusters and even mixed-architecture nodes, so having a single executable that works everywhere is desirable from a user perspective. That said, I am a flip of a coin away from deciding that we drop Fermi support [C2050, 2070, M2090, GTX 2XX etc], which would unlock a bunch of additional optimizations that can't currently be done and let us drop the sm_2X options. This is only a small 'temporary' fix, though, since as soon as Pascal comes out there will be a bunch more SM options added on the end.
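
To make the cost of the architecture list concrete, here is a minimal Python sketch of how a configure step might assemble the nvcc -gencode flags, and what dropping the sm_2X targets would remove. The architecture list and both function names are assumptions made for illustration; they are not taken from AMBER's configure script or config.h. Only the flag form, -gencode arch=compute_XX,code=sm_XX, is standard nvcc syntax.

```python
def gencode_flags(archs):
    """Build one nvcc -gencode flag per target architecture."""
    return ["-gencode arch=compute_{0},code=sm_{0}".format(a) for a in archs]


def drop_fermi(archs):
    """Remove the sm_2X (Fermi) targets from an architecture list."""
    return [a for a in archs if not a.startswith("2")]


# Hypothetical target list for illustration only; the real list lives in
# whatever config.h the configure script generates.
ASSUMED_ARCHS = ["20", "30", "50", "52", "53"]

if __name__ == "__main__":
    print("With Fermi:")
    for flag in gencode_flags(ASSUMED_ARCHS):
        print("  " + flag)
    print("Without Fermi:")
    for flag in gencode_flags(drop_fermi(ASSUMED_ARCHS)):
        print("  " + flag)
```

Each flag in the list adds another device-code compilation of the same source, so the build time for a CUDA file grows with the length of the list. That is why trimming sm_2X helps a little and new Pascal targets would add the cost back.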

What about the idea of offering people pre-built binaries? We've shied away from this in the past, but it might be something we want to consider - NAMD do this and it works well in my experience. Plus the diversity of architectures these days is way less than it used to be - x86_64 + NVIDIA GPU, with the libraries included in the binary distribution, would encompass about 95% of all architectures people run AMBER on, I would think.

Or just provide the object precompiled for that one file and see if there is a smart way to detect at configure time whether it is usable or not?
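
On the configure-time side, the NVCC 7.5 minimum discussed in the quoted exchange above could be tested along these lines. This is only a sketch in Python, not the actual configure logic: the function names and the MIN_NVCC constant are hypothetical, and it assumes the output of `nvcc --version` contains a "release X.Y" string, as current toolkits print.

```python
import re

# Minimum NVCC release proposed in this thread.
MIN_NVCC = (7, 5)


def parse_nvcc_release(version_text):
    """Extract (major, minor) from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+)\.(\d+)", version_text)
    if match is None:
        return None
    return (int(match.group(1)), int(match.group(2)))


def nvcc_is_new_enough(version_text, minimum=MIN_NVCC):
    """True only if a release was found and it is at least `minimum`."""
    release = parse_nvcc_release(version_text)
    return release is not None and release >= minimum


if __name__ == "__main__":
    sample = "Cuda compilation tools, release 7.5, V7.5.17"
    if nvcc_is_new_enough(sample):
        print("NVCC OK")
    else:
        print("Please upgrade NVCC to 7.5 or later")
```

Comparing (major, minor) as integer tuples avoids the string-comparison trap where "10.0" would sort before "7.5".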

All the best

|\oss Walker

| Associate Research Professor |
| San Diego Supercomputer Center |
| Adjunct Associate Professor |
| Dept. of Chemistry and Biochemistry |
| University of California San Diego |
| NVIDIA Fellow |
| | |
| Tel: +1 858 822 0854 | EMail:- |

Note: Electronic Mail is not secure, has no guarantee of delivery, may not be read every day, and should not be used for urgent or sensitive issues.

AMBER-Developers mailing list
Received on Tue Feb 23 2016 - 11:30:05 PST