Re: [AMBER-Developers] New GB simulation engine: mdgx.cuda

From: David Cerutti
Date: Thu, 23 May 2019 15:33:03 -0400

The code requires Amber18; more specifically, a branch based on Amber18 /
AmberTools19. If you do not have access to the code repository, you should
be able to download the latest AmberTools. I can then tar up the mdgx code
as well as the updated configure / configure2 scripts for you to untar in
the ${AMBERHOME} root directory. Let me know if you need me to do this.


On Thu, May 23, 2019 at 3:22 PM Ilyas Yildirim wrote:

> Dear David,
> We are interested in benchmarking your code on our system. One of my
> postdocs, Kye Won (cc'ed here), is studying several different compounds
> targeting RNA molecules using pmemd.cuda. We want to see how efficient your
> code is. We currently have amber16; does your code require amber18? If not,
> can we have access to the code? Thanks.
> Best regards,
> Ilyas Yildirim, Ph.D.
> ---------------------------------------------------------
> = Assistant Professor =
> = Department of Chemistry & Biochemistry =
> = Florida Atlantic University =
> = 5353 Parkside Drive, Building MC17, Jupiter, FL 33458 =
> = Phone: +1(561)799-8325 =
> ---------------------------------------------------------
> ------------------------------
> *From:* David Cerutti
> *Sent:* Thursday, May 23, 2019 1:23:43 PM
> *To:* AMBER Developers Mailing List
> *Subject:* [AMBER-Developers] New GB simulation engine: mdgx.cuda
> Dear Developers,
> I am pleased to announce the beta release of my latest contribution to the
> Amber fleet of simulators, mdgx.cuda in the mdgxCuda branch of the
> repository. The new &pptd module is intended to simulate small systems of
> 928 atoms or fewer, in GB or vacuum conditions. As with pmemd.cuda GB, there
> is no cutoff to be concerned about: all particles interact with all other
> particles. The twist is that the program simulates more than one system at
> a time: dozens or even hundreds. It devotes one block of the GPU grid to
> each system and runs all of the dynamics in just one kernel, which can proceed
> for thousands of steps before either moving on to a different system in the
> master list or at last shutting down.
> For individual simulations, the engine can proceed at a significant
> fraction of the speed of pmemd, and for very small systems it can even push
> many simulations at a faster pace than pmemd can use the entire card to
> push just one. An RTX-6000 (similar to a 2080Ti) running mdgx.cuda can
> push 72 copies of a 900-atom system with igb=8 at about 15% of the pace
> that pmemd can push each of them (total throughput 10x greater than
> pmemd.cuda). That same RTX-6000 can push 72 15-residue, 225-atom systems
> each at the pace pmemd.cuda can do just one (72x greater throughput), and
> for tiny oligopeptides the speedup is even greater (the card can produce
> hundreds of microseconds of aggregate trajectory per day). The thing is
> also designed to "gear down" when hundreds of copies are in play, devoting
> smaller blocks to each system in order to get the best overall output (this
> feature works as intended, but will get more polishing soon).
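
The throughput figures quoted above can be checked with a few lines of
arithmetic. This is only a minimal sketch: the function name
aggregate_speedup is hypothetical, and the inputs are just the numbers stated
in the message (72 copies at roughly 15% of pmemd.cuda's single-system pace,
and 72 copies at roughly its full pace), not new benchmarks.

```python
def aggregate_speedup(n_copies: int, relative_pace: float) -> float:
    """Aggregate throughput relative to one pmemd.cuda run using the whole card.

    relative_pace is the per-copy speed expressed as a fraction of the pace
    pmemd.cuda reaches on a single copy of the same system.
    """
    return n_copies * relative_pace


# 900-atom case from the message: 72 copies, each at about 15% of pmemd's pace.
print(f"{aggregate_speedup(72, 0.15):.1f}x")
# 225-atom case from the message: 72 copies, each at about pmemd's pace.
print(f"{aggregate_speedup(72, 1.0):.1f}x")
```

The first case works out to a little under 11x, which matches the "10x
greater" figure in the message once the "about 15%" is taken as approximate.
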
> There are efforts underway to add temperature and Hamiltonian REMD to the
> module (replicas at different temperatures and interpolation between
> end-point topologies are already supported; it's the exchange that isn't
> yet ready). The new module is not limited to many copies of a single
> system, however: an investigator with dozens of small peptides or
> oligosaccharides can queue them up in the same input deck. mdgx.cuda will
> turn the GPU into a miniature Beowulf cluster and use the GPU block
> scheduler as the queueing system to keep the entire card busy as long as
> possible.
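
The queueing idea in the paragraph above can be pictured with a toy host-side
model. This is not mdgx.cuda code, and the real GPU block scheduler is managed
by the hardware: simulate_block_queue, the slot count, and the per-system run
times below are all hypothetical. The sketch only shows the greedy pattern in
which each queued system goes to whichever block slot frees up first.

```python
import heapq


def simulate_block_queue(run_times, n_slots):
    """Greedy queue: each job starts on the slot that becomes free earliest.

    run_times: hypothetical cost of each queued system, in arbitrary time units.
    n_slots:   hypothetical number of blocks resident on the GPU at once.
    Returns (makespan, assignment), where assignment[i] is the slot of job i.
    """
    if n_slots < 1:
        raise ValueError("n_slots must be at least 1")
    # Min-heap of (time at which the slot becomes free, slot index).
    slots = [(0.0, s) for s in range(n_slots)]
    heapq.heapify(slots)
    assignment = []
    makespan = 0.0
    for cost in run_times:
        free_at, slot = heapq.heappop(slots)
        finish = free_at + cost
        assignment.append(slot)
        makespan = max(makespan, finish)
        heapq.heappush(slots, (finish, slot))
    return makespan, assignment


# Five systems of unequal cost sharing two slots.
makespan, assignment = simulate_block_queue([4.0, 1.0, 1.0, 2.0, 3.0], n_slots=2)
print(makespan, assignment)
```

In this toy run the three short systems pack onto one slot while the long one
occupies the other, which is the sense in which a mixed input deck can keep
the card busy.
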
> The new engine supports RATTLE (the SHAKE equivalent for mdgx's Velocity
> Verlet integrator) as well as a multiple time-stepping scheme which appears
> to have a slight speed advantage over RATTLE that grows with smaller system
> sizes. Either method can simulate systems at a 4fs time step, with the MTS
> method (which updates all bonds, angles, and 1-4 interactions as part of
> the short step) appearing to have an advantage in energy conservation as
> well. A Langevin thermostat is provided to cover a multitude of sins.
> At present I am working out a few details regarding register usage and
> kernel branching. (The dynamics kernel is pretty stuffed--if we want to add
> more features I'm going to have to find ways to keep the register pressure
> down so we don't have to drop our thread counts and thus overall speed.)
> The company that funded the development of this software, Rubryc
> Therapeutics Inc., is using the engine to great effect, getting 20x the
> product out of their GTX-1080Ti cluster with 30x more results expected from
> a new RTX cluster. We are also applying this in a collaboration with a
> group at UC Davis to study some glycopeptides in the gas phase--in these
> cases an RTX-2080Ti GPU appears to be worth about 900 CPU cores running
> pmemd.
> If I can get beta testers for the mdgxCuda branch in the repository, I
> would love feedback and stress testing. Extra features are also possible,
> but as mentioned the kernel is getting pretty stuffed so it will take some
> care to keep the GPU performance up while adding new capabilities. To get
> started, switch to the mdgxCuda branch, configure amber with -cuda and
> compile, then run mdgx.cuda -PPTD to see the on-board manual describing the
> inputs. The attached test case will show the operation of the program on
> an array of systems. To run the test case, do:
> ${AMBERHOME}/bin/mdgx.cuda -O -i <>
> As with other things mdgx, there are some niceties in there, such as system
> and input sanity checking, auto-detecting available GPUs, and being polite
> if all are taken, that can make their way into pmemd. (However, for
> logistical reasons, this multi-simulation capability will be unique to
> mdgx--the pmemd code would take extensive rewriting, including a new GB
> engine and major changes in the Fortran layer, to support this feature.)
> The CPU and GPU versions of the mdgx GB code are designed to work on the
> same fp32 / int32 accumulation precision model, but there is no DPFP
> version as yet.
> Dave
AMBER-Developers mailing list
Received on Thu May 23 2019 - 13:00:02 PDT