Re: [AMBER-Developers] Problem with Fortran-CUDA interface

From: Yinglong Miao <yinglong.miao.gmail.com>
Date: Sun, 5 Nov 2017 16:25:24 -0600

Hi Dave,

Yes, the Fortran and CUDA extern "C" functions do write to different places. The bug you mentioned was about renaming the GaMD logfile, if I recall correctly. I have now merged the most recent changes from master. On checking the code more carefully, I realized that gamd_weights_and_energy is only updated for output every ntwx steps, not at every step. That skewed my initial comparison of the energy values in the mdout and gamd.log files. When I test with ntwx=1, the energy values do match exactly, as you said. We should be fine now.
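
A minimal sketch of that point, with hypothetical unit and variable names rather than the actual pmemd source: the Fortran-side energies are only refreshed on output steps, so a line-by-line comparison against the per-step GPU print is only meaningful when the step is a multiple of ntwx (trivially, when ntwx=1).

    ! Hypothetical sketch, not the pmemd source: mdout_unit and nstep are
    ! illustrative names.  The energies written to mdout are refreshed only
    ! on output steps, so they can only be compared with the per-step
    ! GPU-side print when mod(nstep, ntwx) == 0.
    if (mod(nstep, ntwx) == 0) then
       write(mdout_unit, '(a,2f22.12)') ' GaMD energies: ', &
             pot_ene%total, pot_ene%dihedral
    end if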

Thanks,
Yinglong


> On Nov 3, 2017, at 4:11 PM, David Cerutti <dscerutti.gmail.com> wrote:
>
> I am running the example with my current branch and I do not see the
> behavior that you describe. I did recently patch the master about that bug
> I mentioned; you might try merging in the most recent changes to see if
> that fixes the behavior. I do find that the Fortran and extern "C"
> functions are writing to different places--the Fortran output is going to
> mdout, the C output is going to the terminal window. If I correlate the
> outputs, they match exactly.
>
> Dave
>
>
>> On Fri, Nov 3, 2017 at 3:13 PM, Yinglong Miao <yinglong.miao.gmail.com> wrote:
>
>> Hi Dave,
>>
>> The test system is alanine dipeptide, as attached. I have been working on
>> my own mods, with the master branch checked out about one month ago, but
>> the problem should show up in Amber16 as well.
>>
>> It would be great if you could fix it as you clean up the code and figure
>> out what the cause is ... thanks!
>>
>> Yinglong
>>
>>
>> On Fri, Nov 3, 2017 at 1:53 PM, David Cerutti <dscerutti.gmail.com> wrote:
>>
>>> Can you provide the system on which this occurs? I am doing a major sweep
>>> of the code and am nearly finished rebuilding the way bonded interactions
>>> are computed. As such, the GaMD routines are on my list of things to
>>> incorporate. We spotted a bug earlier, which has since been patched, that
>>> involved sending values to extern "C" functions as literals, not pointers,
>>> but that doesn't seem to be the case here. However, that problem was only
>>> detected after I tried to pass additional arguments to the function, which
>>> then got turned to gobbledegook. Are you using code from Amber16, master
>>> branch, or your own mods?
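
To make the pointer convention concrete: because the extern "C" prototypes in cuda/gpu.cpp take double* arguments, the Fortran caller has to supply addresses. A minimal sketch of an explicit interface that pins this down is below; it is illustrative only, not the declaration actually used in pmemd.

    ! Illustrative interface block, assuming the trailing-underscore name
    ! mangling seen in cuda/gpu.cpp.  Scalars declared without the VALUE
    ! attribute are passed by reference, matching the double* parameters
    ! on the C side.
    interface
      subroutine gpu_calculate_and_apply_gamd_weights(pot_ene_tot, &
                   dih_ene_tot, gamd_ene_tot, num_gamd_lag) &
                   bind(c, name='gpu_calculate_and_apply_gamd_weights_')
        use iso_c_binding, only : c_double
        implicit none
        real(c_double) :: pot_ene_tot
        real(c_double) :: dih_ene_tot
        real(c_double) :: gamd_ene_tot
        real(c_double) :: num_gamd_lag
      end subroutine gpu_calculate_and_apply_gamd_weights
    end interface

With an explicit interface like this in scope, a mismatch in the number or kind of arguments (the sort of thing that turns a freshly added argument into gobbledegook) becomes a compile-time error instead of silent corruption.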
>>>
>>> Dave
>>>
>>>
>>> On Fri, Nov 3, 2017 at 10:35 AM, Yinglong Miao <yinglong.miao.gmail.com>
>>> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I have run into a problem with the Fortran-CUDA interface. Basically, in
>>>> pme_force.F90 I call a function in cuda/gpu.cpp, as shown below. The
>>>> calculations inside the gpu function run fine, but the potential energies
>>>> seem to change *strangely* right where they are passed into the function:
>>>>
>>>> pme_force.F90:
>>>>   write(*,'(a,2f22.12)') "Debug-p1) (pot_ene%total, pot_ene%dihedral) = ", &
>>>>     pot_ene%total, pot_ene%dihedral
>>>>   call gpu_calculate_and_apply_gamd_weights(pot_ene%total, pot_ene%dihedral, &
>>>>     pot_ene%gamd_boost, num_gamd_lag)
>>>>   ...
>>>>
>>>> cuda/gpu.cpp:
>>>>   extern "C" void gpu_calculate_and_apply_gamd_weights_(double* pot_ene_tot,
>>>>                                                         double* dih_ene_tot,
>>>>                                                         double* gamd_ene_tot,
>>>>                                                         double* num_gamd_lag)
>>>>   {
>>>>     PRINTMETHOD("gpu_calculate_and_apply_gamd_weights");
>>>>     double tboost    = 0.0;
>>>>     double fwgtd     = 1.0;
>>>>     double fwgt      = 1.0;
>>>>     double tboostall = 0.0;
>>>>     double temp0     = gpu->sim.gamd_temp0;
>>>>     double ONE_KB    = 1.0 / (temp0 * KB);
>>>>     printf("Debug-GPU-p0) (pot_ene_tot, dih_ene_tot) = (%12.5f, %12.5f)\n",
>>>>            *pot_ene_tot, *dih_ene_tot);
>>>>     ...
>>>>
>>>> Output:
>>>>   Debug-p1) (pot_ene%total, pot_ene%dihedral) =   -5991.862400868107        9.501277353615
>>>>   Debug-GPU-p0) (pot_ene_tot, dih_ene_tot) = (  -5988.16828,      9.89661)
>>>> =========
>>>>
>>>> As you can see, the energy values before and after being passed are
>>>> different, and the problem appears to depend on the simulation length.
>>>> The energy differences are negligible when I test the code for several
>>>> thousand steps, but they grow larger for hundreds of thousands of steps
>>>> or more, as shown above. Has anyone come across similar issues before?
>>>> My workstation has a new NVIDIA Quadro P5000 GPU card. Could this be
>>>> related to the hardware? If not, how might I fix it?
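
One quick way to separate an argument-passing problem from a stale-value problem is to pass known constants through the same call and check what the Debug-GPU-p0 print reports. The sketch below is debug-only and the sentinel variables are made up; only the call itself comes from the snippet above.

    ! Debug-only sketch with made-up sentinel variables.  If the GPU-side
    ! print shows exactly 1.0 and 2.0, the Fortran/C argument passing is
    ! intact, and the mismatch must come from the energy values themselves
    ! (e.g. values refreshed at different steps), not from the interface
    ! or the hardware.
    block
      double precision :: sentinel_tot, sentinel_dih, sentinel_boost
      sentinel_tot   = 1.0d0
      sentinel_dih   = 2.0d0
      sentinel_boost = 0.0d0
      call gpu_calculate_and_apply_gamd_weights(sentinel_tot, sentinel_dih, &
                                                sentinel_boost, num_gamd_lag)
    end block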
>>>>
>>>> Any suggestions will be much appreciated,
>>>> Yinglong
>>>>
>>>> Yinglong Miao, Ph.D.
>>>> Assistant Professor
>>>> Center for Computational Biology and
>>>> Department of Molecular Biosciences
>>>> University of Kansas
>>>> http://miao.compbio.ku.edu
>>>>
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Sun Nov 05 2017 - 14:30:02 PST