Re: [AMBER-Developers] [AMBER] pmemd.cuda error: invalid argument launching kernel kgBuildSpecial2RestNBPreList

From: Kurt A. O'Hearn via AMBER-Developers <amber-developers.ambermd.org>
Date: Fri, 21 Mar 2025 12:56:38 -0400

Stupid question / sanity check -- is CUDA v12.4.0 compiled for ARM? 
Last time I was on the Grace Hopper nodes on MSU HPCC, only CUDA v12.1.1
was available.

P.S. Why not install the latest CUDA point release in each series to
get any bug fixes?  It seems that only CUDA v11.x.0 / v12.x.0 are
available, in general.  See the CUDA archive to check for the latest releases:
https://developer.nvidia.com/cuda-toolkit-archive

That is, I'd try v12.4.1 on ARM first to see if there were any bugfixes.
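
For the sanity check itself, something like the following could be run on the node in question. This is only a rough sketch: it assumes a POSIX shell and that `nvcc` ends up on the PATH once the CUDA module is loaded, and `cuda_version_from_nvcc` is a helper name made up for this example (it is not part of CUDA or Amber).

```shell
#!/bin/sh
# Pull the version string (the part after "V") out of `nvcc --version`
# text supplied on stdin. Hypothetical helper, named here for illustration.
cuda_version_from_nvcc() {
    sed -n 's/.*release [0-9.]*, V\([0-9.]*\).*/\1/p'
}

# Host CPU architecture: x86_64 on EPYC/Xeon nodes, aarch64 on ARM nodes.
echo "host arch: $(uname -m)"

# Only query nvcc if it is actually on PATH (e.g. after loading a CUDA module).
if command -v nvcc >/dev/null 2>&1; then
    echo "nvcc version: $(nvcc --version | cuda_version_from_nvcc)"
else
    echo "nvcc version: nvcc not found on PATH"
fi
```

Note that the string nvcc reports is the compiler's own version, which need not match the toolkit point release digit for digit, so it is best compared against the release notes for each toolkit version.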

Thank you.

On 3/21/25 12:27 PM, David A Case via AMBER-Developers wrote:
> On Fri, Mar 21, 2025, Gross, Craig via AMBER wrote:
>>
>> I am looking for help with a pmemd.cuda issue one of the users at our
>> computing center has found. We have built Amber using Amber24 (update 3)
>> and AmberTools24 (update 8) with CUDA 12.4.0 on an AMD EPYC 9654 (Genoa)
>> with an NVIDIA H200 GPU. This version passes all built-in tests.
>>
>> However, when the user runs their example, they only get the output:
>>
>> ```
>> Error: invalid argument launching kernel kgBuildSpecial2RestNBPreList
>> ```
>
> Thanks for the detailed bug report.  I'm cc-ing this to the Amber developers
> mailing list, looking especially for folks who know something about
> kgBuildSpecial2RestNBPreList.  Having a test example that fails for you is a
> real help.
>
> This could, of course, be H200-specific, which may limit the number of
> people that can help.  But Amber developers who have access to A100 or
> other fairly modern machines might see if the test case fails on their
> machines.
>
> ....regards...dave case
>
>>
>> Their example works as expected on our Intel Xeon 8260 (Cascade Lake)
>> system with an NVIDIA V100S GPU using Amber22 (update 5) and
>> AmberTools23 (update 6) with CUDA 12.1.1. This was configured using a
>> similar command to the one shown below.
>>
>> I have seen two other recent emails on the mailing list with this
>> same error output
>> (http://archive.ambermd.org/202502/0046.html and
>> http://archive.ambermd.org/202501/0105.html) but no resolution.
>>
>> For reference, the CMake command we used to configure Amber is copied
>> below (closely mirroring the configuration used by EasyBuild
>> <https://github.com/easybuilders/easybuild-easyblocks/blob/d3caef14e26e1445102e0f060be0c52ce7cceab1/easybuild/easyblocks/a/amber.py#L123>,
>> which we normally use to install Amber), and I have attached the
>> list of build dependencies/versions. The failing example (donated by
>> our system's user) can be found using this Google Drive link
>> <https://drive.google.com/file/d/1xHkgI34TY-nm-2Io8n970k9ChXCSK_ma/view?usp=sharing>.
>> This example can be run with the command:
>>
>> ```
>> pmemd.cuda -O -i min.in -p input.parm7 -c input.rst7 -o output.out \
>>   -r output.rst7 -ref input.rst7
>> ```
>>
>> I unfortunately am only familiar with the installation side of Amber,
>> but I can discuss with our user-base if any subject-area knowledge
>> would be helpful in debugging this issue. If I can provide any other
>> information, please let me know. Thank you!
>>
>> === BEGIN CMAKE COMMAND ===
>>
>> cmake $AMBER_PREFIX/amber24_src \
>> -DCMAKE_INSTALL_PREFIX=$AMBER_PREFIX/amber24 \
>> -DCMAKE_INSTALL_LOCALSTATEDIR=$AMBER_PREFIX/amber24/var \
>> -DCMAKE_INSTALL_RUNSTATEDIR=$AMBER_PREFIX/amber24/var/run \
>> -DCMAKE_INSTALL_SYSCONFDIR=$AMBER_PREFIX/amber24/etc \
>> -DCMAKE_POLICY_DEFAULT_CMP0094=NEW \
>> -DCMAKE_VERBOSE_MAKEFILE=ON \
>> -DCMAKE_FIND_USE_PACKAGE_REGISTRY=OFF \
>> -DBOOST_ROOT=$EBROOTBOOST \
>> -DBoost_NO_SYSTEM_PATHS=ON \
>> -DMPI=FALSE \
>> -DOPENMP=TRUE \
>> -DBLA_VENDOR=FlexiBLAS \
>> -DCUDA=TRUE \
>> -DNCCL=TRUE \
>> -DDOWNLOAD_MINICONDA=FALSE \
>> -DPYTHON_EXECUTABLE=$EBROOTPYTHON/bin/python \
>> -DFORCE_EXTERNAL_LIBS='nccl;fftw;netcdf;netcdf-fortran;zlib;boost;pnetcdf' \
>> -DUSE_FFT=TRUE \
>> -DCHECK_UPDATES=FALSE \
>> -DCHECK_UPDATES=FALSE \
>> -DTRUST_SYSTEM_LIBS=TRUE \
>> -DINSTALL_TESTS=TRUE \
>> -DCOMPILER=AUTO
>>
>> === END CMAKE COMMAND ===
>>
>> Best,
>> Craig Gross
>>
>
>> GCCcore/13.2.0
>> zlib/1.2.13-GCCcore-13.2.0
>> binutils/2.40-GCCcore-13.2.0
>> GCC/13.2.0
>> numactl/2.0.16-GCCcore-13.2.0
>> XZ/5.4.4-GCCcore-13.2.0
>> libxml2/2.11.5-GCCcore-13.2.0
>> libpciaccess/0.17-GCCcore-13.2.0
>> hwloc/2.9.2-GCCcore-13.2.0
>> OpenSSL/1.1
>> libevent/2.1.12-GCCcore-13.2.0
>> UCX/1.18.0-GCCcore-13.2.0
>> libfabric/1.19.0-GCCcore-13.2.0
>> PMIx/4.2.6-GCCcore-13.2.0
>> UCC/1.3.0-GCCcore-13.2.0
>> OpenMPI/4.1.6-GCC-13.2.0
>> OpenBLAS/0.3.24-GCC-13.2.0
>> FlexiBLAS/3.3.1-GCC-13.2.0
>> FFTW/3.3.10-GCC-13.2.0
>> gompi/2023b
>> FFTW.MPI/3.3.10-gompi-2023b
>> ScaLAPACK/2.2.0-gompi-2023b-fb
>> foss/2023b
>> ncurses/6.4-GCCcore-13.2.0
>> cURL/8.3.0-GCCcore-13.2.0
>> libarchive/3.7.2-GCCcore-13.2.0
>> CMake/3.27.6-GCCcore-13.2.0
>> Bison/3.8.2
>> M4/1.4.19
>> flex/2.6.4
>> make/4.4.1-GCCcore-13.2.0
>> bzip2/1.0.8-GCCcore-13.2.0
>> Tcl/8.6.13-GCCcore-13.2.0
>> SQLite/3.43.1-GCCcore-13.2.0
>> libffi/3.4.4-GCCcore-13.2.0
>> Python/3.11.5-GCCcore-13.2.0
>> gfbf/2023b
>> cffi/1.15.1-GCCcore-13.2.0
>> cryptography/41.0.5-GCCcore-13.2.0
>> virtualenv/20.24.6-GCCcore-13.2.0
>> Python-bundle-PyPI/2023.10-GCCcore-13.2.0
>> pybind11/2.11.1-GCCcore-13.2.0
>> SciPy-bundle/2023.11-gfbf-2023b
>> Perl/5.38.0-GCCcore-13.2.0
>> gzip/1.13-GCCcore-13.2.0
>> lz4/1.9.4-GCCcore-13.2.0
>> zstd/1.5.5-GCCcore-13.2.0
>> ICU/74.1-GCCcore-13.2.0
>> Boost/1.83.0-GCC-13.2.0
>> libreadline/8.2-GCCcore-13.2.0
>> libpng/1.6.40-GCCcore-13.2.0
>> Brotli/1.1.0-GCCcore-13.2.0
>> freetype/2.13.2-GCCcore-13.2.0
>> NASM/2.16.01-GCCcore-13.2.0
>> libjpeg-turbo/3.0.1-GCCcore-13.2.0
>> jbigkit/2.1-GCCcore-13.2.0
>> libdeflate/1.19-GCCcore-13.2.0
>> LibTIFF/4.6.0-GCCcore-13.2.0
>> giflib/5.2.1-GCCcore-13.2.0
>> libwebp/1.3.2-GCCcore-13.2.0
>> OpenJPEG/2.5.0-GCCcore-13.2.0
>> LittleCMS/2.15-GCCcore-13.2.0
>> Pillow/10.2.0-GCCcore-13.2.0
>> Qhull/2020.2-GCCcore-13.2.0
>> matplotlib/3.8.2-gfbf-2023b
>> Szip/2.1.1-GCCcore-13.2.0
>> HDF5/1.14.3-gompi-2023b
>> netCDF/4.9.2-gompi-2023b
>> netCDF-Fortran/4.6.1-gompi-2023b
>> PnetCDF/1.12.3-gompi-2023b
>> Tk/8.6.13-GCCcore-13.2.0
>> Tkinter/3.11.5-GCCcore-13.2.0
>> expat/2.5.0-GCCcore-13.2.0
>> util-linux/2.39-GCCcore-13.2.0
>> fontconfig/2.14.2-GCCcore-13.2.0
>> xorg-macros/1.20.0-GCCcore-13.2.0
>> X11/20231019-GCCcore-13.2.0
>> CUDA/12.4.0
>> NCCL/2.12.12-GCCcore-13.2.0-CUDA-12.4.0
>> GDRCopy/2.4-GCCcore-13.2.0
>> UCX-CUDA/1.18.0-GCCcore-13.2.0-CUDA-12.4.0
>
>
> _______________________________________________
> AMBER-Developers mailing list
> AMBER-Developers.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber-developers


-- 
Respectfully,
Kurt A. O'Hearn, Ph.D
Dept. of Computer Science and Engineering
Michigan State University
Received on Fri Mar 21 2025 - 10:00:03 PDT