[SOLVED] Slackware64 current (15.0) OpenCL added to stock Mesa amdgpu

kingbeowulf · 04-07-2020, 01:42 AM

Slackware is missing the libraries for OpenCL GPU compute. These are available either via AMDGPU-PRO (precompiled binaries) or from ROCm. AMD provides amdgpu-pro installers for only a few Linux distributions via their native package managers, which makes installation elsewhere complex and convoluted (ugh), and it has other issues as well. The ROCm build system is hideously complex and also supports only a few Linux distributions. Also, ROCm does not yet officially support the newer AMD Navi GPUs.

What follows is an attempt to blend in just the OpenCL components to the existing Slackware Mesa, amdgpu and libOpenCL.so components. I think I have it working - at least BOINC OpenCL projects seem to work on my RX 5700 XT. I'm not sure yet if the RX590 is working with OpenCL.

For anyone who wishes to help me test, here are the SlackBuild script and the AMDGPU-PRO Ubuntu archive:
http://www.linuxgalaxy.org/files/sbo...amdgpu-opencl/

The script is loosely based on bassmadrigal's amdgpu-pro driver slackbuild:
*UPDATE: fixed slack-desc issue*
*UPDATE: upgraded for amdgpu-pro 20.10 driver*

Code:

#!/bin/sh

# Slackware build script for amdgpu-opencl-driver
#
# Copyright 2020  Edward W. Koenig <kingbeowulf -at- gmail.com>
# All rights reserved.
#
# Redistribution and use of this script, with or without modification, is
# permitted provided that the following conditions are met:
#
# 1. Redistributions of this script must retain the above copyright
#    notice, this list of conditions and the following disclaimer.
#
#  THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED
#  WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
#  MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO
#  EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
#  SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
#  PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
#  OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
#  WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
#  OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
#  ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

PRGNAM=amdgpu-opencl
SRCNAM=amdgpu-pro
SRCVER=${SRCVER:-19.50-967956}
DISTRO=${DISTRO:-ubuntu-18.04}
VERSION=${VERSION:-19.50}
BUILD=${BUILD:-2}
TAG=${TAG:-_kb}
COMPAT32=${COMPAT32:-no}

if [ -z "$ARCH" ]; then
  case "$( uname -m )" in
    i?86) ARCH=i586 ;;
    arm*) ARCH=arm ;;
       *) ARCH=$( uname -m ) ;;
  esac
fi

CWD=$(pwd)
TMP=${TMP:-/tmp/SBo}
PKG=$TMP/package-$PRGNAM
OUTPUT=${OUTPUT:-/tmp}

if [ "$ARCH" = "i586" ]; then
  DEBARCH="i386"
  DRIARCH="i386"
  LIBDIRSUFFIX=""
elif [ "$ARCH" = "x86_64" ]; then
  DEBARCH="amd64"
  DRIARCH="x86_64"
  LIBDIRSUFFIX="64"
else
  echo "Package for $ARCH architecture is not available."
  exit 1
fi

set -eu

rm -rf $PKG
mkdir -p $TMP $PKG $OUTPUT
cd $TMP
rm -rf ${SRCNAM}-${SRCVER}
mkdir -p $TMP $PKG $OUTPUT $PKG/install

# Extract main tarball
tar -xvf $CWD/${SRCNAM}-${SRCVER}-${DISTRO}.tar.xz
cd $PKG

# For loop to extract all the opencl related .deb archives from the main tarball
for i in $TMP/${SRCNAM}-${SRCVER}-${DISTRO}/clinfo-${SRCNAM}_${SRCVER}_${DEBARCH}.deb \
         $TMP/${SRCNAM}-${SRCVER}-${DISTRO}/opencl-${SRCNAM}-comgr_${SRCVER}_${DEBARCH}.deb \
         $TMP/${SRCNAM}-${SRCVER}-${DISTRO}/opencl-${SRCNAM}-dev_${SRCVER}_${DEBARCH}.deb \
         $TMP/${SRCNAM}-${SRCVER}-${DISTRO}/opencl-${SRCNAM}-icd_${SRCVER}_${DEBARCH}.deb \
         $TMP/${SRCNAM}-${SRCVER}-${DISTRO}/opencl-${SRCNAM}_${SRCVER}_${DEBARCH}.deb \
         $TMP/${SRCNAM}-${SRCVER}-${DISTRO}/opencl-orca-${SRCNAM}-icd_${SRCVER}_${DEBARCH}.deb
do
  ar p $i data.tar.xz | unxz | tar xv
done

# move to sane locations
mkdir -p $PKG/usr/{bin,lib$LIBDIRSUFFIX} $PKG/usr/doc/$PRGNAM-$VERSION 
mv $PKG/opt/${SRCNAM}/bin/clinfo $PKG/usr/bin/clinfo
# DRIARCH is x86_64 or i386, matching the Debian multiarch directory name
mv $PKG/opt/${SRCNAM}/lib/${DRIARCH}-linux-gnu/* $PKG/usr/lib$LIBDIRSUFFIX
mv $PKG/usr/share/doc/* $PKG/usr/doc/$PRGNAM-$VERSION
rm -rf $PKG/usr/share
rm -rf $PKG/opt

# Now, let's get multilib set up if the system has it
# Use "=" rather than "==": the script runs under /bin/sh, where "==" is not portable
if [ "$COMPAT32" = "yes" ]; then

  # Let's set up 32-bit locations
  mkdir -p $PKG/usr/bin/32 $PKG/usr/lib
  cd $PKG
  for i in $TMP/${SRCNAM}-${SRCVER}-${DISTRO}/clinfo-${SRCNAM}_${SRCVER}_i386.deb \
       $TMP/${SRCNAM}-${SRCVER}-${DISTRO}/opencl-orca-${SRCNAM}-icd_${SRCVER}_i386.deb
  do
    ar p $i data.tar.xz | unxz | tar xv
  done

  mkdir -p $PKG/usr/bin/32 $PKG/usr/lib
  mv $PKG/opt/${SRCNAM}/bin/clinfo $PKG/usr/bin/32/clinfo
  mv $PKG/opt/${SRCNAM}/lib/i386-linux-gnu/* $PKG/usr/lib
  rm -rf $PKG/usr/share
  rm -rf $PKG/opt

fi

# Set proper permissions
chown -R root:root .
find -L . \
 \( -perm 777 -o -perm 775 -o -perm 750 -o -perm 711 -o -perm 555 \
  -o -perm 511 \) -exec chmod 755 {} \; -o \
 \( -perm 666 -o -perm 664 -o -perm 640 -o -perm 600 -o -perm 444 \
  -o -perm 440 -o -perm 400 \) -exec chmod 644 {} \;

# Strip binaries and libraries
find $PKG -print0 | xargs -0 file | grep -e "executable" -e "shared object" | grep ELF \
  | cut -f 1 -d : | xargs strip --strip-unneeded 2> /dev/null || true
  
# Add the SlackBuild to the doc directory
cat $CWD/$PRGNAM.SlackBuild > $PKG/usr/doc/$PRGNAM-$VERSION/$PRGNAM.SlackBuild

mkdir -p $PKG/install
cat $CWD/slack-desc > $PKG/install/slack-desc
cat $CWD/doinst.sh > $PKG/install/doinst.sh

cd $PKG
/sbin/makepkg -l y -c n $OUTPUT/$PRGNAM-$VERSION-$ARCH-$BUILD$TAG.${PKGTYPE:-tgz}

The clinfo output looks OK except for the weird error at the end:

Code:

$ clinfo
Number of platforms:                 2
  Platform Profile:                 FULL_PROFILE
  Platform Version:                 OpenCL 2.1 AMD-APP (3004.6)
  Platform Name:                 AMD Accelerated Parallel Processing
  Platform Vendor:                 Advanced Micro Devices, Inc.
  Platform Extensions:                 cl_khr_icd cl_amd_event_callback cl_amd_offline_devices 
  Platform Profile:                 FULL_PROFILE
  Platform Version:                 OpenCL 1.1 Mesa 20.0.4
  Platform Name:                 Clover
  Platform Vendor:                 Mesa
  Platform Extensions:                 cl_khr_icd


  Platform Name:                 AMD Accelerated Parallel Processing
Number of devices:                 1
  Device Type:                     CL_DEVICE_TYPE_GPU
  Vendor ID:                     1002h
  Board name:                     AMD Radeon RX 5700 XT
  Device Topology:                 PCI[ B#5, D#0, F#0 ]
  Max compute units:                 20
  Max work items dimensions:             3
    Max work items[0]:                 1024
    Max work items[1]:                 1024
    Max work items[2]:                 1024
  Max work group size:                 256
  Preferred vector width char:             4
  Preferred vector width short:             2
  Preferred vector width int:             1
  Preferred vector width long:             1
  Preferred vector width float:             1
  Preferred vector width double:         1
  Native vector width char:             4
  Native vector width short:             2
  Native vector width int:             1
  Native vector width long:             1
  Native vector width float:             1
  Native vector width double:             1
  Max clock frequency:                 2200Mhz
  Address bits:                     64
  Max memory allocation:             4244635648
  Image support:                 Yes
  Max number of images read arguments:         128
  Max number of images write arguments:         64
  Max image 2D width:                 16384
  Max image 2D height:                 16384
  Max image 3D width:                 2048
  Max image 3D height:                 2048
  Max image 3D depth:                 2048
  Max samplers within kernel:             16
  Max size of kernel argument:             1024
  Alignment (bits) of base address:         2048
  Minimum alignment (bytes) for any datatype:     128
  Single precision floating point capability
    Denorms:                     Yes
    Quiet NaNs:                     Yes
    Round to nearest even:             Yes
    Round to zero:                 Yes
    Round to +ve and infinity:             Yes
    IEEE754-2008 fused multiply-add:         Yes
  Cache type:                     Read/Write
  Cache line size:                 64
  Cache size:                     16384
  Global memory size:                 8573157376
  Constant buffer size:                 4244635648
  Max number of constant args:             8
  Local memory type:                 Scratchpad
  Local memory size:                 65536
  Max pipe arguments:                 16
  Max pipe active reservations:             16
  Max pipe packet size:                 4244635648
  Max global variable size:             3820172032
  Max global variable preferred total size:     8573157376
  Max read/write image args:             64
  Max on device events:                 1024
  Queue on device max size:             8388608
  Max on device queues:                 1
  Queue on device preferred size:         262144
  SVM capabilities:                 
    Coarse grain buffer:             Yes
    Fine grain buffer:                 Yes
    Fine grain system:                 No
    Atomics:                     No
  Preferred platform atomic alignment:         0
  Preferred global atomic alignment:         0
  Preferred local atomic alignment:         0
  Kernel Preferred work group size multiple:     32
  Error correction support:             0
  Unified memory for Host and Device:         0
  Profiling timer resolution:             1
  Device endianess:                 Little
  Available:                     Yes
  Compiler available:                 Yes
  Execution capabilities:                 
    Execute OpenCL kernels:             Yes
    Execute native function:             No
  Queue on Host properties:                 
    Out-of-Order:                 No
    Profiling :                     Yes
  Queue on Device properties:                 
    Out-of-Order:                 Yes
    Profiling :                     Yes
  Platform ID:                     0x7f57e1283f10
  Name:                         gfx1010
  Vendor:                     Advanced Micro Devices, Inc.
  Device OpenCL C version:             OpenCL C 2.0 
  Driver version:                 3004.6 (PAL,LC)
  Profile:                     FULL_PROFILE
  Version:                     OpenCL 2.0 AMD-APP (3004.6)
  Extensions:                     cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_khr_gl_depth_images cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_gl_event cl_khr_depth_images cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_amd_copy_buffer_p2p 


  Platform Name:                 Clover
Number of devices:                 1
  Device Type:                     CL_DEVICE_TYPE_GPU
  Vendor ID:                     1002h
  Max compute units:                 40
  Max work items dimensions:             3
    Max work items[0]:                 256
    Max work items[1]:                 256
    Max work items[2]:                 256
  Max work group size:                 256
  Preferred vector width char:             16
  Preferred vector width short:             8
  Preferred vector width int:             4
  Preferred vector width long:             2
  Preferred vector width float:             4
  Preferred vector width double:         2
  Native vector width char:             16
  Native vector width short:             8
  Native vector width int:             4
  Native vector width long:             2
  Native vector width float:             4
  Native vector width double:             2
  Max clock frequency:                 2200Mhz
  Address bits:                     64
  Max memory allocation:             6871947673
  Image support:                 No
  Max size of kernel argument:             1024
  Alignment (bits) of base address:         32768
  Minimum alignment (bytes) for any datatype:     128
  Single precision floating point capability
    Denorms:                     No
    Quiet NaNs:                     Yes
    Round to nearest even:             Yes
    Round to zero:                 No
    Round to +ve and infinity:             No
    IEEE754-2008 fused multiply-add:         No
  Cache type:                     None
  Cache line size:                 0
  Cache size:                     0
  Global memory size:                 8589934592
  Constant buffer size:                 2147483647
  Max number of constant args:             16
  Local memory type:                 Scratchpad
  Local memory size:                 32768
ERROR: clBuildProgram(-11)

I do not know if the COMPAT32 even works. Also, there is an odd issue with the slack-desc: it's not displaying when installpkg runs.

Any testing, advice, fixes, etc. are welcome. I've used Nvidia GPUs for 20+ years, and I'm new to the amdgpu world.
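
Since clinfo reports two platforms (AMD-APP and Mesa Clover), one quick sanity check is to see which ICD registration files the OpenCL loader can find. The sketch below assumes the conventional Linux location /etc/OpenCL/vendors; `list_icds` is a hypothetical helper name, not part of the SlackBuild.

```shell
#!/bin/sh
# List each ICD registration file and the library name it points at.
# The directory argument defaults to the conventional /etc/OpenCL/vendors.
list_icds() {
  dir=${1:-/etc/OpenCL/vendors}
  for f in "$dir"/*.icd; do
    [ -e "$f" ] || continue        # no .icd files: the glob stayed literal
    printf '%s -> %s\n' "${f##*/}" "$(head -n 1 "$f")"
  done
}

# Show what is registered on this machine (prints nothing if the
# directory is missing or empty).
list_icds
```

Each line of output corresponds to one platform that clinfo should be able to enumerate, so a missing or extra entry is a hint about which package owns a given platform.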

jaos · 04-07-2020, 07:29 PM

I use a small script with a few tweaks that works great with folding@home on my 5700XT. Sorry for the formatting.

Quote:

#!/bin/bash
set -ex

prefix='amdgpu-pro-'
postfix='-ubuntu-18.04'
customname='openclcustombuild'
major='19.50'
minor='967956'
amdver='2.4.99'
sharedarch="x86_64-linux-gnu"
shared="opt/amdgpu-pro/lib/${sharedarch}"
bin="opt/amdgpu-pro/bin"
dldir="$(pwd)"
srcdir="$(pwd)/${0/.sh/}"
pkgdir="$(pwd)/${prefix}${customname}_${major}-${minor}"

if [ -e "${srcdir}" ]; then
sudo rm -rf "${srcdir}"
fi
if [ -e "${pkgdir}" ]; then
sudo rm -rf "${pkgdir}"
fi
mkdir -p "${srcdir}/opencl"
mkdir -p "${srcdir}/libdrm"
mkdir -p "${pkgdir}/install"
mkdir -p "${pkgdir}/usr/lib64/"
mkdir -p "${pkgdir}/usr/bin"
mkdir -p "${pkgdir}/opt/amdgpu/share/libdrm"

pushd "${srcdir}"
# unpack installer
tar -xf "${dldir}/${prefix}${major}-${minor}${postfix}.tar.xz"

# setup opencl
pushd "${srcdir}/opencl"
ar x "${srcdir}/${prefix}${major}-${minor}${postfix}/clinfo-amdgpu-pro_${major}-${minor}_amd64.deb"
tar -xf data.tar.xz
ar x "${srcdir}/${prefix}${major}-${minor}${postfix}/libopencl1-amdgpu-pro_${major}-${minor}_amd64.deb"
tar -xf data.tar.xz
ar x "${srcdir}/${prefix}${major}-${minor}${postfix}/opencl-amdgpu-pro-comgr_${major}-${minor}_amd64.deb"
tar -xf data.tar.xz
ar x "${srcdir}/${prefix}${major}-${minor}${postfix}/opencl-amdgpu-pro-dev_${major}-${minor}_amd64.deb"
tar -xf data.tar.xz
ar x "${srcdir}/${prefix}${major}-${minor}${postfix}/opencl-amdgpu-pro-icd_${major}-${minor}_amd64.deb"
tar -xf data.tar.xz
ar x "${srcdir}/${prefix}${major}-${minor}${postfix}/opencl-orca-amdgpu-pro-icd_${major}-${minor}_amd64.deb"
tar -xf data.tar.xz

pushd ${shared}
sed -i "s|libdrm_amdgpu|libdrm_amdgpo|g" libamdocl-orca64.so
popd
popd

# setup libdrm
pushd "${srcdir}/libdrm"
ar x "${srcdir}/${prefix}${major}-${minor}${postfix}/libdrm-amdgpu-amdgpu1_${amdver}-${minor}_amd64.deb"
tar -xf data.tar.xz
pushd ${shared/amdgpu-pro/amdgpu}
rm "libdrm_amdgpu.so.1"
mv "libdrm_amdgpu.so.1.0.0" "libdrm_amdgpo.so.1.0.0"
ln -s "libdrm_amdgpo.so.1.0.0" "libdrm_amdgpo.so.1"
popd
popd

# install to pkgdir
mv "${srcdir}/opencl/etc" "${pkgdir}/"
mv "${srcdir}/opencl/${shared}/libamdocl64.so" "${pkgdir}/usr/lib64/"
mv "${srcdir}/opencl/${shared}/libamd_comgr.so" "${pkgdir}/usr/lib64/"
mv "${srcdir}/opencl/${shared}/libamdocl-orca64.so" "${pkgdir}/usr/lib64/"
mv "${srcdir}/opencl/${shared}/libamdocl12cl64.so" "${pkgdir}/usr/lib64/"

mv "${srcdir}/opencl/${shared}/libOpenCL.so.1" "${pkgdir}/usr/lib64/"
mv "${srcdir}/opencl/${shared}/libOpenCL.so" "${pkgdir}/usr/lib64/"
mv "${srcdir}/opencl/${shared}/libcltrace.so" "${pkgdir}/usr/lib64/"
mv "${srcdir}/opencl/${bin}/clinfo" "${pkgdir}/usr/bin"

mv "${srcdir}/libdrm/${shared/amdgpu-pro/amdgpu}/libdrm_amdgpo.so.1.0.0" "${pkgdir}/usr/lib64/"
mv "${srcdir}/libdrm/${shared/amdgpu-pro/amdgpu}/libdrm_amdgpo.so.1" "${pkgdir}/usr/lib64/"

pushd "${pkgdir}/opt/amdgpu/share/libdrm"
ln -s /usr/share/libdrm/amdgpu.ids amdgpu.ids
popd

# prepare the package
pushd "${pkgdir}"
cat ${dldir}/slack-desc > install/slack-desc
popd
popd

if [ -e "${pkgdir}" ]; then
sudo chown -R root: "${pkgdir}"
pushd "${pkgdir}"
sudo /sbin/makepkg -l y -c n "${dldir}/${prefix}${customname}_${major}-${minor}.txz"
popd
rm -rf "${srcdir}"
sudo rm -rf "${pkgdir}"
fi
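
The sed step in the script above rewrites the string libdrm_amdgpu to libdrm_amdgpo inside libamdocl-orca64.so, and the libdrm copy from the AMD archive is renamed to match. That way the orca ICD loads its own bundled libdrm without replacing the system one. Editing a binary this way is only safe because both strings have the same length, so no other byte moves. A minimal sketch of that property on a scratch file (the file contents are made up for illustration, and `sed -i` is assumed to be GNU sed, as on Slackware):

```shell
#!/bin/sh
# Demonstrate on a scratch file that swapping one string for another of the
# same length leaves the file size (and so every other byte offset) unchanged.
tmp=$(mktemp)
printf 'NEEDED libdrm_amdgpu.so.1\n' > "$tmp"

before=$(wc -c < "$tmp")
sed -i 's|libdrm_amdgpu|libdrm_amdgpo|g' "$tmp"
after=$(wc -c < "$tmp")

echo "before=$before after=$after"
patched=$(cat "$tmp")
echo "$patched"
rm -f "$tmp"
```

A replacement of a different length would shift everything after it and most likely corrupt a real shared object.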

bassmadrigal · 04-07-2020, 07:59 PM

Quote:

Originally Posted by kingbeowulf

Also, there is an odd issue with the slack-desc: it's not displaying when installpkg runs.

I ran into this issue when I was developing the amdgpu-pro SlackBuild. There is a dash in the version string, so pkgtools sees the package name as amdgpu-opencl-19.50 and the version as 967956.

I had to add the following to the SlackBuild:

Code:

VERSION=${VERSION:-17.40_492261}
SRCVER=$(echo $VERSION | tr _ - )

And then anytime I needed to reference the source tarballs, I used SRCVER instead of VERSION.
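
To see why the dash matters, here is a minimal sketch that splits a package file name from the right into build, arch, and version, with whatever remains treated as the name. It assumes the usual name-version-arch-build convention; `split_pkgname` is a hypothetical helper for illustration, not code taken from pkgtools.

```shell
#!/bin/sh
# Hypothetical helper: peel three dash-separated fields off the right-hand
# side of a package file name; the remainder is the package name.
split_pkgname() {
  base=${1%.t?z}                        # drop a .tgz/.txz style extension
  build=${base##*-};   base=${base%-*}
  arch=${base##*-};    base=${base%-*}
  version=${base##*-}; name=${base%-*}
  echo "name=$name version=$version arch=$arch build=$build"
}

# Dash inside the version: the name swallows "19.50"
split_pkgname amdgpu-opencl-19.50-967956-x86_64-2_kb.tgz
# prints: name=amdgpu-opencl-19.50 version=967956 arch=x86_64 build=2_kb

# Underscore inside the version: the fields come out as intended
split_pkgname amdgpu-opencl-19.50_967956-x86_64-2_kb.tgz
# prints: name=amdgpu-opencl version=19.50_967956 arch=x86_64 build=2_kb
```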

kingbeowulf · 04-07-2020, 11:44 PM

Quote:

Originally Posted by bassmadrigal

I ran into this issue when I was developing the amdgpu-pro SlackBuild. There is a dash in the version string, so pkgtools sees the package name as amdgpu-opencl-19.50 and the version as 967956.

I had to add the following to the SlackBuild:

Code:

VERSION=${VERSION:-17.40_492261}
SRCVER=$(echo $VERSION | tr _ - )

And then anytime I needed to reference the source tarballs, I used SRCVER instead of VERSION.

Well... heck... I saw that and forgot to put it back in. Silly me.

kingbeowulf · 04-08-2020, 10:47 PM

Quote:

Originally Posted by jaos

I use a small script with a few tweaks that works great with folding@home on my 5700XT. Sorry for the formatting.

Wow, pushd/popd. I haven't done that since my Z80 assembly days.

So essentially the same approach. Mine is closer to the SlackBuilds.org style. Since amdgpu-pro also includes the open source bits, I wanted to pull in only the minimal requirements. libdrm_amdgpu and amdgpu.ids are already included in Slackware-current.

kingbeowulf · 04-08-2020, 11:01 PM

So it "seems" to work, at least for BOINC. What befuddles me is the clinfo error on one box but not the other. Both run the same patch level of Slackware64-current multilib as of 06-Mar-2020.

In the OP, the clinfo output (see above) ends with

Code:

ERROR: clBuildProgram(-11)

This is on the older box: i7-6850K, Intel X99 chipset (PCI-E 3.0), with an XFX RX 5700 XT THICC Ultra (PCI-E 4.0).

On the newer box (Ryzen 7 3800X, AMD X570 chipset, PCI-E 4.0) with an XFX RX 590 OC+ (PCI-E 3.0), the clinfo output is error free.

Code:

$ clinfo
Number of platforms:				 2
  Platform Profile:				 FULL_PROFILE
  Platform Version:				 OpenCL 2.1 AMD-APP (3004.6)
  Platform Name:				 AMD Accelerated Parallel Processing
  Platform Vendor:				 Advanced Micro Devices, Inc.
  Platform Extensions:				 cl_khr_icd cl_amd_event_callback cl_amd_offline_devices 
  Platform Profile:				 FULL_PROFILE
  Platform Version:				 OpenCL 1.1 Mesa 20.0.4
  Platform Name:				 Clover
  Platform Vendor:				 Mesa
  Platform Extensions:				 cl_khr_icd


  Platform Name:				 AMD Accelerated Parallel Processing
Number of devices:				 1
  Device Type:					 CL_DEVICE_TYPE_GPU
  Vendor ID:					 1002h
  Board name:					 Radeon RX 590 Series
  Device Topology:				 PCI[ B#8, D#0, F#0 ]
  Max compute units:				 36
  Max work items dimensions:			 3
    Max work items[0]:				 1024
    Max work items[1]:				 1024
    Max work items[2]:				 1024
  Max work group size:				 256
  Preferred vector width char:			 4
  Preferred vector width short:			 2
  Preferred vector width int:			 1
  Preferred vector width long:			 1
  Preferred vector width float:			 1
  Preferred vector width double:		 1
  Native vector width char:			 4
  Native vector width short:			 2
  Native vector width int:			 1
  Native vector width long:			 1
  Native vector width float:			 1
  Native vector width double:			 1
  Max clock frequency:				 1580Mhz
  Address bits:					 64
  Max memory allocation:			 4244635648
  Image support:				 Yes
  Max number of images read arguments:		 128
  Max number of images write arguments:		 8
  Max image 2D width:				 16384
  Max image 2D height:				 16384
  Max image 3D width:				 2048
  Max image 3D height:				 2048
  Max image 3D depth:				 2048
  Max samplers within kernel:			 16
  Max size of kernel argument:			 1024
  Alignment (bits) of base address:		 2048
  Minimum alignment (bytes) for any datatype:	 128
  Single precision floating point capability
    Denorms:					 No
    Quiet NaNs:					 Yes
    Round to nearest even:			 Yes
    Round to zero:				 Yes
    Round to +ve and infinity:			 Yes
    IEEE754-2008 fused multiply-add:		 Yes
  Cache type:					 Read/Write
  Cache line size:				 64
  Cache size:					 16384
  Global memory size:				 8468344832
  Constant buffer size:				 4244635648
  Max number of constant args:			 8
  Local memory type:				 Scratchpad
  Local memory size:				 32768
  Max pipe arguments:				 0
  Max pipe active reservations:			 0
  Max pipe packet size:				 0
  Max global variable size:			 0
  Max global variable preferred total size:	 0
  Max read/write image args:			 0
  Max on device events:				 0
  Queue on device max size:			 0
  Max on device queues:				 0
  Queue on device preferred size:		 0
  SVM capabilities:				 
    Coarse grain buffer:			 No
    Fine grain buffer:				 No
    Fine grain system:				 No
    Atomics:					 No
  Preferred platform atomic alignment:		 0
  Preferred global atomic alignment:		 0
  Preferred local atomic alignment:		 0
  Kernel Preferred work group size multiple:	 64
  Error correction support:			 0
  Unified memory for Host and Device:		 0
  Profiling timer resolution:			 1
  Device endianess:				 Little
  Available:					 Yes
  Compiler available:				 Yes
  Execution capabilities:				 
    Execute OpenCL kernels:			 Yes
    Execute native function:			 No
  Queue on Host properties:				 
    Out-of-Order:				 No
    Profiling :					 Yes
  Queue on Device properties:				 
    Out-of-Order:				 No
    Profiling :					 No
  Platform ID:					 0x7fc352f29e50
  Name:						 Ellesmere
  Vendor:					 Advanced Micro Devices, Inc.
  Device OpenCL C version:			 OpenCL C 1.2 
  Driver version:				 3004.6
  Profile:					 FULL_PROFILE
  Version:					 OpenCL 1.2 AMD-APP (3004.6)
  Extensions:					 cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event 


  Platform Name:				 Clover
Number of devices:				 1
  Device Type:					 CL_DEVICE_TYPE_GPU
  Vendor ID:					 1002h
  Max compute units:				 36
  Max work items dimensions:			 3
    Max work items[0]:				 256
    Max work items[1]:				 256
    Max work items[2]:				 256
  Max work group size:				 256
  Preferred vector width char:			 16
  Preferred vector width short:			 8
  Preferred vector width int:			 4
  Preferred vector width long:			 2
  Preferred vector width float:			 4
  Preferred vector width double:		 2
  Native vector width char:			 16
  Native vector width short:			 8
  Native vector width int:			 4
  Native vector width long:			 2
  Native vector width float:			 4
  Native vector width double:			 2
  Max clock frequency:				 1580Mhz
  Address bits:					 64
  Max memory allocation:			 6871947673
  Image support:				 No
  Max size of kernel argument:			 1024
  Alignment (bits) of base address:		 32768
  Minimum alignment (bytes) for any datatype:	 128
  Single precision floating point capability
    Denorms:					 No
    Quiet NaNs:					 Yes
    Round to nearest even:			 Yes
    Round to zero:				 No
    Round to +ve and infinity:			 No
    IEEE754-2008 fused multiply-add:		 No
  Cache type:					 None
  Cache line size:				 0
  Cache size:					 0
  Global memory size:				 8589934592
  Constant buffer size:				 2147483647
  Max number of constant args:			 16
  Local memory type:				 Scratchpad
  Local memory size:				 32768
  Kernel Preferred work group size multiple:	 64
  Error correction support:			 0
  Unified memory for Host and Device:		 0
  Profiling timer resolution:			 0
  Device endianess:				 Little
  Available:					 Yes
  Compiler available:				 Yes
  Execution capabilities:				 
    Execute OpenCL kernels:			 Yes
    Execute native function:			 No
  Queue on Host properties:				 
    Out-of-Order:				 No
    Profiling :					 Yes
  Platform ID:					 0x7fc34a6e0900
  Name:						 Radeon RX 590 Series (POLARIS10, DRM 3.35.0, 5.4.30, LLVM 10.0.0)
  Vendor:					 AMD
  Device OpenCL C version:			 OpenCL C 1.1 
  Driver version:				 20.0.4
  Profile:					 FULL_PROFILE
  Version:					 OpenCL 1.1 Mesa 20.0.4
  Extensions:					 cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp64 cl_khr_fp16

Could it be a motherboard chipset PCI-E issue? I'll try swapping cards and see where the error goes.
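
For reference, -11 is CL_BUILD_PROGRAM_FAILURE in the OpenCL headers, which is a compiler-side status rather than a bus error. Comparing the two outputs, the error appears exactly where the Clover section on the RX 590 box goes on to print "Kernel Preferred work group size multiple", so one reading (an assumption, not confirmed in this thread) is that the kernel build failed on the Mesa Clover platform for the RX 5700 XT, not on the AMD-APP platform. The card swap would still be needed to rule out hardware. A small sketch that decodes such a line; `cl_error_name` is a hypothetical helper and covers only a handful of codes:

```shell
#!/bin/sh
# Map a few OpenCL status codes (values from CL/cl.h) to their symbolic names.
cl_error_name() {
  case "$1" in
    0)     echo CL_SUCCESS ;;
    -1)    echo CL_DEVICE_NOT_FOUND ;;
    -2)    echo CL_DEVICE_NOT_AVAILABLE ;;
    -3)    echo CL_COMPILER_NOT_AVAILABLE ;;
    -5)    echo CL_OUT_OF_RESOURCES ;;
    -6)    echo CL_OUT_OF_HOST_MEMORY ;;
    -11)   echo CL_BUILD_PROGRAM_FAILURE ;;
    -1001) echo CL_PLATFORM_NOT_FOUND_KHR ;;
    *)     echo "unknown ($1)" ;;
  esac
}

# Pull the numeric code out of a clinfo-style error line and name it.
line='ERROR: clBuildProgram(-11)'
code=${line##*\(}     # strip everything up to the opening parenthesis
code=${code%\)}       # strip the closing parenthesis
cl_error_name "$code"
# prints: CL_BUILD_PROGRAM_FAILURE
```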

jedrek.b · 04-13-2020, 01:47 PM

Hello, I installed current64 yesterday (Sun Apr 12 20:02:28 UTC 2020). I built and installed kingbeowulf's OpenCL package and it works. I tested it with darktable 3.0.1.

Here is the clinfo output:

Code:

Number of platforms:				 2
  Platform Profile:				 FULL_PROFILE
  Platform Version:				 OpenCL 1.1 Mesa 20.0.4
  Platform Name:				 Clover
  Platform Vendor:				 Mesa
  Platform Extensions:				 cl_khr_icd
  Platform Profile:				 FULL_PROFILE
  Platform Version:				 OpenCL 2.1 AMD-APP (3004.6)
  Platform Name:				 AMD Accelerated Parallel Processing
  Platform Vendor:				 Advanced Micro Devices, Inc.
  Platform Extensions:				 cl_khr_icd cl_amd_event_callback cl_amd_offline_devices 


  Platform Name:				 Clover
Number of devices:				 1
  Device Type:					 CL_DEVICE_TYPE_GPU
  Vendor ID:					 1002h
  Max compute units:				 36
  Max work items dimensions:			 3
    Max work items[0]:				 256
    Max work items[1]:				 256
    Max work items[2]:				 256
  Max work group size:				 256
  Preferred vector width char:			 16
  Preferred vector width short:			 8
  Preferred vector width int:			 4
  Preferred vector width long:			 2
  Preferred vector width float:			 4
  Preferred vector width double:		 2
  Native vector width char:			 16
  Native vector width short:			 8
  Native vector width int:			 4
  Native vector width long:			 2
  Native vector width float:			 4
  Native vector width double:			 2
  Max clock frequency:				 1366Mhz
  Address bits:					 64
  Max memory allocation:			 6871947673
  Image support:				 No
  Max size of kernel argument:			 1024
  Alignment (bits) of base address:		 32768
  Minimum alignment (bytes) for any datatype:	 128
  Single precision floating point capability
    Denorms:					 No
    Quiet NaNs:					 Yes
    Round to nearest even:			 Yes
    Round to zero:				 No
    Round to +ve and infinity:			 No
    IEEE754-2008 fused multiply-add:		 No
  Cache type:					 None
  Cache line size:				 0
  Cache size:					 0
  Global memory size:				 8589934592
  Constant buffer size:				 2147483647
  Max number of constant args:			 16
  Local memory type:				 Scratchpad
  Local memory size:				 32768
  Kernel Preferred work group size multiple:	 64
  Error correction support:			 0
  Unified memory for Host and Device:		 0
  Profiling timer resolution:			 0
  Device endianess:				 Little
  Available:					 Yes
  Compiler available:				 Yes
  Execution capabilities:				 
    Execute OpenCL kernels:			 Yes
    Execute native function:			 No
  Queue on Host properties:				 
    Out-of-Order:				 No
    Profiling :					 Yes
  Platform ID:					 0x7f5433288900
  Name:						 Radeon RX 580 Series (POLARIS10, DRM 3.35.0, 5.4.31, LLVM 10.0.0)
  Vendor:					 AMD
  Device OpenCL C version:			 OpenCL C 1.1 
  Driver version:				 20.0.4
  Profile:					 FULL_PROFILE
  Version:					 OpenCL 1.1 Mesa 20.0.4
  Extensions:					 cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp64 cl_khr_fp16


  Platform Name:				 AMD Accelerated Parallel Processing
Number of devices:				 1
  Device Type:					 CL_DEVICE_TYPE_GPU
  Vendor ID:					 1002h
  Board name:					 Radeon RX 580 Series
  Device Topology:				 PCI[ B#3, D#0, F#0 ]
  Max compute units:				 36
  Max work items dimensions:			 3
    Max work items[0]:				 1024
    Max work items[1]:				 1024
    Max work items[2]:				 1024
  Max work group size:				 256
  Preferred vector width char:			 4
  Preferred vector width short:			 2
  Preferred vector width int:			 1
  Preferred vector width long:			 1
  Preferred vector width float:			 1
  Preferred vector width double:		 1
  Native vector width char:			 4
  Native vector width short:			 2
  Native vector width int:			 1
  Native vector width long:			 1
  Native vector width float:			 1
  Native vector width double:			 1
  Max clock frequency:				 1366Mhz
  Address bits:					 64
  Max memory allocation:			 4244635648
  Image support:				 Yes
  Max number of images read arguments:		 128
  Max number of images write arguments:		 8
  Max image 2D width:				 16384
  Max image 2D height:				 16384
  Max image 3D width:				 2048
  Max image 3D height:				 2048
  Max image 3D depth:				 2048
  Max samplers within kernel:			 16
  Max size of kernel argument:			 1024
  Alignment (bits) of base address:		 2048
  Minimum alignment (bytes) for any datatype:	 128
  Single precision floating point capability
    Denorms:					 No
    Quiet NaNs:					 Yes
    Round to nearest even:			 Yes
    Round to zero:				 Yes
    Round to +ve and infinity:			 Yes
    IEEE754-2008 fused multiply-add:		 Yes
  Cache type:					 Read/Write
  Cache line size:				 64
  Cache size:					 16384
  Global memory size:				 8415264768
  Constant buffer size:				 4244635648
  Max number of constant args:			 8
  Local memory type:				 Scratchpad
  Local memory size:				 32768
  Max pipe arguments:				 0
  Max pipe active reservations:			 0
  Max pipe packet size:				 0
  Max global variable size:			 0
  Max global variable preferred total size:	 0
  Max read/write image args:			 0
  Max on device events:				 0
  Queue on device max size:			 0
  Max on device queues:				 0
  Queue on device preferred size:		 0
  SVM capabilities:				 
    Coarse grain buffer:			 No
    Fine grain buffer:				 No
    Fine grain system:				 No
    Atomics:					 No
  Preferred platform atomic alignment:		 0
  Preferred global atomic alignment:		 0
  Preferred local atomic alignment:		 0
  Kernel Preferred work group size multiple:	 64
  Error correction support:			 0
  Unified memory for Host and Device:		 0
  Profiling timer resolution:			 1
  Device endianess:				 Little
  Available:					 Yes
  Compiler available:				 Yes
  Execution capabilities:				 
    Execute OpenCL kernels:			 Yes
    Execute native function:			 No
  Queue on Host properties:				 
    Out-of-Order:				 No
    Profiling :					 Yes
  Queue on Device properties:				 
    Out-of-Order:				 No
    Profiling :					 No
  Platform ID:					 0x7f530be8de50
  Name:						 Ellesmere
  Vendor:					 Advanced Micro Devices, Inc.
  Device OpenCL C version:			 OpenCL C 1.2 
  Driver version:				 3004.6
  Profile:					 FULL_PROFILE
  Version:					 OpenCL 1.2 AMD-APP (3004.6)
  Extensions:					 cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event

and darktable-cltest

Code:

0.023650 [opencl_init] opencl related configuration options:
0.023697 [opencl_init] 
0.023701 [opencl_init] opencl: 1
0.023704 [opencl_init] opencl_scheduling_profile: 'very fast GPU'
0.023706 [opencl_init] opencl_library: ''
0.023709 [opencl_init] opencl_memory_requirement: 768
0.023712 [opencl_init] opencl_memory_headroom: 400
0.023715 [opencl_init] opencl_device_priority: '*/!0,*/*/*/!0,*'
0.023717 [opencl_init] opencl_mandatory_timeout: 200
0.023720 [opencl_init] opencl_size_roundup: 16
0.023722 [opencl_init] opencl_async_pixelpipe: 0
0.023724 [opencl_init] opencl_synch_cache: active module
0.023727 [opencl_init] opencl_number_event_handles: 25
0.023729 [opencl_init] opencl_micro_nap: 1000
0.023731 [opencl_init] opencl_use_pinned_memory: 0
0.023734 [opencl_init] opencl_use_cpu_devices: 0
0.023736 [opencl_init] opencl_avoid_atomics: 0
0.023738 [opencl_init] 
0.023888 [opencl_init] found opencl runtime library 'libOpenCL'
0.023922 [opencl_init] opencl library 'libOpenCL' found on your system and loaded
0.168163 [opencl_init] found 2 platforms
0.168202 [opencl_init] found 2 devices
0.168262 [opencl_init] discarding device 0 `Radeon RX 580 Series (POLARIS10, DRM 3.35.0, 5.4.31, LLVM 10.0.0)' - The OpenCL driver doesn't provide image support. See also 'clinfo' output.
0.168281 [opencl_init] device 1 `Ellesmere' supports image sizes of 16384 x 16384
0.168284 [opencl_init] device 1 `Ellesmere' allows GPU memory allocations of up to 4048MB
[opencl_init] device 1: Ellesmere 
     GLOBAL_MEM_SIZE:          8023MB
     MAX_WORK_GROUP_SIZE:      256
     MAX_WORK_ITEM_DIMENSIONS: 3
     MAX_WORK_ITEM_SIZES:      [ 1024 1024 1024 ]
     DRIVER_VERSION:           3004.6
     DEVICE_VERSION:           OpenCL 1.2 AMD-APP (3004.6)
0.274470 [opencl_init] options for OpenCL compiler: -w  -DAMD=1 -I"/usr/share/darktable/kernels"

My PC is an HP Z840 Workstation with a Radeon RX 580 (8 GB). This is huge for me. I have been using ROCm, which I compiled myself, but this is a way easier solution.
I played with GIMP with OpenCL enabled and I think it works too.

kingbeowulf · 04-19-2020, 09:36 PM

Just uploaded amdgpu-opencl-20.10. Still works! YMMV.

kingbeowulf · 04-19-2020, 10:12 PM

I think I have the answer to why clinfo kicks out an error for one of the OpenCL platforms on the Navi GPU (RX 5700 XT). For amdgpu-pro, the OpenCL platform is "AMD Accelerated Parallel Processing". For the Mesa OpenCL included in Slackware-current, the OpenCL platform is "Clover". The Mesa OpenCL code is mature enough for the RX 590 GPU to have basic functionality, so its clinfo output is OK. The Mesa OpenCL implementation is still missing some important functionality; hence the need for amdgpu-pro (OpenCL extensions). For the RX 5700 XT, the Clover platform does not yet support the Navi GPU, and thus the error.
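
To check which of the two platforms a given box actually exposes, something like the following rough sketch could be used. The function name list_cl_platforms is made up for illustration, and it assumes clinfo prints a "Platform Name" line for each platform (with or without a trailing colon, depending on which clinfo is installed).

```shell
#!/bin/sh
# Print each distinct OpenCL platform name found in clinfo output on stdin.
# Assumption: every platform appears on a line of the form
#   "  Platform Name:   <value>"  or  "  Platform Name   <value>".
# list_cl_platforms is a hypothetical helper, not part of clinfo.
list_cl_platforms() {
    # Strip the label, optional colon, and surrounding whitespace,
    # then drop duplicates (some clinfo builds repeat the name).
    sed -n 's/^[[:space:]]*Platform Name[[:space:]:]*\(.*[^[:space:]]\)[[:space:]]*$/\1/p' | sort -u
}

# Usage (needs clinfo installed): clinfo | list_cl_platforms
```

If both "AMD Accelerated Parallel Processing" and "Clover" show up, both ICDs are installed and an application may see the GPU twice, once per platform.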

The amdgpu-opencl slackbuild script provided here looks to work plenty well enough. I'll include it in SlackBuilds.org when Slackware-current goes live. A full amdgpu-pro or ROCm install may be more complete, but this script is a lot simpler, and it does not need a reinstall/rebuild whenever Mesa or the kernel is updated!

Svoboda · 04-21-2020, 04:36 PM

Gentlemen, I gave this thing a go, and it seems to work on my AMD Radeon RX 560D.

Toodles !

gdiazlo · 04-25-2020, 12:09 PM

It seems to work here with an ASRock AMD Radeon RX 590. Thanks for this work!

Svoboda · 04-26-2020, 08:12 AM

I did some extensive testing: it works with my Vulkan games and tremendously improves performance when I do video editing and in non-Vulkan apps. I even went as far as enabling features such as TearFree rendering and AMD FreeSync on the PC. It is an amazing success.

This gives the Asus TUF FX505DY complete support under Slackware. Thanks for the work, man!

xor_ebx_ebx · 04-26-2020, 08:16 PM

Appears to be working on my system, with a 5700 non-XT. Tested with hashcat. I had to use the --force option because hashcat doesn't like Mesa, but was able to break a few test MD5 hashes no problem. I saw similar speeds to what I get on Windows in the benchmarks, maybe a little faster. So far, it doesn't look like anything is broken.

Definitely glad somebody else has done the hard work for me here.

kingbeowulf · 04-27-2020, 02:26 AM

Thanks to all who took a whack at testing the script and functionality. Greatly appreciated. When it comes to hardware and drivers, it's often hard for any individual to test more than a few hardware configurations and use cases.

larrystorch · 04-27-2020, 03:17 PM

Thank you so much kingbeowulf for making these scripts for the community. I was fearful I would never be able to use an AMD card in Blender.

After I moved mesa.icd out of the way in /etc/OpenCL/vendors, Blender detected my card and used it with no problems.
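
For anyone who wants to do the same, here is a minimal sketch of one way to move an ICD out of the way without deleting it. The disable_icd function and the ".disabled" suffix are my own choices for illustration, not something the slackbuild provides. It relies on the ICD loader only reading files that end in ".icd".

```shell
#!/bin/sh
# Hide an ICD file from the OpenCL ICD loader by renaming it.
# The loader only picks up files ending in ".icd", so adding a suffix
# disables the platform without deleting anything.
# disable_icd is a hypothetical helper:
#   $1 = ICD file name (e.g. mesa.icd)
#   $2 = vendors directory (defaults to /etc/OpenCL/vendors)
disable_icd() {
    icd_name=$1
    vendors_dir=${2:-/etc/OpenCL/vendors}
    if [ ! -f "$vendors_dir/$icd_name" ]; then
        echo "no such ICD: $vendors_dir/$icd_name" >&2
        return 1
    fi
    mv "$vendors_dir/$icd_name" "$vendors_dir/$icd_name.disabled"
}

# Usage (as root): disable_icd mesa.icd
# To undo: mv /etc/OpenCL/vendors/mesa.icd.disabled /etc/OpenCL/vendors/mesa.icd
```

Renaming rather than deleting makes it easy to bring the Clover platform back later if some application needs it.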