Using cuBLAS (4)
Founder
2025-05-29 05:44:46

This chapter covers the Level-3 Basic Linear Algebra Subprograms (BLAS3) functions, which perform matrix-matrix operations.

cublas<t>gemm()

cublasStatus_t cublasSgemm(cublasHandle_t handle,cublasOperation_t transa, cublasOperation_t transb,int m, int n, int k,const float           *alpha,const float           *A, int lda,const float           *B, int ldb,const float           *beta,float           *C, int ldc)
cublasStatus_t cublasDgemm(cublasHandle_t handle,cublasOperation_t transa, cublasOperation_t transb,int m, int n, int k,const double          *alpha,const double          *A, int lda,const double          *B, int ldb,const double          *beta,double          *C, int ldc)
cublasStatus_t cublasCgemm(cublasHandle_t handle,cublasOperation_t transa, cublasOperation_t transb,int m, int n, int k,const cuComplex       *alpha,const cuComplex       *A, int lda,const cuComplex       *B, int ldb,const cuComplex       *beta,cuComplex       *C, int ldc)
cublasStatus_t cublasZgemm(cublasHandle_t handle,cublasOperation_t transa, cublasOperation_t transb,int m, int n, int k,const cuDoubleComplex *alpha,const cuDoubleComplex *A, int lda,const cuDoubleComplex *B, int ldb,const cuDoubleComplex *beta,cuDoubleComplex *C, int ldc)
cublasStatus_t cublasHgemm(cublasHandle_t handle,cublasOperation_t transa, cublasOperation_t transb,int m, int n, int k,const __half *alpha,const __half *A, int lda,const __half *B, int ldb,const __half *beta,__half *C, int ldc)

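This function supports the 64-bit integer interface.
This function performs the matrix-matrix multiplication C = alpha * op(A) * op(B) + beta * C, where alpha and beta are scalars, and A, B and C are matrices stored in column-major format with dimensions op(A) m x k, op(B) k x n and C m x n.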
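As a reading aid, here is a minimal CPU sketch of what a GEMM call computes, C = alpha * op(A) * op(B) + beta * C, on column-major arrays with leading dimensions. It is not cuBLAS code: the names `Op`, `op_at` and `gemm_ref` are hypothetical, and only the non-transposed and transposed cases are modelled (no conjugate transpose).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CUBLAS_OP_N / CUBLAS_OP_T; not part of cuBLAS.
enum class Op { N, T };

// Element (row, col) of op(M), where M is column-major with leading dimension ld.
inline float op_at(const float* M, int ld, Op op, int row, int col) {
    return op == Op::N ? M[row + static_cast<std::size_t>(col) * ld]
                       : M[col + static_cast<std::size_t>(row) * ld];
}

// CPU reference: C = alpha * op(A) * op(B) + beta * C.
void gemm_ref(Op transa, Op transb, int m, int n, int k, float alpha,
              const float* A, int lda, const float* B, int ldb,
              float beta, float* C, int ldc) {
    assert(m >= 0 && n >= 0 && k >= 0);
    assert(ldc >= (m > 1 ? m : 1));
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += op_at(A, lda, transa, i, p) * op_at(B, ldb, transb, p, j);
            float& c = C[i + static_cast<std::size_t>(j) * ldc];
            // C is not read when beta == 0, so it may hold arbitrary values on entry.
            c = (beta == 0.0f) ? alpha * acc : alpha * acc + beta * c;
        }
    }
}
```

The leading dimension is the distance in elements between the starts of two consecutive columns, which is why element (i, j) sits at index i + j * ld.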

Param. | Memory | In/out | Meaning
handle | | input | handle to the cuBLAS library context.
transa | | input | operation op(A) that is non- or (conj.) transpose.
transb | | input | operation op(B) that is non- or (conj.) transpose.
m | | input | number of rows of matrix op(A) and C.
n | | input | number of columns of matrix op(B) and C.
k | | input | number of columns of op(A) and rows of op(B).
alpha | host or device | input | scalar used for multiplication.
A | device | input | array of dimensions lda x k with lda>=max(1,m) if transa == CUBLAS_OP_N and lda x m with lda>=max(1,k) otherwise.
lda | | input | leading dimension of two-dimensional array used to store the matrix A.
B | device | input | array of dimension ldb x n with ldb>=max(1,k) if transb == CUBLAS_OP_N and ldb x k with ldb>=max(1,n) otherwise.
ldb | | input | leading dimension of two-dimensional array used to store matrix B.
beta | host or device | input | scalar used for multiplication. If beta == 0, C does not have to be a valid input.
C | device | in/out | array of dimensions ldc x n with ldc>=max(1,m).
ldc | | input | leading dimension of a two-dimensional array used to store the matrix C.

The possible error values returned by this function and their meanings are listed below.

Error Value | Meaning
CUBLAS_STATUS_SUCCESS | the operation completed successfully
CUBLAS_STATUS_NOT_INITIALIZED | the library was not initialized
CUBLAS_STATUS_INVALID_VALUE | any of the following:
  • If m, n, k < 0 or
  • if transa, transb != CUBLAS_OP_N, CUBLAS_OP_C, CUBLAS_OP_T or
  • if lda < max(1, m) if transa == CUBLAS_OP_N and lda < max(1, k) otherwise or
  • if ldb < max(1, k) if transb == CUBLAS_OP_N and ldb < max(1, n) otherwise or
  • if ldc < max(1, m) or
  • if alpha, beta == NULL or
  • C == NULL if C needs to be scaled
CUBLAS_STATUS_ARCH_MISMATCH | in the case of cublasHgemm the device does not support math in half precision.
CUBLAS_STATUS_EXECUTION_FAILED | the function failed to launch on the GPU
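The dimension checks behind CUBLAS_STATUS_INVALID_VALUE can be mirrored on the host before a GEMM call is issued. The sketch below is only an illustration of those documented conditions: `Op` and `gemm_dims_valid` are hypothetical names, and the pointer checks (alpha, beta, C) are left out.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for cublasOperation_t; not part of cuBLAS.
enum class Op { N, T, C };

// Returns true when m, n, k and the leading dimensions satisfy the
// constraints listed for the GEMM routines.
bool gemm_dims_valid(Op transa, Op transb, int m, int n, int k,
                     int lda, int ldb, int ldc) {
    if (m < 0 || n < 0 || k < 0) return false;
    // A is stored as m x k when not transposed, k x m otherwise.
    const int a_rows = (transa == Op::N) ? m : k;
    // B is stored as k x n when not transposed, n x k otherwise.
    const int b_rows = (transb == Op::N) ? k : n;
    return lda >= std::max(1, a_rows) &&
           ldb >= std::max(1, b_rows) &&
           ldc >= std::max(1, m);
}
```

Checking on the host first gives a clearer failure site than inspecting the returned status after the call.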

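// CUDA runtime library + cuBLAS library
#include "cuda_runtime.h"
#include "cublas_v2.h"
#include <cstdlib>
#include <iostream>
using namespace std;

// Dimensions of the test matrices
int const A_ROW = 5;
int const A_COL = 6;
int const B_ROW = 6;
int const B_COL = 7;

int main()
{
    // Status variable
    cublasStatus_t status;

    // Matrices stored in host memory
    float *h_A, *h_B, *h_C;
    h_A = (float*)malloc(sizeof(float)*A_ROW*A_COL);  // allocate host memory
    h_B = (float*)malloc(sizeof(float)*B_ROW*B_COL);
    h_C = (float*)malloc(sizeof(float)*A_ROW*B_COL);

    // Fill the input matrices with random numbers in the range 0-10
    for (int i=0; i<A_ROW*A_COL; i++) h_A[i] = (float)(rand()%11);
    for (int i=0; i<B_ROW*B_COL; i++) h_B[i] = (float)(rand()%11);

    // The original listing is cut off here; the device allocation,
    // the cublasSgemm call and the copy back to the host are missing.

    free(h_A);
    free(h_B);
    free(h_C);
    return 0;
}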

cublas<t>gemm3m()

cublasStatus_t cublasCgemm3m(cublasHandle_t handle,cublasOperation_t transa, cublasOperation_t transb,int m, int n, int k,const cuComplex       *alpha,const cuComplex       *A, int lda,const cuComplex       *B, int ldb,const cuComplex       *beta,cuComplex       *C, int ldc)
cublasStatus_t cublasZgemm3m(cublasHandle_t handle,cublasOperation_t transa, cublasOperation_t transb,int m, int n, int k,const cuDoubleComplex *alpha,const cuDoubleComplex *A, int lda,const cuDoubleComplex *B, int ldb,const cuDoubleComplex *beta,cuDoubleComplex *C, int ldc)

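This function supports the 64-bit integer interface.
This function performs the complex matrix-matrix multiplication using the Gauss complexity reduction algorithm. This can improve performance by up to 25%.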
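The idea behind Gauss's complexity reduction is that a complex product can be formed from three real products instead of four. The sketch below applies it to small square matrices on the CPU. It illustrates the general trick only and makes no claim about how cuBLAS implements it internally; `RealMat`, `real_matmul` and `complex_matmul_3m` are hypothetical names.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using RealMat = std::vector<double>;  // n x n, column-major

// Plain real matrix product, used as the building block.
RealMat real_matmul(const RealMat& X, const RealMat& Y, int n) {
    RealMat Z(static_cast<std::size_t>(n) * n, 0.0);
    for (int j = 0; j < n; ++j)
        for (int p = 0; p < n; ++p)
            for (int i = 0; i < n; ++i)
                Z[i + j * n] += X[i + p * n] * Y[p + j * n];
    return Z;
}

// Complex product built from three real products (Gauss's trick):
//   T1 = Ar*Br, T2 = Ai*Bi, T3 = (Ar+Ai)*(Br+Bi)
//   real(C) = T1 - T2, imag(C) = T3 - T1 - T2
std::vector<std::complex<double>> complex_matmul_3m(
        const std::vector<std::complex<double>>& A,
        const std::vector<std::complex<double>>& B, int n) {
    const std::size_t len = static_cast<std::size_t>(n) * n;
    assert(A.size() == len && B.size() == len);
    RealMat Ar(len), Ai(len), Br(len), Bi(len), As(len), Bs(len);
    for (std::size_t e = 0; e < len; ++e) {
        Ar[e] = A[e].real();
        Ai[e] = A[e].imag();
        Br[e] = B[e].real();
        Bi[e] = B[e].imag();
        As[e] = Ar[e] + Ai[e];
        Bs[e] = Br[e] + Bi[e];
    }
    const RealMat T1 = real_matmul(Ar, Br, n);
    const RealMat T2 = real_matmul(Ai, Bi, n);
    const RealMat T3 = real_matmul(As, Bs, n);
    std::vector<std::complex<double>> C(len);
    for (std::size_t e = 0; e < len; ++e)
        C[e] = std::complex<double>(T1[e] - T2[e], T3[e] - T1[e] - T2[e]);
    return C;
}
```

The saving comes at the cost of extra additions and a different rounding pattern, so results may differ slightly from a four-product implementation in floating point.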

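Param. | Memory | In/out | Meaning
handle | | input | handle to the cuBLAS library context.
transa | | input | operation op(A) that is non- or (conj.) transpose.
transb | | input | operation op(B) that is non- or (conj.) transpose.
m | | input | number of rows of matrix op(A) and C.
n | | input | number of columns of matrix op(B) and C.
k | | input | number of columns of op(A) and rows of op(B).
alpha | host or device | input | scalar used for multiplication.
A | device | input | array of dimensions lda x k with lda>=max(1,m) if transa == CUBLAS_OP_N and lda x m with lda>=max(1,k) otherwise.
lda | | input | leading dimension of two-dimensional array used to store the matrix A.
B | device | input | array of dimension ldb x n with ldb>=max(1,k) if transb == CUBLAS_OP_N and ldb x k with ldb>=max(1,n) otherwise.
ldb | | input | leading dimension of two-dimensional array used to store matrix B.
beta | host or device | input | scalar used for multiplication. If beta == 0, C does not have to be a valid input.
C | device | in/out | array of dimensions ldc x n with ldc>=max(1,m).
ldc | | input | leading dimension of a two-dimensional array used to store the matrix C.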

The possible error values returned by this function and their meanings are listed below.

Error Value | Meaning
CUBLAS_STATUS_SUCCESS | the operation completed successfully
CUBLAS_STATUS_NOT_INITIALIZED | the library was not initialized
CUBLAS_STATUS_INVALID_VALUE | any of the following:
  • If m, n, k < 0 or
  • if transa, transb != CUBLAS_OP_N, CUBLAS_OP_C, CUBLAS_OP_T or
  • if lda < max(1, m) if transa == CUBLAS_OP_N and lda < max(1, k) otherwise or
  • if ldb < max(1, k) if transb == CUBLAS_OP_N and ldb < max(1, n) otherwise or
  • if ldc < max(1, m) or
  • if alpha, beta == NULL or
  • C == NULL if C needs to be scaled
CUBLAS_STATUS_ARCH_MISMATCH | the device has a compute capability lower than 5.0
CUBLAS_STATUS_EXECUTION_FAILED | the function failed to launch on the GPU

cublas<t>gemmBatched()

cublasStatus_t cublasHgemmBatched(cublasHandle_t handle,cublasOperation_t transa,cublasOperation_t transb,int m, int n, int k,const __half           *alpha,const __half           *Aarray[], int lda,const __half           *Barray[], int ldb,const __half           *beta,__half           *Carray[], int ldc,int batchCount)
cublasStatus_t cublasSgemmBatched(cublasHandle_t handle,cublasOperation_t transa,cublasOperation_t transb,int m, int n, int k,const float           *alpha,const float           *Aarray[], int lda,const float           *Barray[], int ldb,const float           *beta,float           *Carray[], int ldc,int batchCount)
cublasStatus_t cublasDgemmBatched(cublasHandle_t handle,cublasOperation_t transa,cublasOperation_t transb,int m, int n, int k,const double          *alpha,const double          *Aarray[], int lda,const double          *Barray[], int ldb,const double          *beta,double          *Carray[], int ldc,int batchCount)
cublasStatus_t cublasCgemmBatched(cublasHandle_t handle,cublasOperation_t transa,cublasOperation_t transb,int m, int n, int k,const cuComplex       *alpha,const cuComplex       *Aarray[], int lda,const cuComplex       *Barray[], int ldb,const cuComplex       *beta,cuComplex       *Carray[], int ldc,int batchCount)
cublasStatus_t cublasZgemmBatched(cublasHandle_t handle,cublasOperation_t transa,cublasOperation_t transb,int m, int n, int k,const cuDoubleComplex *alpha,const cuDoubleComplex *Aarray[], int lda,const cuDoubleComplex *Barray[], int ldb,const cuDoubleComplex *beta,cuDoubleComplex *Carray[], int ldc,int batchCount)

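This function supports the 64-bit integer interface.
This function performs the matrix-matrix multiplication of a batch of matrices: C[i] = alpha * op(A[i]) * op(B[i]) + beta * C[i] for i in [0, batchCount - 1]. The batch is considered "uniform", i.e. all instances have the same dimensions (m, n, k), leading dimensions (lda, ldb, ldc) and transpositions (transa, transb) for their respective A, B and C matrices. The addresses of the input and output matrices of each instance of the batch are read from arrays of pointers passed to the function by the caller.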
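To make the pointer-array layout concrete, here is a CPU sketch of a uniform batched GEMM in which every instance is reached through Aarray[i], Barray[i] and Carray[i]. It covers only the non-transposed case and is not cuBLAS code; `gemm_batched_ref` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for the non-transposed case:
//   C[b] = alpha * A[b] * B[b] + beta * C[b] for every instance b.
// All instances share m, n, k and the leading dimensions.
void gemm_batched_ref(int m, int n, int k, float alpha,
                      const float* const Aarray[], int lda,
                      const float* const Barray[], int ldb,
                      float beta, float* const Carray[], int ldc,
                      int batchCount) {
    assert(m >= 0 && n >= 0 && k >= 0 && batchCount >= 0);
    for (int b = 0; b < batchCount; ++b) {
        const float* A = Aarray[b];  // each instance has its own base pointer
        const float* B = Barray[b];
        float* C = Carray[b];
        for (int j = 0; j < n; ++j) {
            for (int i = 0; i < m; ++i) {
                float acc = 0.0f;
                for (int p = 0; p < k; ++p)
                    acc += A[i + static_cast<std::size_t>(p) * lda] *
                           B[p + static_cast<std::size_t>(j) * ldb];
                float& c = C[i + static_cast<std::size_t>(j) * ldc];
                c = (beta == 0.0f) ? alpha * acc : alpha * acc + beta * c;
            }
        }
    }
}
```

In the real API the pointer arrays themselves live in device memory, so a host-side array of device pointers has to be copied to the GPU before the call.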

Param. | Memory | In/out | Meaning
handle | | input | handle to the cuBLAS library context.
transa | | input | operation op(A[i]) that is non- or (conj.) transpose.
transb | | input | operation op(B[i]) that is non- or (conj.) transpose.
m | | input | number of rows of matrix op(A[i]) and C[i].
n | | input | number of columns of op(B[i]) and C[i].
k | | input | number of columns of op(A[i]) and rows of op(B[i]).
alpha | host or device | input | scalar used for multiplication.
Aarray | device | input | array of pointers to array, with each array of dim. lda x k with lda>=max(1,m) if transa==CUBLAS_OP_N and lda x m with lda>=max(1,k) otherwise. All pointers must meet certain alignment criteria. Please see below for details.
lda | | input | leading dimension of two-dimensional array used to store each matrix A[i].
Barray | device | input | array of pointers to array, with each array of dim. ldb x n with ldb>=max(1,k) if transb==CUBLAS_OP_N and ldb x k with ldb>=max(1,n) otherwise. All pointers must meet certain alignment criteria. Please see below for details.
ldb | | input | leading dimension of two-dimensional array used to store each matrix B[i].
beta | host or device | input | scalar used for multiplication. If beta == 0, C does not have to be a valid input.
Carray | device | in/out | array of pointers to array. It has dimensions ldc x n with ldc>=max(1,m). Matrices C[i] should not overlap; otherwise, undefined behavior is expected. All pointers must meet certain alignment criteria. Please see below for details.
ldc | | input | leading dimension of two-dimensional array used to store each matrix C[i].
batchCount | | input | number of pointers contained in Aarray, Barray and Carray.

If math mode enables fast math modes when using cublasSgemmBatched(), pointers (not the pointer arrays) placed in the GPU memory must be properly aligned to avoid misaligned memory access errors. Ideally all pointers are aligned to at least 16 Bytes. Otherwise it is recommended that they meet the following rule:

  • if k%4==0 then ensure intptr_t(ptr) % 16 == 0,
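The 16-byte rule can be checked with a few lines of pointer arithmetic. This sketch assumes a host-side copy of the per-matrix pointer array is available for inspection; `is_aligned` and `batch_is_aligned` are hypothetical helpers, not cuBLAS functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true when ptr is a multiple of `alignment` bytes.
inline bool is_aligned(const void* ptr, std::uintptr_t alignment = 16) {
    return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}

// Hypothetical helper: checks every matrix pointer of a batch.
// `ptrs` is assumed to be a host-side copy of the pointer array.
bool batch_is_aligned(const float* const ptrs[], int batchCount) {
    assert(batchCount >= 0);
    for (int i = 0; i < batchCount; ++i)
        if (!is_aligned(ptrs[i])) return false;
    return true;
}
```

Only the address value is inspected, so the same check works on device pointers held in host variables without dereferencing them.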

The possible error values returned by this function and their meanings are listed below.

Error Value | Meaning
CUBLAS_STATUS_SUCCESS | the operation completed successfully
CUBLAS_STATUS_NOT_INITIALIZED | the library was not initialized
CUBLAS_STATUS_INVALID_VALUE | any of the following:
  • If m, n, k, batchCount < 0 or
  • if transa, transb != CUBLAS_OP_N, CUBLAS_OP_C, CUBLAS_OP_T or
  • if lda < max(1, m) if transa == CUBLAS_OP_N and lda < max(1, k) otherwise or
  • if ldb < max(1, k) if transb == CUBLAS_OP_N and ldb < max(1, n) otherwise or
  • if ldc < max(1, m)
CUBLAS_STATUS_EXECUTION_FAILED | the function failed to launch on the GPU
CUBLAS_STATUS_ARCH_MISMATCH | cublasHgemmBatched is only supported on GPUs with compute capability 5.3 or higher

cublas<t>gemmStridedBatched()

cublasStatus_t cublasHgemmStridedBatched(cublasHandle_t handle,cublasOperation_t transa,cublasOperation_t transb,int m, int n, int k,const __half           *alpha,const __half           *A, int lda,long long int          strideA,const __half           *B, int ldb,long long int          strideB,const __half           *beta,__half                 *C, int ldc,long long int          strideC,int batchCount)
cublasStatus_t cublasSgemmStridedBatched(cublasHandle_t handle,cublasOperation_t transa,cublasOperation_t transb,int m, int n, int k,const float           *alpha,const float           *A, int lda,long long int          strideA,const float           *B, int ldb,long long int          strideB,const float           *beta,float                 *C, int ldc,long long int          strideC,int batchCount)
cublasStatus_t cublasDgemmStridedBatched(cublasHandle_t handle,cublasOperation_t transa,cublasOperation_t transb,int m, int n, int k,const double          *alpha,const double          *A, int lda,long long int          strideA,const double          *B, int ldb,long long int          strideB,const double          *beta,double                *C, int ldc,long long int          strideC,int batchCount)
cublasStatus_t cublasCgemmStridedBatched(cublasHandle_t handle,cublasOperation_t transa,cublasOperation_t transb,int m, int n, int k,const cuComplex       *alpha,const cuComplex       *A, int lda,long long int          strideA,const cuComplex       *B, int ldb,long long int          strideB,const cuComplex       *beta,cuComplex             *C, int ldc,long long int          strideC,int batchCount)
cublasStatus_t cublasCgemm3mStridedBatched(cublasHandle_t handle,cublasOperation_t transa,cublasOperation_t transb,int m, int n, int k,const cuComplex       *alpha,const cuComplex       *A, int lda,long long int          strideA,const cuComplex       *B, int ldb,long long int          strideB,const cuComplex       *beta,cuComplex             *C, int ldc,long long int          strideC,int batchCount)
cublasStatus_t cublasZgemmStridedBatched(cublasHandle_t handle,cublasOperation_t transa,cublasOperation_t transb,int m, int n, int k,const cuDoubleComplex *alpha,const cuDoubleComplex *A, int lda,long long int          strideA,const cuDoubleComplex *B, int ldb,long long int          strideB,const cuDoubleComplex *beta,cuDoubleComplex       *C, int ldc,long long int          strideC,int batchCount)

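This function supports the 64-bit integer interface.
This function performs the matrix-matrix multiplication of a batch of matrices. The batch is considered "uniform", i.e. all instances have the same dimensions (m, n, k), leading dimensions (lda, ldb, ldc) and transpositions (transa, transb) for their respective A, B and C matrices. The input matrices A and B and the output matrix C of each instance are located at a fixed offset, in number of elements, from their location in the previous instance. The caller passes pointers to the A, B and C matrices of the first instance together with the offsets strideA, strideB and strideC, which determine the locations of the input and output matrices of the following instances: C + i*strideC = alpha * op(A + i*strideA) * op(B + i*strideB) + beta * (C + i*strideC) for i in [0, batchCount - 1].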
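The strided variant replaces the pointer arrays with one base pointer per operand plus an element stride. A CPU sketch of the addressing, for the non-transposed case only, might look as follows; `gemm_strided_batched_ref` is a hypothetical name and this is not cuBLAS code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for the non-transposed case. Instance b uses
//   A + b*strideA, B + b*strideB, C + b*strideC (strides in elements).
void gemm_strided_batched_ref(int m, int n, int k, float alpha,
                              const float* A, int lda, long long strideA,
                              const float* B, int ldb, long long strideB,
                              float beta,
                              float* C, int ldc, long long strideC,
                              int batchCount) {
    assert(m >= 0 && n >= 0 && k >= 0 && batchCount >= 0);
    for (int b = 0; b < batchCount; ++b) {
        const float* Ab = A + b * strideA;
        const float* Bb = B + b * strideB;
        float* Cb = C + b * strideC;
        for (int j = 0; j < n; ++j) {
            for (int i = 0; i < m; ++i) {
                float acc = 0.0f;
                for (int p = 0; p < k; ++p)
                    acc += Ab[i + static_cast<std::size_t>(p) * lda] *
                           Bb[p + static_cast<std::size_t>(j) * ldb];
                float& c = Cb[i + static_cast<std::size_t>(j) * ldc];
                c = (beta == 0.0f) ? alpha * acc : alpha * acc + beta * c;
            }
        }
    }
}
```

With tightly packed matrices the natural strides are lda * k, ldb * n and ldc * n elements, so a whole batch can live in three contiguous allocations.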

Param. | Memory | In/out | Meaning
handle | | input | handle to the cuBLAS library context.
transa | | input | operation op(A[i]) that is non- or (conj.) transpose.
transb | | input | operation op(B[i]) that is non- or (conj.) transpose.
m | | input | number of rows of matrix op(A[i]) and C[i].
n | | input | number of columns of op(B[i]) and C[i].
k | | input | number of columns of op(A[i]) and rows of op(B[i]).
alpha | host or device | input | scalar used for multiplication.
A | device | input | <type>* pointer to the A matrix corresponding to the first instance of the batch, with dimensions lda x k with lda>=max(1,m) if transa==CUBLAS_OP_N and lda x m with lda>=max(1,k) otherwise.
lda | | input | leading dimension of two-dimensional array used to store each matrix A[i].
strideA | | input | value of type long long int that gives the offset in number of elements between A[i] and A[i+1].
B | device | input | <type>* pointer to the B matrix corresponding to the first instance of the batch, with dimensions ldb x n with ldb>=max(1,k) if transb==CUBLAS_OP_N and ldb x k with ldb>=max(1,n) otherwise.
ldb | | input | leading dimension of two-dimensional array used to store each matrix B[i].
strideB | | input | value of type long long int that gives the offset in number of elements between B[i] and B[i+1].
beta | host or device | input | scalar used for multiplication. If beta == 0, C does not have to be a valid input.
C | device | in/out | <type>* pointer to the C matrix corresponding to the first instance of the batch, with dimensions ldc x n with ldc>=max(1,m). Matrices C[i] should not overlap; otherwise, undefined behavior is expected.
ldc | | input | leading dimension of two-dimensional array used to store each matrix C[i].
strideC | | input | value of type long long int that gives the offset in number of elements between C[i] and C[i+1].
batchCount | | input | number of GEMMs to perform in the batch.

The possible error values returned by this function and their meanings are listed below.

Error Value | Meaning
CUBLAS_STATUS_SUCCESS | the operation completed successfully
CUBLAS_STATUS_NOT_INITIALIZED | the library was not initialized
CUBLAS_STATUS_INVALID_VALUE | any of the following:
  • If m, n, k, batchCount < 0 or
  • if transa, transb != CUBLAS_OP_N, CUBLAS_OP_C, CUBLAS_OP_T or
  • if lda < max(1, m) if transa == CUBLAS_OP_N and lda < max(1, k) otherwise or
  • if ldb < max(1, k) if transb == CUBLAS_OP_N and ldb < max(1, n) otherwise or
  • if ldc < max(1, m)
CUBLAS_STATUS_EXECUTION_FAILED | the function failed to launch on the GPU
CUBLAS_STATUS_ARCH_MISMATCH | cublasHgemmStridedBatched is only supported on GPUs with compute capability 5.3 or higher

cublas<t>symm()

cublasStatus_t cublasSsymm(cublasHandle_t handle,cublasSideMode_t side, cublasFillMode_t uplo,int m, int n,const float           *alpha,const float           *A, int lda,const float           *B, int ldb,const float           *beta,float           *C, int ldc)
cublasStatus_t cublasDsymm(cublasHandle_t handle,cublasSideMode_t side, cublasFillMode_t uplo,int m, int n,const double          *alpha,const double          *A, int lda,const double          *B, int ldb,const double          *beta,double          *C, int ldc)
cublasStatus_t cublasCsymm(cublasHandle_t handle,cublasSideMode_t side, cublasFillMode_t uplo,int m, int n,const cuComplex       *alpha,const cuComplex       *A, int lda,const cuComplex       *B, int ldb,const cuComplex       *beta,cuComplex       *C, int ldc)
cublasStatus_t cublasZsymm(cublasHandle_t handle,cublasSideMode_t side, cublasFillMode_t uplo,int m, int n,const cuDoubleComplex *alpha,const cuDoubleComplex *A, int lda,const cuDoubleComplex *B, int ldb,const cuDoubleComplex *beta,cuDoubleComplex *C, int ldc)

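This function supports the 64-bit integer interface.
This function performs the symmetric matrix-matrix multiplication C = alpha * A * B + beta * C if side == CUBLAS_SIDE_LEFT, or C = alpha * B * A + beta * C if side == CUBLAS_SIDE_RIGHT, where A is a symmetric matrix stored in lower or upper mode, B and C are m x n matrices, and alpha and beta are scalars.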
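The point of the uplo argument is that only one triangle of A is ever read; the other half is inferred by symmetry. A CPU sketch of a symmetric multiply that respects this could look as follows. `Side`, `Fill`, `sym_at` and `symm_ref` are hypothetical names standing in for the cuBLAS enums and routine.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Side { Left, Right };   // stands in for CUBLAS_SIDE_LEFT / CUBLAS_SIDE_RIGHT
enum class Fill { Lower, Upper };  // stands in for CUBLAS_FILL_MODE_LOWER / CUBLAS_FILL_MODE_UPPER

// Element (i, j) of the symmetric matrix A, reading only the stored triangle.
inline float sym_at(const float* A, int lda, Fill uplo, int i, int j) {
    const bool stored = (uplo == Fill::Lower) ? (i >= j) : (i <= j);
    return stored ? A[i + static_cast<std::size_t>(j) * lda]
                  : A[j + static_cast<std::size_t>(i) * lda];
}

// CPU reference: C = alpha*A*B + beta*C (Left) or C = alpha*B*A + beta*C (Right).
// A is m x m for Left and n x n for Right; B and C are m x n.
void symm_ref(Side side, Fill uplo, int m, int n, float alpha,
              const float* A, int lda, const float* B, int ldb,
              float beta, float* C, int ldc) {
    assert(m >= 0 && n >= 0);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            if (side == Side::Left) {
                for (int p = 0; p < m; ++p)
                    acc += sym_at(A, lda, uplo, i, p) *
                           B[p + static_cast<std::size_t>(j) * ldb];
            } else {
                for (int p = 0; p < n; ++p)
                    acc += B[i + static_cast<std::size_t>(p) * ldb] *
                           sym_at(A, lda, uplo, p, j);
            }
            float& c = C[i + static_cast<std::size_t>(j) * ldc];
            c = (beta == 0.0f) ? alpha * acc : alpha * acc + beta * c;
        }
    }
}
```

Because the unreferenced triangle is never touched, it may contain arbitrary data without affecting the result.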

Param. | Memory | In/out | Meaning
handle | | input | handle to the cuBLAS library context.
side | | input | indicates if matrix A is on the left or right of B.
uplo | | input | indicates if matrix A lower or upper part is stored, the other symmetric part is not referenced and is inferred from the stored elements.
m | | input | number of rows of matrix C and B, with matrix A sized accordingly.
n | | input | number of columns of matrix C and B, with matrix A sized accordingly.
alpha | host or device | input | scalar used for multiplication.
A | device | input | array of dimension lda x m with lda>=max(1,m) if side == CUBLAS_SIDE_LEFT and lda x n with lda>=max(1,n) otherwise.
lda | | input | leading dimension of two-dimensional array used to store matrix A.
B | device | input | array of dimension ldb x n with ldb>=max(1,m).
ldb | | input | leading dimension of two-dimensional array used to store matrix B.
beta | host or device | input | scalar used for multiplication, if beta == 0 then C does not have to be a valid input.
C | device | in/out | array of dimension ldc x n with ldc>=max(1,m).
ldc | | input | leading dimension of two-dimensional array used to store matrix C.

The possible error values returned by this function and their meanings are listed below.

Error Value | Meaning
CUBLAS_STATUS_SUCCESS | the operation completed successfully
CUBLAS_STATUS_NOT_INITIALIZED | the library was not initialized
CUBLAS_STATUS_INVALID_VALUE | any of the following:
  • If m, n < 0 or
  • if side != CUBLAS_SIDE_LEFT, CUBLAS_SIDE_RIGHT or
  • if uplo != CUBLAS_FILL_MODE_LOWER, CUBLAS_FILL_MODE_UPPER or
  • if lda < max(1, m) if side == CUBLAS_SIDE_LEFT and lda < max(1, n) otherwise or
  • if ldb < max(1, m) or
  • if ldc < max(1, m) or
  • if alpha == NULL or beta == NULL or
  • C == NULL if C needs to be scaled
CUBLAS_STATUS_EXECUTION_FAILED | the function failed to launch on the GPU

cublas<t>syrk()

cublasStatus_t cublasSsyrk(cublasHandle_t handle,cublasFillMode_t uplo, cublasOperation_t trans,int n, int k,const float           *alpha,const float           *A, int lda,const float           *beta,float           *C, int ldc)
cublasStatus_t cublasDsyrk(cublasHandle_t handle,cublasFillMode_t uplo, cublasOperation_t trans,int n, int k,const double          *alpha,const double          *A, int lda,const double          *beta,double          *C, int ldc)
cublasStatus_t cublasCsyrk(cublasHandle_t handle,cublasFillMode_t uplo, cublasOperation_t trans,int n, int k,const cuComplex       *alpha,const cuComplex       *A, int lda,const cuComplex       *beta,cuComplex       *C, int ldc)
cublasStatus_t cublasZsyrk(cublasHandle_t handle,cublasFillMode_t uplo, cublasOperation_t trans,int n, int k,const cuDoubleComplex *alpha,const cuDoubleComplex *A, int lda,const cuDoubleComplex *beta,cuDoubleComplex *C, int ldc)

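This function supports the 64-bit integer interface.
This function performs the symmetric rank-k update C = alpha * op(A) * op(A)^T + beta * C, where alpha and beta are scalars, C is a symmetric matrix stored in lower or upper mode, and A is a matrix with dimensions op(A) n x k.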
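A rank-k update writes only the triangle of C selected by uplo and leaves the other triangle untouched. The CPU sketch below shows that behaviour for the non-transposed and transposed cases. `Fill`, `Op` and `syrk_ref` are hypothetical names, not cuBLAS identifiers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Fill { Lower, Upper };  // stands in for CUBLAS_FILL_MODE_LOWER / _UPPER
enum class Op { N, T };            // stands in for CUBLAS_OP_N / CUBLAS_OP_T

// CPU reference: C = alpha * op(A) * op(A)^T + beta * C, with op(A) of size n x k.
// Only the triangle of C selected by uplo is read and written.
void syrk_ref(Fill uplo, Op trans, int n, int k, float alpha,
              const float* A, int lda, float beta, float* C, int ldc) {
    assert(n >= 0 && k >= 0);
    // Element (row, col) of op(A) from the column-major array A.
    auto a = [&](int row, int col) {
        return trans == Op::N ? A[row + static_cast<std::size_t>(col) * lda]
                              : A[col + static_cast<std::size_t>(row) * lda];
    };
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < n; ++i) {
            const bool in_triangle = (uplo == Fill::Lower) ? (i >= j) : (i <= j);
            if (!in_triangle) continue;  // the other triangle is left as-is
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += a(i, p) * a(j, p);
            float& c = C[i + static_cast<std::size_t>(j) * ldc];
            c = (beta == 0.0f) ? alpha * acc : alpha * acc + beta * c;
        }
    }
}
```

Skipping half of the output is what makes the rank-k update roughly half the work of a general GEMM of the same shape.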

Param. | Memory | In/out | Meaning
handle | | input | handle to the cuBLAS library context.
uplo | | input | indicates if matrix C lower or upper part is stored, the other symmetric part is not referenced and is inferred from the stored elements.
trans | | input | operation op(A) that is non- or transpose.
n | | input | number of rows of matrix op(A) and C.
k | | input | number of columns of matrix op(A).
alpha | host or device | input | scalar used for multiplication.
A | device | input | array of dimension lda x k with lda>=max(1,n) if trans == CUBLAS_OP_N and lda x n with lda>=max(1,k) otherwise.
lda | | input | leading dimension of two-dimensional array used to store matrix A.
beta | host or device | input | scalar used for multiplication, if beta == 0 then C does not have to be a valid input.
C | device | in/out | array of dimension ldc x n, with ldc>=max(1,n).
ldc | | input | leading dimension of two-dimensional array used to store matrix C.

The possible error values returned by this function and their meanings are listed below.

Error Value | Meaning
CUBLAS_STATUS_SUCCESS | the operation completed successfully
CUBLAS_STATUS_NOT_INITIALIZED | the library was not initialized
CUBLAS_STATUS_INVALID_VALUE | any of the following:
  • If n, k < 0 or
  • if trans != CUBLAS_OP_N, CUBLAS_OP_C, CUBLAS_OP_T or
  • if uplo != CUBLAS_FILL_MODE_LOWER, CUBLAS_FILL_MODE_UPPER or
  • if lda < max(1, n) if trans == CUBLAS_OP_N and lda < max(1, k) otherwise or
  • if ldc < max(1, n) or
  • if alpha == NULL or beta == NULL or
  • C == NULL if C needs to be scaled
CUBLAS_STATUS_EXECUTION_FAILED | the function failed to launch on the GPU

cublas<t>syr2k()

cublasStatus_t cublasSsyr2k(cublasHandle_t handle,cublasFillMode_t uplo, cublasOperation_t trans,int n, int k,const float           *alpha,const float           *A, int lda,const float           *B, int ldb,const float           *beta,float           *C, int ldc)
cublasStatus_t cublasDsyr2k(cublasHandle_t handle,cublasFillMode_t uplo, cublasOperation_t trans,int n, int k,const double          *alpha,const double          *A, int lda,const double          *B, int ldb,const double          *beta,double          *C, int ldc)
cublasStatus_t cublasCsyr2k(cublasHandle_t handle,cublasFillMode_t uplo, cublasOperation_t trans,int n, int k,const cuComplex       *alpha,const cuComplex       *A, int lda,const cuComplex       *B, int ldb,const cuComplex       *beta,cuComplex       *C, int ldc)
cublasStatus_t cublasZsyr2k(cublasHandle_t handle,cublasFillMode_t uplo, cublasOperation_t trans,int n, int k,const cuDoubleComplex *alpha,const cuDoubleComplex *A, int lda,const cuDoubleComplex *B, int ldb,const cuDoubleComplex *beta,cuDoubleComplex *C, int ldc)

Param. | Memory | In/out | Meaning
handle | | input | handle to the cuBLAS library context.
uplo | | input | indicates if matrix C lower or upper part is stored, the other symmetric part is not referenced and is inferred from the stored elements.
trans | | input | operation op(A) that is non- or transpose.
n | | input | number of rows of matrix op(A), op(B) and C.
k | | input | number of columns of matrix op(A) and op(B).
alpha | host or device | input | scalar used for multiplication.
A | device | input | array of dimension lda x k with lda>=max(1,n) if trans == CUBLAS_OP_N and lda x n with lda>=max(1,k) otherwise.
lda | | input | leading dimension of two-dimensional array used to store matrix A.
B | device | input | array of dimensions ldb x k with ldb>=max(1,n) if trans == CUBLAS_OP_N and ldb x n with ldb>=max(1,k) otherwise.
ldb | | input | leading dimension of two-dimensional array used to store matrix B.
beta | host or device | input | scalar used for multiplication, if beta == 0 then C does not have to be a valid input.
C | device | in/out | array of dimensions ldc x n with ldc>=max(1,n).
ldc | | input | leading dimension of two-dimensional array used to store matrix C.

cublas<t>trmm()

cublasStatus_t cublasStrmm(cublasHandle_t handle,cublasSideMode_t side, cublasFillMode_t uplo,cublasOperation_t trans, cublasDiagType_t diag,int m, int n,const float           *alpha,const float           *A, int lda,const float           *B, int ldb,float                 *C, int ldc)
cublasStatus_t cublasDtrmm(cublasHandle_t handle,cublasSideMode_t side, cublasFillMode_t uplo,cublasOperation_t trans, cublasDiagType_t diag,int m, int n,const double          *alpha,const double          *A, int lda,const double          *B, int ldb,double                *C, int ldc)
cublasStatus_t cublasCtrmm(cublasHandle_t handle,cublasSideMode_t side, cublasFillMode_t uplo,cublasOperation_t trans, cublasDiagType_t diag,int m, int n,const cuComplex       *alpha,const cuComplex       *A, int lda,const cuComplex       *B, int ldb,cuComplex             *C, int ldc)
cublasStatus_t cublasZtrmm(cublasHandle_t handle,cublasSideMode_t side, cublasFillMode_t uplo,cublasOperation_t trans, cublasDiagType_t diag,int m, int n,const cuDoubleComplex *alpha,const cuDoubleComplex *A, int lda,const cuDoubleComplex *B, int ldb,cuDoubleComplex       *C, int ldc)

Param. | Memory | In/out | Meaning
handle | | input | handle to the cuBLAS library context.
side | | input | indicates if matrix A is on the left or right of B.
uplo | | input | indicates if matrix A lower or upper part is stored, the other part is not referenced and is inferred from the stored elements.
trans | | input | operation op(A) that is non- or (conj.) transpose.
diag | | input | indicates if the elements on the main diagonal of matrix A are unity and should not be accessed.
m | | input | number of rows of matrix B, with matrix A sized accordingly.
n | | input | number of columns of matrix B, with matrix A sized accordingly.
alpha | host or device | input | scalar used for multiplication, if alpha == 0 then A is not referenced and B does not have to be a valid input.
A | device | input | array of dimension lda x m with lda>=max(1,m) if side == CUBLAS_SIDE_LEFT and lda x n with lda>=max(1,n) otherwise.
lda | | input | leading dimension of two-dimensional array used to store matrix A.
B | device | input | array of dimension ldb x n with ldb>=max(1,m).
ldb | | input | leading dimension of two-dimensional array used to store matrix B.
C | device | in/out | array of dimension ldc x n with ldc>=max(1,m).
ldc | | input | leading dimension of two-dimensional array used to store matrix C.

The possible error values returned by this function and their meanings are listed below.

Error Value | Meaning
CUBLAS_STATUS_SUCCESS | the operation completed successfully
CUBLAS_STATUS_NOT_INITIALIZED | the library was not initialized
CUBLAS_STATUS_INVALID_VALUE | any of the following:
  • If m, n < 0 or
  • if trans != CUBLAS_OP_N, CUBLAS_OP_C, CUBLAS_OP_T or
  • if uplo != CUBLAS_FILL_MODE_LOWER, CUBLAS_FILL_MODE_UPPER or
  • if side != CUBLAS_SIDE_LEFT, CUBLAS_SIDE_RIGHT or
  • if lda < max(1, m) if side == CUBLAS_SIDE_LEFT and lda < max(1, n) otherwise or
  • if ldb < max(1, m) or
  • C == NULL if C needs to be scaled
CUBLAS_STATUS_EXECUTION_FAILED | the function failed to launch on the GPU

cublas<t>trsm()

cublasStatus_t cublasStrsm(cublasHandle_t handle,cublasSideMode_t side, cublasFillMode_t uplo,cublasOperation_t trans, cublasDiagType_t diag,int m, int n,const float           *alpha,const float           *A, int lda,float           *B, int ldb)
cublasStatus_t cublasDtrsm(cublasHandle_t handle,cublasSideMode_t side, cublasFillMode_t uplo,cublasOperation_t trans, cublasDiagType_t diag,int m, int n,const double          *alpha,const double          *A, int lda,double          *B, int ldb)
cublasStatus_t cublasCtrsm(cublasHandle_t handle,cublasSideMode_t side, cublasFillMode_t uplo,cublasOperation_t trans, cublasDiagType_t diag,int m, int n,const cuComplex       *alpha,const cuComplex       *A, int lda,cuComplex       *B, int ldb)
cublasStatus_t cublasZtrsm(cublasHandle_t handle,cublasSideMode_t side, cublasFillMode_t uplo,cublasOperation_t trans, cublasDiagType_t diag,int m, int n,const cuDoubleComplex *alpha,const cuDoubleComplex *A, int lda,cuDoubleComplex *B, int ldb)

Param. | Memory | In/out | Meaning
handle | | input | handle to the cuBLAS library context.
side | | input | indicates if matrix A is on the left or right of X.
uplo | | input | indicates if matrix A lower or upper part is stored, the other part is not referenced and is inferred from the stored elements.
trans | | input | operation op(A) that is non- or (conj.) transpose.
diag | | input | indicates if the elements on the main diagonal of matrix A are unity and should not be accessed.
m | | input | number of rows of matrix B, with matrix A sized accordingly.
n | | input | number of columns of matrix B, with matrix A sized accordingly.
alpha | host or device | input | scalar used for multiplication, if alpha == 0 then A is not referenced and B does not have to be a valid input.
A | device | input | array of dimension lda x m with lda>=max(1,m) if side == CUBLAS_SIDE_LEFT and lda x n with lda>=max(1,n) otherwise.
lda | | input | leading dimension of two-dimensional array used to store matrix A.
B | device | in/out | array. It has dimensions ldb x n with ldb>=max(1,m).
ldb | | input | leading dimension of two-dimensional array used to store matrix B.

The possible error values returned by this function and their meanings are listed below.

Error Value | Meaning
CUBLAS_STATUS_SUCCESS | the operation completed successfully
CUBLAS_STATUS_NOT_INITIALIZED | the library was not initialized
CUBLAS_STATUS_INVALID_VALUE | any of the following:
  • If m < 0 or n < 0 or
  • if trans != CUBLAS_OP_N, CUBLAS_OP_C, CUBLAS_OP_T or
  • if uplo != CUBLAS_FILL_MODE_LOWER, CUBLAS_FILL_MODE_UPPER or
  • if side != CUBLAS_SIDE_LEFT, CUBLAS_SIDE_RIGHT or
  • if diag != CUBLAS_DIAG_NON_UNIT, CUBLAS_DIAG_UNIT or
  • if lda < max(1, m) if side == CUBLAS_SIDE_LEFT and lda < max(1, n) otherwise or
  • if ldb < max(1, m) or
  • alpha == NULL
CUBLAS_STATUS_EXECUTION_FAILED | the function failed to launch on the GPU
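The triangular solve overwrites B with the solution X of op(A) * X = alpha * B (or X * op(A) = alpha * B on the right side). As an illustration, the sketch below handles one case only, left side, lower triangular, no transpose, using forward substitution on the CPU. `trsm_left_lower_ref` is a hypothetical name and the restriction to this single case is an assumption made for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for one case: solves A * X = alpha * B with A lower triangular
// (m x m, column-major) and overwrites B (m x n) with X.
// When unit_diag is true the diagonal of A is taken as 1 and never read.
void trsm_left_lower_ref(bool unit_diag, int m, int n, float alpha,
                         const float* A, int lda, float* B, int ldb) {
    assert(m >= 0 && n >= 0);
    for (int j = 0; j < n; ++j) {
        float* col = B + static_cast<std::size_t>(j) * ldb;
        for (int i = 0; i < m; ++i) {
            float x = alpha * col[i];
            // Rows 0..i-1 of this column already hold solved values of X.
            for (int p = 0; p < i; ++p)
                x -= A[i + static_cast<std::size_t>(p) * lda] * col[p];
            if (!unit_diag)
                x /= A[i + static_cast<std::size_t>(i) * lda];
            col[i] = x;
        }
    }
}
```

Only the lower triangle of A is read, so the strictly upper part of the array may hold anything.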

cublas<t>trsmBatched()

cublasStatus_t cublasStrsmBatched( cublasHandle_t    handle,cublasSideMode_t  side,cublasFillMode_t  uplo,cublasOperation_t trans,cublasDiagType_t  diag,int m,int n,const float *alpha,const float *const A[],int lda,float *const B[],int ldb,int batchCount);
cublasStatus_t cublasDtrsmBatched( cublasHandle_t    handle,cublasSideMode_t  side,cublasFillMode_t  uplo,cublasOperation_t trans,cublasDiagType_t  diag,int m,int n,const double *alpha,const double *const A[],int lda,double *const B[],int ldb,int batchCount);
cublasStatus_t cublasCtrsmBatched( cublasHandle_t    handle,cublasSideMode_t  side,cublasFillMode_t  uplo,cublasOperation_t trans,cublasDiagType_t  diag,int m,int n,const cuComplex *alpha,const cuComplex *const A[],int lda,cuComplex *const B[],int ldb,int batchCount);
cublasStatus_t cublasZtrsmBatched( cublasHandle_t    handle,cublasSideMode_t  side,cublasFillMode_t  uplo,cublasOperation_t trans,cublasDiagType_t  diag,int m,int n,const cuDoubleComplex *alpha,const cuDoubleComplex *const A[],int lda,cuDoubleComplex *const B[],int ldb,int batchCount);

Param. | Memory | In/out | Meaning
handle | | input | handle to the cuBLAS library context.
side | | input | indicates if matrix A[i] is on the left or right of X[i].
uplo | | input | indicates if matrix A[i] lower or upper part is stored, the other part is not referenced and is inferred from the stored elements.
trans | | input | operation op(A[i]) that is non- or (conj.) transpose.
diag | | input | indicates if the elements on the main diagonal of matrix A[i] are unity and should not be accessed.
m | | input | number of rows of matrix B[i], with matrix A[i] sized accordingly.
n | | input | number of columns of matrix B[i], with matrix A[i] sized accordingly.
alpha | host or device | input | scalar used for multiplication, if alpha == 0 then A[i] is not referenced and B[i] does not have to be a valid input.
A | device | input | array of pointers to array, with each array of dim. lda x m with lda>=max(1,m) if side == CUBLAS_SIDE_LEFT and lda x n with lda>=max(1,n) otherwise.
lda | | input | leading dimension of two-dimensional array used to store matrix A[i].
B | device | in/out | array of pointers to array, with each array of dim. ldb x n with ldb>=max(1,m). Matrices B[i] should not overlap; otherwise, undefined behavior is expected.
ldb | | input | leading dimension of two-dimensional array used to store matrix B[i].
batchCount | | input | number of pointers contained in A and B.

The possible error values returned by this function and their meanings are listed below.

| Error Value | Meaning |
|---|---|
| CUBLAS_STATUS_SUCCESS | the operation completed successfully |
| CUBLAS_STATUS_NOT_INITIALIZED | the library was not initialized |
| CUBLAS_STATUS_INVALID_VALUE | an argument is invalid (conditions listed below) |
| CUBLAS_STATUS_EXECUTION_FAILED | the function failed to launch on the GPU |

CUBLAS_STATUS_INVALID_VALUE is returned if any of the following holds:

  • m < 0 or n < 0 or batchCount < 0

  • trans is not one of CUBLAS_OP_N, CUBLAS_OP_C, CUBLAS_OP_T

  • uplo is not one of CUBLAS_FILL_MODE_LOWER, CUBLAS_FILL_MODE_UPPER

  • side is not one of CUBLAS_SIDE_LEFT, CUBLAS_SIDE_RIGHT

  • diag is not one of CUBLAS_DIAG_NON_UNIT, CUBLAS_DIAG_UNIT

  • lda < max(1, m) when side == CUBLAS_SIDE_LEFT, or lda < max(1, n) otherwise

  • ldb < max(1, m)
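To make the array-of-pointers layout concrete, here is a minimal CPU sketch of what one configuration of the batched solve computes, assuming the usual trsm semantics (op(A[i]) X[i] = alpha B[i], with X[i] overwriting B[i]). The function `ref_strsm_batched_left_lower` is a hypothetical reference helper, not part of cuBLAS, and it covers only side = left, uplo = lower, trans = N, diag = non-unit for real single precision.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU reference (NOT a cuBLAS function).
 * Solves A[b] * X[b] = alpha * B[b] for every batch entry b, where each
 * A[b] is an m x m lower-triangular, non-unit matrix in column-major
 * storage. The solution X[b] overwrites B[b], as in the batched API. */
void ref_strsm_batched_left_lower(int m, int n, float alpha,
                                  const float *const A[], int lda,
                                  float *const B[], int ldb,
                                  int batchCount)
{
    for (int b = 0; b < batchCount; ++b) {
        const float *a = A[b];   /* b-th triangular matrix */
        float *x = B[b];         /* b-th right-hand side, solved in place */
        for (int j = 0; j < n; ++j) {
            for (int i = 0; i < m; ++i) {
                /* Forward substitution: rows above i are already solved. */
                float s = alpha * x[i + (size_t)j * ldb];
                for (int p = 0; p < i; ++p)
                    s -= a[i + (size_t)p * lda] * x[p + (size_t)j * ldb];
                x[i + (size_t)j * ldb] = s / a[i + (size_t)i * lda];
            }
        }
    }
}
```

Only the lower triangle of each A[b] is read, which matches the uplo description in the parameter table. On the GPU, both the pointer arrays and the matrices they point to must reside in device memory.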

cublas<t>hemm()

cublasStatus_t cublasChemm(cublasHandle_t handle,cublasSideMode_t side, cublasFillMode_t uplo,int m, int n,const cuComplex       *alpha,const cuComplex       *A, int lda,const cuComplex       *B, int ldb,const cuComplex       *beta,cuComplex       *C, int ldc)
cublasStatus_t cublasZhemm(cublasHandle_t handle,cublasSideMode_t side, cublasFillMode_t uplo,int m, int n,const cuDoubleComplex *alpha,const cuDoubleComplex *A, int lda,const cuDoubleComplex *B, int ldb,const cuDoubleComplex *beta,cuDoubleComplex *C, int ldc)

| Param. | Memory | In/out | Meaning |
|---|---|---|---|
| handle | | input | handle to the cuBLAS library context. |
| side | | input | indicates whether matrix A is on the left or right of B. |
| uplo | | input | indicates whether the lower or upper part of matrix A is stored; the other Hermitian part is not referenced and is inferred from the stored elements. |
| m | | input | number of rows of matrices C and B, with matrix A sized accordingly. |
| n | | input | number of columns of matrices C and B, with matrix A sized accordingly. |
| alpha | host or device | input | scalar used for multiplication. |
| A | device | input | array of dimension lda x m with lda >= max(1,m) if side == CUBLAS_SIDE_LEFT and lda x n with lda >= max(1,n) otherwise. The imaginary parts of the diagonal elements are assumed to be zero. |
| lda | | input | leading dimension of the two-dimensional array used to store matrix A. |
| B | device | input | array of dimension ldb x n with ldb >= max(1,m). |
| ldb | | input | leading dimension of the two-dimensional array used to store matrix B. |
| beta | host or device | input | scalar used for multiplication; if beta == 0 then C does not have to be a valid input. |
| C | device | in/out | array of dimension ldc x n with ldc >= max(1,m). |
| ldc | | input | leading dimension of the two-dimensional array used to store matrix C. |

The possible error values returned by this function and their meanings are listed below.

| Error Value | Meaning |
|---|---|
| CUBLAS_STATUS_SUCCESS | the operation completed successfully |
| CUBLAS_STATUS_NOT_INITIALIZED | the library was not initialized |
| CUBLAS_STATUS_INVALID_VALUE | an argument is invalid (conditions listed below) |
| CUBLAS_STATUS_EXECUTION_FAILED | the function failed to launch on the GPU |

CUBLAS_STATUS_INVALID_VALUE is returned if any of the following holds:

  • m < 0 or n < 0

  • side is not one of CUBLAS_SIDE_LEFT, CUBLAS_SIDE_RIGHT

  • uplo is not one of CUBLAS_FILL_MODE_LOWER, CUBLAS_FILL_MODE_UPPER

  • lda < max(1, m) when side == CUBLAS_SIDE_LEFT, or lda < max(1, n) otherwise

  • ldb < max(1, m)

  • ldc < max(1, m)

  • alpha == NULL or beta == NULL

  • C == NULL
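As an illustration of the uplo and diagonal rules in the table above, the following CPU sketch evaluates C = alpha * A * B + beta * C for side = left and uplo = lower, which is the standard BLAS hemm operation for that configuration. Both function names are hypothetical helpers, not cuBLAS entry points, and C99 `float complex` stands in for `cuComplex` purely for brevity.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Element (i, j) of a Hermitian matrix whose LOWER triangle is stored
 * column-major. The upper triangle is never read; it is reconstructed
 * by conjugation. The imaginary part of the diagonal is ignored. */
static float complex herm_lower_at(const float complex *A, int lda,
                                   int i, int j)
{
    if (i == j) return crealf(A[i + (size_t)i * lda]);
    if (i > j)  return A[i + (size_t)j * lda];
    return conjf(A[j + (size_t)i * lda]);
}

/* Hypothetical CPU reference (NOT a cuBLAS function):
 * C = alpha * A * B + beta * C, with A an m x m Hermitian matrix. */
void ref_chemm_left_lower(int m, int n, float complex alpha,
                          const float complex *A, int lda,
                          const float complex *B, int ldb,
                          float complex beta,
                          float complex *C, int ldc)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float complex acc = 0.0f;
            for (int p = 0; p < m; ++p)
                acc += herm_lower_at(A, lda, i, p) * B[p + (size_t)j * ldb];
            /* When beta == 0, C is not read, so it need not be valid. */
            float complex prev =
                (beta == 0.0f) ? 0.0f : beta * C[i + (size_t)j * ldc];
            C[i + (size_t)j * ldc] = alpha * acc + prev;
        }
    }
}
```

Writing arbitrary values into the unreferenced triangle of A and into the imaginary part of its diagonal, then checking that the result is unchanged, is a quick way to confirm which elements are actually read.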

cublas<t>herk()

cublasStatus_t cublasCherk(cublasHandle_t handle,cublasFillMode_t uplo, cublasOperation_t trans,int n, int k,const float  *alpha,const cuComplex       *A, int lda,const float  *beta,cuComplex       *C, int ldc)
cublasStatus_t cublasZherk(cublasHandle_t handle,cublasFillMode_t uplo, cublasOperation_t trans,int n, int k,const double *alpha,const cuDoubleComplex *A, int lda,const double *beta,cuDoubleComplex *C, int ldc)

| Param. | Memory | In/out | Meaning |
|---|---|---|---|
| handle | | input | handle to the cuBLAS library context. |
| uplo | | input | indicates whether the lower or upper part of matrix C is stored; the other Hermitian part is not referenced and is inferred from the stored elements. |
| trans | | input | operation op(A) that is non- or (conj.) transpose. |
| n | | input | number of rows of matrix op(A) and C. |
| k | | input | number of columns of matrix op(A). |
| alpha | host or device | input | real scalar used for multiplication. |
| A | device | input | array of dimension lda x k with lda >= max(1,n) if trans == CUBLAS_OP_N and lda x n with lda >= max(1,k) otherwise. |
| lda | | input | leading dimension of the two-dimensional array used to store matrix A. |
| beta | host or device | input | real scalar used for multiplication; if beta == 0 then C does not have to be a valid input. |
| C | device | in/out | array of dimension ldc x n, with ldc >= max(1,n). The imaginary parts of the diagonal elements are assumed and set to zero. |
| ldc | | input | leading dimension of the two-dimensional array used to store matrix C. |

The possible error values returned by this function and their meanings are listed below.

| Error Value | Meaning |
|---|---|
| CUBLAS_STATUS_SUCCESS | the operation completed successfully |
| CUBLAS_STATUS_NOT_INITIALIZED | the library was not initialized |
| CUBLAS_STATUS_INVALID_VALUE | an argument is invalid (conditions listed below) |
| CUBLAS_STATUS_EXECUTION_FAILED | the function failed to launch on the GPU |

CUBLAS_STATUS_INVALID_VALUE is returned if any of the following holds:

  • n < 0 or k < 0

  • trans is not one of CUBLAS_OP_N, CUBLAS_OP_C, CUBLAS_OP_T

  • uplo is not one of CUBLAS_FILL_MODE_LOWER, CUBLAS_FILL_MODE_UPPER

  • lda < max(1, n) when trans == CUBLAS_OP_N, or lda < max(1, k) otherwise

  • ldc < max(1, n)

  • alpha == NULL or beta == NULL

  • C == NULL
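The rank-k update can be sketched on the CPU as follows, assuming the standard BLAS herk definition C = alpha * op(A) * op(A)^H + beta * C. The sketch handles only trans = N and uplo = lower; `ref_cherk_lower_n` is a hypothetical helper name and `float complex` again stands in for `cuComplex`.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Hypothetical CPU reference (NOT a cuBLAS function):
 * C = alpha * A * A^H + beta * C, with A of size n x k (column-major),
 * alpha and beta real, and only the LOWER triangle of C updated. */
void ref_cherk_lower_n(int n, int k, float alpha,
                       const float complex *A, int lda,
                       float beta, float complex *C, int ldc)
{
    for (int j = 0; j < n; ++j) {
        for (int i = j; i < n; ++i) {          /* lower triangle only */
            float complex acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += A[i + (size_t)p * lda] * conjf(A[j + (size_t)p * lda]);
            /* When beta == 0, C is not read, so it need not be valid. */
            float complex prev =
                (beta == 0.0f) ? 0.0f : beta * C[i + (size_t)j * ldc];
            float complex val = alpha * acc + prev;
            /* Diagonal: imaginary part is assumed zero on input and
             * forced to zero on output. */
            if (i == j) val = crealf(val);
            C[i + (size_t)j * ldc] = val;
        }
    }
}
```

Because alpha and beta are real, the result stays Hermitian, which is why storing a single triangle of C is sufficient.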

cublas<t>her2k()

cublasStatus_t cublasCher2k(cublasHandle_t handle,cublasFillMode_t uplo, cublasOperation_t trans,int n, int k,const cuComplex       *alpha,const cuComplex       *A, int lda,const cuComplex       *B, int ldb,const float  *beta,cuComplex       *C, int ldc)
cublasStatus_t cublasZher2k(cublasHandle_t handle,cublasFillMode_t uplo, cublasOperation_t trans,int n, int k,const cuDoubleComplex *alpha,const cuDoubleComplex *A, int lda,const cuDoubleComplex *B, int ldb,const double *beta,cuDoubleComplex *C, int ldc)

| Param. | Memory | In/out | Meaning |
|---|---|---|---|
| handle | | input | handle to the cuBLAS library context. |
| uplo | | input | indicates whether the lower or upper part of matrix C is stored; the other Hermitian part is not referenced and is inferred from the stored elements. |
| trans | | input | operation op(A) that is non- or (conj.) transpose. |
| n | | input | number of rows of matrix op(A), op(B) and C. |
| k | | input | number of columns of matrix op(A) and op(B). |
| alpha | host or device | input | scalar used for multiplication. |
| A | device | input | array of dimension lda x k with lda >= max(1,n) if trans == CUBLAS_OP_N and lda x n with lda >= max(1,k) otherwise. |
| lda | | input | leading dimension of the two-dimensional array used to store matrix A. |
| B | device | input | array of dimension ldb x k with ldb >= max(1,n) if trans == CUBLAS_OP_N and ldb x n with ldb >= max(1,k) otherwise. |
| ldb | | input | leading dimension of the two-dimensional array used to store matrix B. |
| beta | host or device | input | real scalar used for multiplication; if beta == 0 then C does not have to be a valid input. |
| C | device | in/out | array of dimension ldc x n, with ldc >= max(1,n). The imaginary parts of the diagonal elements are assumed and set to zero. |
| ldc | | input | leading dimension of the two-dimensional array used to store matrix C. |

The possible error values returned by this function and their meanings are listed below.

| Error Value | Meaning |
|---|---|
| CUBLAS_STATUS_SUCCESS | the operation completed successfully |
| CUBLAS_STATUS_NOT_INITIALIZED | the library was not initialized |
| CUBLAS_STATUS_INVALID_VALUE | an argument is invalid (conditions listed below) |
| CUBLAS_STATUS_EXECUTION_FAILED | the function failed to launch on the GPU |

CUBLAS_STATUS_INVALID_VALUE is returned if any of the following holds:

  • n < 0 or k < 0

  • trans is not one of CUBLAS_OP_N, CUBLAS_OP_C, CUBLAS_OP_T

  • uplo is not one of CUBLAS_FILL_MODE_LOWER, CUBLAS_FILL_MODE_UPPER

  • lda < max(1, n) when trans == CUBLAS_OP_N, or lda < max(1, k) otherwise

  • ldb < max(1, n) when trans == CUBLAS_OP_N, or ldb < max(1, k) otherwise

  • ldc < max(1, n)

  • alpha == NULL or beta == NULL

  • C == NULL
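For the rank-2k update, the standard BLAS her2k definition is C = alpha * op(A) * op(B)^H + conj(alpha) * op(B) * op(A)^H + beta * C. The CPU sketch below shows that formula for trans = N and uplo = lower only. The name `ref_cher2k_lower_n` is hypothetical and `float complex` is used in place of `cuComplex`. Note that alpha is complex here while beta is real, matching the signatures above.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Hypothetical CPU reference (NOT a cuBLAS function):
 * C = alpha * A * B^H + conj(alpha) * B * A^H + beta * C,
 * with A and B of size n x k (column-major), beta real, and only the
 * LOWER triangle of C updated. */
void ref_cher2k_lower_n(int n, int k, float complex alpha,
                        const float complex *A, int lda,
                        const float complex *B, int ldb,
                        float beta, float complex *C, int ldc)
{
    for (int j = 0; j < n; ++j) {
        for (int i = j; i < n; ++i) {          /* lower triangle only */
            float complex acc = 0.0f;
            for (int p = 0; p < k; ++p) {
                acc += alpha * A[i + (size_t)p * lda]
                             * conjf(B[j + (size_t)p * ldb]);
                acc += conjf(alpha) * B[i + (size_t)p * ldb]
                                    * conjf(A[j + (size_t)p * lda]);
            }
            /* When beta == 0, C is not read, so it need not be valid. */
            float complex prev =
                (beta == 0.0f) ? 0.0f : beta * C[i + (size_t)j * ldc];
            float complex val = acc + prev;
            if (i == j) val = crealf(val);     /* diagonal stays real */
            C[i + (size_t)j * ldc] = val;
        }
    }
}
```

The second term is the conjugate transpose of the first, so their sum is Hermitian for any complex alpha.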

cublas<t>herkx()

cublasStatus_t cublasCherkx(cublasHandle_t handle,cublasFillMode_t uplo, cublasOperation_t trans,int n, int k,const cuComplex       *alpha,const cuComplex       *A, int lda,const cuComplex       *B, int ldb,const float  *beta,cuComplex       *C, int ldc)
cublasStatus_t cublasZherkx(cublasHandle_t handle,cublasFillMode_t uplo, cublasOperation_t trans,int n, int k,const cuDoubleComplex *alpha,const cuDoubleComplex *A, int lda,const cuDoubleComplex *B, int ldb,const double *beta,cuDoubleComplex *C, int ldc)

| Param. | Memory | In/out | Meaning |
|---|---|---|---|
| handle | | input | handle to the cuBLAS library context. |
| uplo | | input | indicates whether the lower or upper part of matrix C is stored; the other Hermitian part is not referenced and is inferred from the stored elements. |
| trans | | input | operation op(A) that is non- or (conj.) transpose. |
| n | | input | number of rows of matrix op(A), op(B) and C. |
| k | | input | number of columns of matrix op(A) and op(B). |
| alpha | host or device | input | scalar used for multiplication. |
| A | device | input | array of dimension lda x k with lda >= max(1,n) if trans == CUBLAS_OP_N and lda x n with lda >= max(1,k) otherwise. |
| lda | | input | leading dimension of the two-dimensional array used to store matrix A. |
| B | device | input | array of dimension ldb x k with ldb >= max(1,n) if trans == CUBLAS_OP_N and ldb x n with ldb >= max(1,k) otherwise. |
| ldb | | input | leading dimension of the two-dimensional array used to store matrix B. |
| beta | host or device | input | real scalar used for multiplication; if beta == 0 then C does not have to be a valid input. |
| C | device | in/out | array of dimension ldc x n, with ldc >= max(1,n). The imaginary parts of the diagonal elements are assumed and set to zero. |
| ldc | | input | leading dimension of the two-dimensional array used to store matrix C. |

The possible error values returned by this function and their meanings are listed below.

| Error Value | Meaning |
|---|---|
| CUBLAS_STATUS_SUCCESS | the operation completed successfully |
| CUBLAS_STATUS_NOT_INITIALIZED | the library was not initialized |
| CUBLAS_STATUS_INVALID_VALUE | an argument is invalid (conditions listed below) |
| CUBLAS_STATUS_EXECUTION_FAILED | the function failed to launch on the GPU |

CUBLAS_STATUS_INVALID_VALUE is returned if any of the following holds:

  • n < 0 or k < 0

  • trans is not one of CUBLAS_OP_N, CUBLAS_OP_C, CUBLAS_OP_T

  • uplo is not one of CUBLAS_FILL_MODE_LOWER, CUBLAS_FILL_MODE_UPPER

  • lda < max(1, n) when trans == CUBLAS_OP_N, or lda < max(1, k) otherwise

  • ldb < max(1, n) when trans == CUBLAS_OP_N, or ldb < max(1, k) otherwise

  • ldc < max(1, n)

  • alpha == NULL or beta == NULL

  • C == NULL
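Unlike her2k, herkx is generally understood to compute only the single product C = alpha * op(A) * op(B)^H + beta * C, so the caller is responsible for choosing B such that the result is Hermitian (for example, B equal to A with its columns scaled by real factors). The sketch below is a CPU illustration under that assumption, for trans = N and uplo = lower only. `ref_cherkx_lower_n` is a hypothetical name and `float complex` stands in for `cuComplex`.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Hypothetical CPU reference (NOT a cuBLAS function):
 * C = alpha * A * B^H + beta * C, with A and B of size n x k
 * (column-major), beta real, and only the LOWER triangle of C updated.
 * The caller must ensure that A * B^H scaled by alpha is Hermitian. */
void ref_cherkx_lower_n(int n, int k, float complex alpha,
                        const float complex *A, int lda,
                        const float complex *B, int ldb,
                        float beta, float complex *C, int ldc)
{
    for (int j = 0; j < n; ++j) {
        for (int i = j; i < n; ++i) {          /* lower triangle only */
            float complex acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += A[i + (size_t)p * lda] * conjf(B[j + (size_t)p * ldb]);
            /* When beta == 0, C is not read, so it need not be valid. */
            float complex prev =
                (beta == 0.0f) ? 0.0f : beta * C[i + (size_t)j * ldc];
            float complex val = alpha * acc + prev;
            if (i == j) val = crealf(val);     /* diagonal forced real */
            C[i + (size_t)j * ldc] = val;
        }
    }
}
```

With B set to A scaled by a real factor d, the output equals d times the herk result for the same A, which gives a simple consistency check between the two routines.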
