GPUOcelot

Memory Management

CUDA Driver API

Functions

CUresult CUDAAPI cuMemFreeHost (void *p)
 Frees page-locked host memory.
CUresult CUDAAPI cuMemHostAlloc (void **pp, size_t bytesize, unsigned int Flags)
 Allocates page-locked host memory.
CUresult CUDAAPI cuMemHostGetFlags (unsigned int *pFlags, void *p)
 Passes back flags that were used for a pinned allocation.
CUresult CUDAAPI cuArrayDestroy (CUarray hArray)
 Destroys a CUDA array.
CUresult CUDAAPI cuMemGetInfo (size_t *free, size_t *total)
 Gets free and total memory.
CUresult CUDAAPI cuMemAlloc (CUdeviceptr *dptr, size_t bytesize)
 Allocates device memory.
CUresult CUDAAPI cuMemAllocPitch (CUdeviceptr *dptr, size_t *pPitch, size_t WidthInBytes, size_t Height, unsigned int ElementSizeBytes)
 Allocates pitched device memory.
CUresult CUDAAPI cuMemFree (CUdeviceptr dptr)
 Frees device memory.
CUresult CUDAAPI cuMemGetAddressRange (CUdeviceptr *pbase, size_t *psize, CUdeviceptr dptr)
 Get information on memory allocations.
CUresult CUDAAPI cuMemAllocHost (void **pp, size_t bytesize)
 Allocates page-locked host memory.
CUresult CUDAAPI cuMemHostGetDevicePointer (CUdeviceptr *pdptr, void *p, unsigned int Flags)
 Passes back device pointer of mapped pinned memory.
CUresult CUDAAPI cuMemcpyHtoD (CUdeviceptr dstDevice, const void *srcHost, size_t ByteCount)
 Copies memory from Host to Device.
CUresult CUDAAPI cuMemcpyDtoH (void *dstHost, CUdeviceptr srcDevice, size_t ByteCount)
 Copies memory from Device to Host.
CUresult CUDAAPI cuMemcpyDtoD (CUdeviceptr dstDevice, CUdeviceptr srcDevice, size_t ByteCount)
 Copies memory from Device to Device.
CUresult CUDAAPI cuMemcpyDtoA (CUarray dstArray, size_t dstOffset, CUdeviceptr srcDevice, size_t ByteCount)
 Copies memory from Device to Array.
CUresult CUDAAPI cuMemcpyAtoD (CUdeviceptr dstDevice, CUarray srcArray, size_t srcOffset, size_t ByteCount)
 Copies memory from Array to Device.
CUresult CUDAAPI cuMemcpyHtoA (CUarray dstArray, size_t dstOffset, const void *srcHost, size_t ByteCount)
 Copies memory from Host to Array.
CUresult CUDAAPI cuMemcpyAtoH (void *dstHost, CUarray srcArray, size_t srcOffset, size_t ByteCount)
 Copies memory from Array to Host.
CUresult CUDAAPI cuMemcpyAtoA (CUarray dstArray, size_t dstOffset, CUarray srcArray, size_t srcOffset, size_t ByteCount)
 Copies memory from Array to Array.
CUresult CUDAAPI cuMemcpy2D (const CUDA_MEMCPY2D *pCopy)
 Copies memory for 2D arrays.
CUresult CUDAAPI cuMemcpy2DUnaligned (const CUDA_MEMCPY2D *pCopy)
 Copies memory for 2D arrays.
CUresult CUDAAPI cuMemcpy3D (const CUDA_MEMCPY3D *pCopy)
 Copies memory for 3D arrays.
CUresult CUDAAPI cuMemcpyHtoDAsync (CUdeviceptr dstDevice, const void *srcHost, size_t ByteCount, CUstream hStream)
 Copies memory from Host to Device.
CUresult CUDAAPI cuMemcpyDtoHAsync (void *dstHost, CUdeviceptr srcDevice, size_t ByteCount, CUstream hStream)
 Copies memory from Device to Host.
CUresult CUDAAPI cuMemcpyDtoDAsync (CUdeviceptr dstDevice, CUdeviceptr srcDevice, size_t ByteCount, CUstream hStream)
 Copies memory from Device to Device.
CUresult CUDAAPI cuMemcpyHtoAAsync (CUarray dstArray, size_t dstOffset, const void *srcHost, size_t ByteCount, CUstream hStream)
 Copies memory from Host to Array.
CUresult CUDAAPI cuMemcpyAtoHAsync (void *dstHost, CUarray srcArray, size_t srcOffset, size_t ByteCount, CUstream hStream)
 Copies memory from Array to Host.
CUresult CUDAAPI cuMemcpy2DAsync (const CUDA_MEMCPY2D *pCopy, CUstream hStream)
 Copies memory for 2D arrays.
CUresult CUDAAPI cuMemcpy3DAsync (const CUDA_MEMCPY3D *pCopy, CUstream hStream)
 Copies memory for 3D arrays.
CUresult CUDAAPI cuMemsetD8 (CUdeviceptr dstDevice, unsigned char uc, size_t N)
 Initializes device memory.
CUresult CUDAAPI cuMemsetD16 (CUdeviceptr dstDevice, unsigned short us, size_t N)
 Initializes device memory.
CUresult CUDAAPI cuMemsetD32 (CUdeviceptr dstDevice, unsigned int ui, size_t N)
 Initializes device memory.
CUresult CUDAAPI cuMemsetD2D8 (CUdeviceptr dstDevice, size_t dstPitch, unsigned char uc, size_t Width, size_t Height)
 Initializes device memory.
CUresult CUDAAPI cuMemsetD2D16 (CUdeviceptr dstDevice, size_t dstPitch, unsigned short us, size_t Width, size_t Height)
 Initializes device memory.
CUresult CUDAAPI cuMemsetD2D32 (CUdeviceptr dstDevice, size_t dstPitch, unsigned int ui, size_t Width, size_t Height)
 Initializes device memory.
CUresult CUDAAPI cuMemsetD8Async (CUdeviceptr dstDevice, unsigned char uc, size_t N, CUstream hStream)
 Sets device memory.
CUresult CUDAAPI cuMemsetD16Async (CUdeviceptr dstDevice, unsigned short us, size_t N, CUstream hStream)
 Sets device memory.
CUresult CUDAAPI cuMemsetD32Async (CUdeviceptr dstDevice, unsigned int ui, size_t N, CUstream hStream)
 Sets device memory.
CUresult CUDAAPI cuMemsetD2D8Async (CUdeviceptr dstDevice, size_t dstPitch, unsigned char uc, size_t Width, size_t Height, CUstream hStream)
 Sets device memory.
CUresult CUDAAPI cuMemsetD2D16Async (CUdeviceptr dstDevice, size_t dstPitch, unsigned short us, size_t Width, size_t Height, CUstream hStream)
 Sets device memory.
CUresult CUDAAPI cuMemsetD2D32Async (CUdeviceptr dstDevice, size_t dstPitch, unsigned int ui, size_t Width, size_t Height, CUstream hStream)
 Sets device memory.
CUresult CUDAAPI cuArrayCreate (CUarray *pHandle, const CUDA_ARRAY_DESCRIPTOR *pAllocateArray)
 Creates a 1D or 2D CUDA array.
CUresult CUDAAPI cuArrayGetDescriptor (CUDA_ARRAY_DESCRIPTOR *pArrayDescriptor, CUarray hArray)
 Get a 1D or 2D CUDA array descriptor.
CUresult CUDAAPI cuArray3DCreate (CUarray *pHandle, const CUDA_ARRAY3D_DESCRIPTOR *pAllocateArray)
 Creates a 3D CUDA array.
CUresult CUDAAPI cuArray3DGetDescriptor (CUDA_ARRAY3D_DESCRIPTOR *pArrayDescriptor, CUarray hArray)
 Get a 3D CUDA array descriptor.

Detailed Description

This section describes the memory management functions of the low-level CUDA driver application programming interface.


Function Documentation

CUresult CUDAAPI cuArray3DCreate ( CUarray *  pHandle,
const CUDA_ARRAY3D_DESCRIPTOR *  pAllocateArray 
)

Creates a 3D CUDA array.

Creates a CUDA array according to the CUDA_ARRAY3D_DESCRIPTOR structure pAllocateArray and returns a handle to the new CUDA array in *pHandle. The CUDA_ARRAY3D_DESCRIPTOR is defined as:

    typedef struct {
        unsigned int Width;
        unsigned int Height;
        unsigned int Depth;
        CUarray_format Format;
        unsigned int NumChannels;
        unsigned int Flags;
    } CUDA_ARRAY3D_DESCRIPTOR;

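where:

  • Width, Height and Depth are the width, height and depth of the CUDA array (in elements); the CUDA array is one-dimensional if Height and Depth are 0, two-dimensional if Depth is 0, and three-dimensional otherwise.
  • Format specifies the format of the elements, for example CU_AD_FORMAT_FLOAT or CU_AD_FORMAT_HALF.
  • NumChannels specifies the number of packed components per CUDA array element; it may be 1, 2 or 4.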

Here are examples of CUDA array descriptions:

Description for a CUDA array of 2048 floats:

    CUDA_ARRAY3D_DESCRIPTOR desc;
    desc.Format = CU_AD_FORMAT_FLOAT;
    desc.NumChannels = 1;
    desc.Width = 2048;
    desc.Height = 0;
    desc.Depth = 0;

Description for a 64 x 64 CUDA array of floats:

    CUDA_ARRAY3D_DESCRIPTOR desc;
    desc.Format = CU_AD_FORMAT_FLOAT;
    desc.NumChannels = 1;
    desc.Width = 64;
    desc.Height = 64;
    desc.Depth = 0;

Description for a width x height x depth CUDA array of 64-bit, 4x16-bit float16's:

    CUDA_ARRAY3D_DESCRIPTOR desc;
    desc.Format = CU_AD_FORMAT_HALF;
    desc.NumChannels = 4;
    desc.Width = width;
    desc.Height = height;
    desc.Depth = depth;
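
The "64-bit, 4x16-bit" figure in the last example is the per-channel size multiplied by NumChannels. The following is a minimal sketch of that arithmetic, not driver code. `SketchFormat` and both functions are hypothetical stand-ins introduced here so the example compiles without the CUDA headers, and only three formats are modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for CUarray_format; not the driver's enum. */
typedef enum {
    SKETCH_FORMAT_UINT8, /* models CU_AD_FORMAT_UNSIGNED_INT8 */
    SKETCH_FORMAT_HALF,  /* models CU_AD_FORMAT_HALF */
    SKETCH_FORMAT_FLOAT  /* models CU_AD_FORMAT_FLOAT */
} SketchFormat;

/* Bytes occupied by one channel of the given format. */
size_t sketch_channel_bytes(SketchFormat format)
{
    switch (format) {
    case SKETCH_FORMAT_UINT8: return 1;
    case SKETCH_FORMAT_HALF:  return 2;
    case SKETCH_FORMAT_FLOAT: return 4;
    }
    return 0; /* format not modelled by this sketch */
}

/* Bytes occupied by one array element: channel size times channel count.
   Assumes the channel count is 1, 2 or 4. */
size_t sketch_element_bytes(SketchFormat format, unsigned int num_channels)
{
    assert(num_channels == 1 || num_channels == 2 || num_channels == 4);
    return sketch_channel_bytes(format) * num_channels;
}
```

Calling `sketch_element_bytes(SKETCH_FORMAT_HALF, 4)` yields 8 bytes, which is the 64-bit element of the third example.
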
Parameters:
pHandle- Returned array
pAllocateArray- 3D array descriptor
Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_OUT_OF_MEMORY, CUDA_ERROR_UNKNOWN
See also:
cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D16, cuMemsetD2D32, cuMemsetD8, cuMemsetD16, cuMemsetD32
CUresult CUDAAPI cuArray3DGetDescriptor ( CUDA_ARRAY3D_DESCRIPTOR *  pArrayDescriptor,
CUarray  hArray 
)

Get a 3D CUDA array descriptor.

Returns in *pArrayDescriptor a descriptor containing information on the format and dimensions of the CUDA array hArray. It is useful for subroutines that have been passed a CUDA array, but need to know the CUDA array parameters for validation or other purposes.

This function may be called on 1D and 2D arrays, in which case the Height and/or Depth members of the descriptor struct will be set to 0.
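
A caller that receives such a descriptor can recover the array's dimensionality from the zeroed members. The helper below is a sketch under stated assumptions: `SketchDims` is a hypothetical mirror of the three dimension fields of CUDA_ARRAY3D_DESCRIPTOR, and the combination of a zero Height with a non-zero Depth, which the text above does not describe, is reported as 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the dimension fields of CUDA_ARRAY3D_DESCRIPTOR. */
typedef struct {
    unsigned int Width;
    unsigned int Height;
    unsigned int Depth;
} SketchDims;

/* Returns 1, 2 or 3 for a 1D, 2D or 3D array, or 0 for a combination
   this sketch does not cover (Height == 0 with Depth != 0). */
int sketch_array_rank(const SketchDims *dims)
{
    assert(dims != NULL);
    if (dims->Height == 0 && dims->Depth == 0) return 1;
    if (dims->Height != 0 && dims->Depth == 0) return 2;
    if (dims->Height != 0 && dims->Depth != 0) return 3;
    return 0;
}
```
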

Parameters:
pArrayDescriptor- Returned 3D array descriptor
hArray- 3D array to get descriptor of
Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_INVALID_HANDLE
See also:
cuArray3DCreate, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D16, cuMemsetD2D32, cuMemsetD8, cuMemsetD16, cuMemsetD32
CUresult CUDAAPI cuArrayCreate ( CUarray *  pHandle,
const CUDA_ARRAY_DESCRIPTOR *  pAllocateArray 
)

Creates a 1D or 2D CUDA array.

Creates a CUDA array according to the CUDA_ARRAY_DESCRIPTOR structure pAllocateArray and returns a handle to the new CUDA array in *pHandle. The CUDA_ARRAY_DESCRIPTOR is defined as:

    typedef struct {
        unsigned int Width;
        unsigned int Height;
        CUarray_format Format;
        unsigned int NumChannels;
    } CUDA_ARRAY_DESCRIPTOR;

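where:

  • Width and Height are the width and height of the CUDA array (in elements).
  • Format specifies the format of the elements, for example CU_AD_FORMAT_FLOAT, CU_AD_FORMAT_HALF or CU_AD_FORMAT_UNSIGNED_INT8.
  • NumChannels specifies the number of packed components per CUDA array element; it may be 1, 2 or 4.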

Here are examples of CUDA array descriptions:

Description for a CUDA array of 2048 floats:

    CUDA_ARRAY_DESCRIPTOR desc;
    desc.Format = CU_AD_FORMAT_FLOAT;
    desc.NumChannels = 1;
    desc.Width = 2048;
    desc.Height = 1;

Description for a 64 x 64 CUDA array of floats:

    CUDA_ARRAY_DESCRIPTOR desc;
    desc.Format = CU_AD_FORMAT_FLOAT;
    desc.NumChannels = 1;
    desc.Width = 64;
    desc.Height = 64;

Description for a width x height CUDA array of 64-bit, 4x16-bit float16's:

    CUDA_ARRAY_DESCRIPTOR desc;
    desc.Format = CU_AD_FORMAT_HALF;
    desc.NumChannels = 4;
    desc.Width = width;
    desc.Height = height;

Description for a width x height CUDA array of 16-bit elements, each of which is two 8-bit unsigned chars:

    CUDA_ARRAY_DESCRIPTOR desc;
    desc.Format = CU_AD_FORMAT_UNSIGNED_INT8;
    desc.NumChannels = 2;
    desc.Width = width;
    desc.Height = height;
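
Array descriptors count in elements, whereas the copy structures in this section (for example the WidthInBytes member of CUDA_MEMCPY2D) count in bytes. The sketch below converts one to the other. `SketchArrayDesc` is a hypothetical mirror of CUDA_ARRAY_DESCRIPTOR with the format reduced to a per-channel byte count, introduced here so the code compiles without the CUDA headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of CUDA_ARRAY_DESCRIPTOR; the Format member is
   replaced by the byte size of one channel (1, 2 or 4). */
typedef struct {
    unsigned int Width;        /* elements per row */
    unsigned int Height;       /* rows */
    unsigned int ChannelBytes; /* e.g. 1 for 8-bit, 2 for half, 4 for float */
    unsigned int NumChannels;  /* packed components per element */
} SketchArrayDesc;

/* Bytes in one row of the described array. */
size_t sketch_row_bytes(const SketchArrayDesc *desc)
{
    assert(desc != NULL);
    return (size_t)desc->Width * desc->ChannelBytes * desc->NumChannels;
}
```

For the 64 x 64 float example this gives 256 bytes per row.
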
Parameters:
pHandle- Returned array
pAllocateArray- Array descriptor
Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_OUT_OF_MEMORY, CUDA_ERROR_UNKNOWN
See also:
cuArray3DCreate, cuArray3DGetDescriptor, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D16, cuMemsetD2D32, cuMemsetD8, cuMemsetD16, cuMemsetD32
CUresult CUDAAPI cuArrayDestroy ( CUarray  hArray)
CUresult CUDAAPI cuArrayGetDescriptor ( CUDA_ARRAY_DESCRIPTOR *  pArrayDescriptor,
CUarray  hArray 
)

Get a 1D or 2D CUDA array descriptor.

Returns in *pArrayDescriptor a descriptor containing information on the format and dimensions of the CUDA array hArray. It is useful for subroutines that have been passed a CUDA array, but need to know the CUDA array parameters for validation or other purposes.

Parameters:
pArrayDescriptor- Returned array descriptor
hArray- Array to get descriptor of
Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_INVALID_HANDLE
See also:
cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D16, cuMemsetD2D32, cuMemsetD8, cuMemsetD16, cuMemsetD32
CUresult CUDAAPI cuMemAlloc ( CUdeviceptr *  dptr,
size_t  bytesize 
)
CUresult CUDAAPI cuMemAllocHost ( void **  pp,
size_t  bytesize 
)

Allocates page-locked host memory.

Allocates bytesize bytes of host memory that is page-locked and accessible to the device. The driver tracks the virtual memory ranges allocated with this function and automatically accelerates calls to functions such as cuMemcpy(). Since the memory can be accessed directly by the device, it can be read or written with much higher bandwidth than pageable memory obtained with functions such as malloc(). Allocating excessive amounts of memory with cuMemAllocHost() may degrade system performance, since it reduces the amount of memory available to the system for paging. As a result, this function is best used sparingly to allocate staging areas for data exchange between host and device.

Parameters:
pp- Returned host pointer to page-locked memory
bytesize- Requested allocation size in bytes
Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_OUT_OF_MEMORY
See also:
cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D16, cuMemsetD2D32, cuMemsetD8, cuMemsetD16, cuMemsetD32
CUresult CUDAAPI cuMemAllocPitch ( CUdeviceptr *  dptr,
size_t *  pPitch,
size_t  WidthInBytes,
size_t  Height,
unsigned int  ElementSizeBytes 
)

Allocates pitched device memory.

Allocates at least WidthInBytes * Height bytes of linear memory on the device and returns in *dptr a pointer to the allocated memory. The function may pad the allocation to ensure that corresponding pointers in any given row will continue to meet the alignment requirements for coalescing as the address is updated from row to row. ElementSizeBytes specifies the size of the largest reads and writes that will be performed on the memory range. ElementSizeBytes may be 4, 8 or 16 (since coalesced memory transactions are not possible on other data sizes). If ElementSizeBytes is smaller than the actual read/write size of a kernel, the kernel will run correctly, but possibly at reduced speed. The pitch returned in *pPitch by cuMemAllocPitch() is the width in bytes of the allocation. The intended usage of pitch is as a separate parameter of the allocation, used to compute addresses within the 2D array. Given the row and column of an array element of type T, the address is computed as:

   T* pElement = (T*)((char*)BaseAddress + Row * Pitch) + Column;

The pitch returned by cuMemAllocPitch() is guaranteed to work with cuMemcpy2D() under all circumstances. For allocations of 2D arrays, it is recommended that programmers consider performing pitch allocations using cuMemAllocPitch(). Due to alignment restrictions in the hardware, this is especially true if the application will be performing 2D memory copies between different regions of device memory (whether linear memory or CUDA arrays).

The byte alignment of the pitch returned by cuMemAllocPitch() is guaranteed to match or exceed the alignment requirement for texture binding with cuTexRefSetAddress2D().
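
The address formula above can be tried out on ordinary host memory. The sketch below is illustrative only: `sketch_round_up_pitch` uses a caller-chosen alignment that is purely hypothetical (the real pitch is whatever cuMemAllocPitch() returns in *pPitch), and `sketch_element_at` fixes the element type T to float.

```c
#include <assert.h>
#include <stddef.h>

/* Rounds width_in_bytes up to a multiple of alignment.
   The alignment is a hypothetical value picked by the caller; it does not
   reproduce the rule the driver uses inside cuMemAllocPitch(). */
size_t sketch_round_up_pitch(size_t width_in_bytes, size_t alignment)
{
    assert(alignment > 0);
    return (width_in_bytes + alignment - 1) / alignment * alignment;
}

/* Address of element (row, column) in a pitched buffer of floats, following
   pElement = (T*)((char*)BaseAddress + Row * Pitch) + Column with T = float. */
float *sketch_element_at(void *base, size_t pitch, size_t row, size_t column)
{
    return (float *)((char *)base + row * pitch) + column;
}
```

Note that the row offset is applied in bytes before the cast to the element type, while the column offset is applied in elements after it. Swapping the two is a common source of addressing bugs with pitched memory.
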

Parameters:
dptr- Returned device pointer
pPitch- Returned pitch of allocation in bytes
WidthInBytes- Requested allocation width in bytes
Height- Requested allocation height in rows
ElementSizeBytes- Size of largest reads/writes for range
Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_OUT_OF_MEMORY
See also:
cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D16, cuMemsetD2D32, cuMemsetD8, cuMemsetD16, cuMemsetD32
CUresult CUDAAPI cuMemcpy2D ( const CUDA_MEMCPY2D *  pCopy)

Copies memory for 2D arrays.

Perform a 2D memory copy according to the parameters specified in pCopy. The CUDA_MEMCPY2D structure is defined as:

   typedef struct CUDA_MEMCPY2D_st {
      unsigned int srcXInBytes, srcY;
      CUmemorytype srcMemoryType;
          const void *srcHost;
          CUdeviceptr srcDevice;
          CUarray srcArray;
          unsigned int srcPitch;

      unsigned int dstXInBytes, dstY;
      CUmemorytype dstMemoryType;
          void *dstHost;
          CUdeviceptr dstDevice;
          CUarray dstArray;
          unsigned int dstPitch;

      unsigned int WidthInBytes;
      unsigned int Height;
   } CUDA_MEMCPY2D;

where:

  • srcMemoryType and dstMemoryType specify the type of memory of the source and destination, respectively; each is a CUmemorytype value (CU_MEMORYTYPE_HOST, CU_MEMORYTYPE_DEVICE or CU_MEMORYTYPE_ARRAY):
If srcMemoryType is CU_MEMORYTYPE_HOST, srcHost and srcPitch specify the (host) base address of the source data and the bytes per row to apply. srcArray is ignored.
If srcMemoryType is CU_MEMORYTYPE_DEVICE, srcDevice and srcPitch specify the (device) base address of the source data and the bytes per row to apply. srcArray is ignored.
If srcMemoryType is CU_MEMORYTYPE_ARRAY, srcArray specifies the handle of the source data. srcHost, srcDevice and srcPitch are ignored.
If dstMemoryType is CU_MEMORYTYPE_HOST, dstHost and dstPitch specify the (host) base address of the destination data and the bytes per row to apply. dstArray is ignored.
If dstMemoryType is CU_MEMORYTYPE_DEVICE, dstDevice and dstPitch specify the (device) base address of the destination data and the bytes per row to apply. dstArray is ignored.
If dstMemoryType is CU_MEMORYTYPE_ARRAY, dstArray specifies the handle of the destination data. dstHost, dstDevice and dstPitch are ignored.
  • srcXInBytes and srcY specify the base address of the source data for the copy.
For host pointers, the starting address is
  void* Start = (void*)((char*)srcHost+srcY*srcPitch + srcXInBytes);
For device pointers, the starting address is
  CUdeviceptr Start = srcDevice+srcY*srcPitch+srcXInBytes;
For CUDA arrays, srcXInBytes must be evenly divisible by the array element size.
  • dstXInBytes and dstY specify the base address of the destination data for the copy.
For host pointers, the starting address is
  void* dstStart = (void*)((char*)dstHost+dstY*dstPitch + dstXInBytes);
For device pointers, the starting address is
  CUdeviceptr dstStart = dstDevice+dstY*dstPitch+dstXInBytes;
For CUDA arrays, dstXInBytes must be evenly divisible by the array element size.
  • WidthInBytes and Height specify the width (in bytes) and height of the 2D copy being performed.
  • If specified, srcPitch must be greater than or equal to WidthInBytes + srcXInBytes, and dstPitch must be greater than or equal to WidthInBytes + dstXInBytes.
cuMemcpy2D() returns an error if any pitch is greater than the maximum allowed (CU_DEVICE_ATTRIBUTE_MAX_PITCH). cuMemAllocPitch() passes back pitches that always work with cuMemcpy2D(). On intra-device memory copies (device ↔ device, CUDA array ↔ device, CUDA array ↔ CUDA array), cuMemcpy2D() may fail for pitches not computed by cuMemAllocPitch(). cuMemcpy2DUnaligned() does not have this restriction, but may run significantly slower in the cases where cuMemcpy2D() would have returned an error code.
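
The host start-address formulas above can be modelled with a row-by-row copy between two host buffers. This is only a sketch of the addressing semantics, not how the driver implements cuMemcpy2D(). `SketchCopy2D` is a hypothetical host-only subset of CUDA_MEMCPY2D that uses size_t members and omits the memory-type, device and array fields.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical host-only subset of CUDA_MEMCPY2D. */
typedef struct {
    size_t srcXInBytes, srcY;
    const void *srcHost;
    size_t srcPitch;
    size_t dstXInBytes, dstY;
    void *dstHost;
    size_t dstPitch;
    size_t WidthInBytes;
    size_t Height;
} SketchCopy2D;

/* Copies Height rows of WidthInBytes bytes, one memcpy per row.
   The start addresses follow the host-pointer formulas in the text. */
void sketch_copy2d_host(const SketchCopy2D *p)
{
    const char *src;
    char *dst;
    size_t row;

    assert(p != NULL);
    src = (const char *)p->srcHost + p->srcY * p->srcPitch + p->srcXInBytes;
    dst = (char *)p->dstHost + p->dstY * p->dstPitch + p->dstXInBytes;
    for (row = 0; row < p->Height; ++row) {
        /* Each side advances by its own pitch; only WidthInBytes is copied. */
        memcpy(dst + row * p->dstPitch, src + row * p->srcPitch, p->WidthInBytes);
    }
}
```

Because source and destination advance by their own pitches, the two buffers may have different row strides, which is what makes sub-rectangle copies between differently sized allocations possible.
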
Parameters:
pCopy- Parameters for the memory copy
Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE
See also:
cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D16, cuMemsetD2D32, cuMemsetD8, cuMemsetD16, cuMemsetD32
CUresult CUDAAPI cuMemcpy2DAsync ( const CUDA_MEMCPY2D *  pCopy,
CUstream  hStream 
)

Copies memory for 2D arrays.

Perform a 2D memory copy according to the parameters specified in pCopy. The CUDA_MEMCPY2D structure is defined as:

   typedef struct CUDA_MEMCPY2D_st {
      unsigned int srcXInBytes, srcY;
      CUmemorytype srcMemoryType;
      const void *srcHost;
      CUdeviceptr srcDevice;
      CUarray srcArray;
      unsigned int srcPitch;
      unsigned int dstXInBytes, dstY;
      CUmemorytype dstMemoryType;
      void *dstHost;
      CUdeviceptr dstDevice;
      CUarray dstArray;
      unsigned int dstPitch;
      unsigned int WidthInBytes;
      unsigned int Height;
   } CUDA_MEMCPY2D;

where:

  • srcMemoryType and dstMemoryType specify the type of memory of the source and destination, respectively; each is a CUmemorytype value (CU_MEMORYTYPE_HOST, CU_MEMORYTYPE_DEVICE or CU_MEMORYTYPE_ARRAY):
If srcMemoryType is CU_MEMORYTYPE_HOST, srcHost and srcPitch specify the (host) base address of the source data and the bytes per row to apply. srcArray is ignored.
If srcMemoryType is CU_MEMORYTYPE_DEVICE, srcDevice and srcPitch specify the (device) base address of the source data and the bytes per row to apply. srcArray is ignored.
If srcMemoryType is CU_MEMORYTYPE_ARRAY, srcArray specifies the handle of the source data. srcHost, srcDevice and srcPitch are ignored.
If dstMemoryType is CU_MEMORYTYPE_HOST, dstHost and dstPitch specify the (host) base address of the destination data and the bytes per row to apply. dstArray is ignored.
If dstMemoryType is CU_MEMORYTYPE_DEVICE, dstDevice and dstPitch specify the (device) base address of the destination data and the bytes per row to apply. dstArray is ignored.
If dstMemoryType is CU_MEMORYTYPE_ARRAY, dstArray specifies the handle of the destination data. dstHost, dstDevice and dstPitch are ignored.
  • srcXInBytes and srcY specify the base address of the source data for the copy.
For host pointers, the starting address is
  void* Start = (void*)((char*)srcHost+srcY*srcPitch + srcXInBytes);
For device pointers, the starting address is
  CUdeviceptr Start = srcDevice+srcY*srcPitch+srcXInBytes;
For CUDA arrays, srcXInBytes must be evenly divisible by the array element size.
  • dstXInBytes and dstY specify the base address of the destination data for the copy.
For host pointers, the starting address is
  void* dstStart = (void*)((char*)dstHost+dstY*dstPitch + dstXInBytes);
For device pointers, the starting address is
  CUdeviceptr dstStart = dstDevice+dstY*dstPitch+dstXInBytes;
For CUDA arrays, dstXInBytes must be evenly divisible by the array element size.
  • WidthInBytes and Height specify the width (in bytes) and height of the 2D copy being performed.
  • If specified, srcPitch must be greater than or equal to WidthInBytes + srcXInBytes, and dstPitch must be greater than or equal to WidthInBytes + dstXInBytes.
cuMemcpy2D() returns an error if any pitch is greater than the maximum allowed (CU_DEVICE_ATTRIBUTE_MAX_PITCH). cuMemAllocPitch() passes back pitches that always work with cuMemcpy2D(). On intra-device memory copies (device ↔ device, CUDA array ↔ device, CUDA array ↔ CUDA array), cuMemcpy2D() may fail for pitches not computed by cuMemAllocPitch(). cuMemcpy2DUnaligned() does not have this restriction, but may run significantly slower in the cases where cuMemcpy2D() would have returned an error code.

cuMemcpy2DAsync() is asynchronous and can optionally be associated with a stream by passing a non-zero hStream argument. It only works on page-locked host memory and returns an error if a pointer to pageable memory is passed as input.

Parameters:
pCopy- Parameters for the memory copy
hStream- Stream identifier
Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE
See also:
cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D8Async, cuMemsetD2D16, cuMemsetD2D16Async, cuMemsetD2D32, cuMemsetD2D32Async, cuMemsetD8, cuMemsetD8Async, cuMemsetD16, cuMemsetD16Async, cuMemsetD32, cuMemsetD32Async
CUresult CUDAAPI cuMemcpy2DUnaligned ( const CUDA_MEMCPY2D *  pCopy)

Copies memory for 2D arrays.

Perform a 2D memory copy according to the parameters specified in pCopy. The CUDA_MEMCPY2D structure is defined as:

   typedef struct CUDA_MEMCPY2D_st {
      unsigned int srcXInBytes, srcY;
      CUmemorytype srcMemoryType;
      const void *srcHost;
      CUdeviceptr srcDevice;
      CUarray srcArray;
      unsigned int srcPitch;
      unsigned int dstXInBytes, dstY;
      CUmemorytype dstMemoryType;
      void *dstHost;
      CUdeviceptr dstDevice;
      CUarray dstArray;
      unsigned int dstPitch;
      unsigned int WidthInBytes;
      unsigned int Height;
   } CUDA_MEMCPY2D;

where:

  • srcMemoryType and dstMemoryType specify the type of memory of the source and destination, respectively; each is a CUmemorytype value (CU_MEMORYTYPE_HOST, CU_MEMORYTYPE_DEVICE or CU_MEMORYTYPE_ARRAY):
If srcMemoryType is CU_MEMORYTYPE_HOST, srcHost and srcPitch specify the (host) base address of the source data and the bytes per row to apply. srcArray is ignored.
If srcMemoryType is CU_MEMORYTYPE_DEVICE, srcDevice and srcPitch specify the (device) base address of the source data and the bytes per row to apply. srcArray is ignored.
If srcMemoryType is CU_MEMORYTYPE_ARRAY, srcArray specifies the handle of the source data. srcHost, srcDevice and srcPitch are ignored.
If dstMemoryType is CU_MEMORYTYPE_HOST, dstHost and dstPitch specify the (host) base address of the destination data and the bytes per row to apply. dstArray is ignored.
If dstMemoryType is CU_MEMORYTYPE_DEVICE, dstDevice and dstPitch specify the (device) base address of the destination data and the bytes per row to apply. dstArray is ignored.
If dstMemoryType is CU_MEMORYTYPE_ARRAY, dstArray specifies the handle of the destination data. dstHost, dstDevice and dstPitch are ignored.
  • srcXInBytes and srcY specify the base address of the source data for the copy.
For host pointers, the starting address is
  void* Start = (void*)((char*)srcHost+srcY*srcPitch + srcXInBytes);
For device pointers, the starting address is
  CUdeviceptr Start = srcDevice+srcY*srcPitch+srcXInBytes;
For CUDA arrays, srcXInBytes must be evenly divisible by the array element size.
  • dstXInBytes and dstY specify the base address of the destination data for the copy.
For host pointers, the starting address is
  void* dstStart = (void*)((char*)dstHost+dstY*dstPitch + dstXInBytes);
For device pointers, the starting address is
  CUdeviceptr dstStart = dstDevice+dstY*dstPitch+dstXInBytes;
For CUDA arrays, dstXInBytes must be evenly divisible by the array element size.
  • WidthInBytes and Height specify the width (in bytes) and height of the 2D copy being performed.
  • If specified, srcPitch must be greater than or equal to WidthInBytes + srcXInBytes, and dstPitch must be greater than or equal to WidthInBytes + dstXInBytes.
cuMemcpy2D() returns an error if any pitch is greater than the maximum allowed (CU_DEVICE_ATTRIBUTE_MAX_PITCH). cuMemAllocPitch() passes back pitches that always work with cuMemcpy2D(). On intra-device memory copies (device ↔ device, CUDA array ↔ device, CUDA array ↔ CUDA array), cuMemcpy2D() may fail for pitches not computed by cuMemAllocPitch(). cuMemcpy2DUnaligned() does not have this restriction, but may run significantly slower in the cases where cuMemcpy2D() would have returned an error code.
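
The pitch constraints listed above lend themselves to a check made before a copy is issued. The function below is a sketch of such a pre-flight test, not part of the driver API. Its name and parameters are hypothetical, and `max_pitch` stands in for the value of CU_DEVICE_ATTRIBUTE_MAX_PITCH, which a real program would query from the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if both pitches satisfy the documented constraints, else 0.
   max_pitch is a caller-supplied stand-in for the device's maximum pitch. */
int sketch_pitches_valid(size_t src_pitch, size_t src_x_in_bytes,
                         size_t dst_pitch, size_t dst_x_in_bytes,
                         size_t width_in_bytes, size_t max_pitch)
{
    assert(max_pitch > 0);
    /* Each row must be wide enough to hold the offset plus the copied span. */
    if (src_pitch < width_in_bytes + src_x_in_bytes) return 0;
    if (dst_pitch < width_in_bytes + dst_x_in_bytes) return 0;
    /* No pitch may exceed the maximum allowed. */
    if (src_pitch > max_pitch || dst_pitch > max_pitch) return 0;
    return 1;
}
```

This check covers only the constraints stated in the text. It cannot predict the alignment-related failures that cuMemcpy2D() may report on intra-device copies.
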
Parameters:
pCopy- Parameters for the memory copy
Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE
See also:
cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D16, cuMemsetD2D32, cuMemsetD8, cuMemsetD16, cuMemsetD32
CUresult CUDAAPI cuMemcpy3D ( const CUDA_MEMCPY3D *  pCopy)

Copies memory for 3D arrays.

Perform a 3D memory copy according to the parameters specified in pCopy. The CUDA_MEMCPY3D structure is defined as:

        typedef struct CUDA_MEMCPY3D_st {

            unsigned int srcXInBytes, srcY, srcZ;
            unsigned int srcLOD;
            CUmemorytype srcMemoryType;
                const void *srcHost;
                CUdeviceptr srcDevice;
                CUarray srcArray;
                unsigned int srcPitch;  // ignored when src is array
                unsigned int srcHeight; // ignored when src is array; may be 0 if Depth==1

            unsigned int dstXInBytes, dstY, dstZ;
            unsigned int dstLOD;
            CUmemorytype dstMemoryType;
                void *dstHost;
                CUdeviceptr dstDevice;
                CUarray dstArray;
                unsigned int dstPitch;  // ignored when dst is array
                unsigned int dstHeight; // ignored when dst is array; may be 0 if Depth==1

            unsigned int WidthInBytes;
            unsigned int Height;
            unsigned int Depth;
        } CUDA_MEMCPY3D;

where:

  • ::srcMemoryType and ::dstMemoryType specify the type of memory of the source and destination, respectively. Each takes one of the CUmemorytype values CU_MEMORYTYPE_HOST, CU_MEMORYTYPE_DEVICE or CU_MEMORYTYPE_ARRAY, interpreted as follows:
If ::srcMemoryType is CU_MEMORYTYPE_HOST, ::srcHost, ::srcPitch and ::srcHeight specify the (host) base address of the source data, the bytes per row, and the height of each 2D slice of the 3D array. ::srcArray is ignored.
If ::srcMemoryType is CU_MEMORYTYPE_DEVICE, ::srcDevice, ::srcPitch and ::srcHeight specify the (device) base address of the source data, the bytes per row, and the height of each 2D slice of the 3D array. ::srcArray is ignored.
If ::srcMemoryType is CU_MEMORYTYPE_ARRAY, ::srcArray specifies the handle of the source data. ::srcHost, ::srcDevice, ::srcPitch and ::srcHeight are ignored.
If ::dstMemoryType is CU_MEMORYTYPE_HOST, ::dstHost, ::dstPitch and ::dstHeight specify the (host) base address of the destination data, the bytes per row, and the height of each 2D slice of the 3D array. ::dstArray is ignored.
If ::dstMemoryType is CU_MEMORYTYPE_DEVICE, ::dstDevice, ::dstPitch and ::dstHeight specify the (device) base address of the destination data, the bytes per row, and the height of each 2D slice of the 3D array. ::dstArray is ignored.
If ::dstMemoryType is CU_MEMORYTYPE_ARRAY, ::dstArray specifies the handle of the destination data. ::dstHost, ::dstDevice, ::dstPitch and ::dstHeight are ignored.
  • ::srcXInBytes, ::srcY and ::srcZ specify the base address of the source data for the copy.
For host pointers, the starting address is
  void* Start = (void*)((char*)srcHost+(srcZ*srcHeight+srcY)*srcPitch + srcXInBytes);
For device pointers, the starting address is
  CUdeviceptr Start = srcDevice+(srcZ*srcHeight+srcY)*srcPitch+srcXInBytes;
For CUDA arrays, ::srcXInBytes must be evenly divisible by the array element size.
  • ::dstXInBytes, ::dstY and ::dstZ specify the base address of the destination data for the copy.
For host pointers, the starting address is
  void* dstStart = (void*)((char*)dstHost+(dstZ*dstHeight+dstY)*dstPitch + dstXInBytes);
For device pointers, the starting address is
  CUdeviceptr dstStart = dstDevice+(dstZ*dstHeight+dstY)*dstPitch+dstXInBytes;
For CUDA arrays, ::dstXInBytes must be evenly divisible by the array element size.
  • ::WidthInBytes, ::Height and ::Depth specify the width (in bytes), height and depth of the 3D copy being performed.
  • If specified, ::srcPitch must be greater than or equal to ::WidthInBytes + ::srcXInBytes, and ::dstPitch must be greater than or equal to ::WidthInBytes + ::dstXInBytes.
  • If specified, ::srcHeight must be greater than or equal to ::Height + ::srcY, and ::dstHeight must be greater than or equal to ::Height + ::dstY.
cuMemcpy3D() returns an error if any pitch is greater than the maximum allowed (CU_DEVICE_ATTRIBUTE_MAX_PITCH).

The ::srcLOD and ::dstLOD members of the CUDA_MEMCPY3D structure must be set to 0.
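
The starting-address formula and the pitch and height bounds above can be checked on the host before a descriptor is submitted. The following is a minimal sketch in plain C, not part of the driver API: Side3D, start_offset() and extents_ok() are hypothetical names that mirror the linear-memory fields of one side of CUDA_MEMCPY3D, and treating a height of 0 as valid only when Depth is 1 is this sketch's reading of the comment in the structure definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the linear-memory fields of one side
 * (source or destination) of CUDA_MEMCPY3D. Not a driver type. */
typedef struct {
    size_t xInBytes, y, z; /* srcXInBytes/srcY/srcZ or the dst equivalents */
    size_t pitch;          /* bytes per row */
    size_t height;         /* rows per 2D slice */
} Side3D;

/* Byte offset of the first copied byte, relative to the base pointer:
 * (z * height + y) * pitch + xInBytes */
size_t start_offset(const Side3D *s)
{
    assert(s != NULL);
    return (s->z * s->height + s->y) * s->pitch + s->xInBytes;
}

/* Returns 1 if pitch and height satisfy the documented lower bounds
 * for a copy of widthInBytes x height x depth, 0 otherwise. */
int extents_ok(const Side3D *s, size_t widthInBytes, size_t height, size_t depth)
{
    assert(s != NULL);
    if (s->pitch < widthInBytes + s->xInBytes)
        return 0;
    if (s->height == 0)          /* assumption: height may be omitted */
        return depth == 1;       /* only for a single-slice copy      */
    return s->height >= height + s->y;
}
```

For example, a side with xInBytes = 16, y = 2, z = 1, pitch = 256 and height = 64 starts (1*64 + 2)*256 + 16 bytes past its base pointer.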

Parameters:
pCopy- Parameters for the memory copy
Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE
See also:
cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D16, cuMemsetD2D32, cuMemsetD8, cuMemsetD16, cuMemsetD32
CUresult CUDAAPI cuMemcpy3DAsync ( const CUDA_MEMCPY3D *  pCopy,
CUstream  hStream 
)

Copies memory for 3D arrays.

Perform a 3D memory copy according to the parameters specified in pCopy. The CUDA_MEMCPY3D structure is defined as:

        typedef struct CUDA_MEMCPY3D_st {

            unsigned int srcXInBytes, srcY, srcZ;
            unsigned int srcLOD;
            CUmemorytype srcMemoryType;
                const void *srcHost;
                CUdeviceptr srcDevice;
                CUarray srcArray;
                unsigned int srcPitch;  // ignored when src is array
                unsigned int srcHeight; // ignored when src is array; may be 0 if Depth==1

            unsigned int dstXInBytes, dstY, dstZ;
            unsigned int dstLOD;
            CUmemorytype dstMemoryType;
                void *dstHost;
                CUdeviceptr dstDevice;
                CUarray dstArray;
                unsigned int dstPitch;  // ignored when dst is array
                unsigned int dstHeight; // ignored when dst is array; may be 0 if Depth==1

            unsigned int WidthInBytes;
            unsigned int Height;
            unsigned int Depth;
        } CUDA_MEMCPY3D;

where:

  • ::srcMemoryType and ::dstMemoryType specify the type of memory of the source and destination, respectively. Each takes one of the CUmemorytype values CU_MEMORYTYPE_HOST, CU_MEMORYTYPE_DEVICE or CU_MEMORYTYPE_ARRAY, interpreted as follows:
If ::srcMemoryType is CU_MEMORYTYPE_HOST, ::srcHost, ::srcPitch and ::srcHeight specify the (host) base address of the source data, the bytes per row, and the height of each 2D slice of the 3D array. ::srcArray is ignored.
If ::srcMemoryType is CU_MEMORYTYPE_DEVICE, ::srcDevice, ::srcPitch and ::srcHeight specify the (device) base address of the source data, the bytes per row, and the height of each 2D slice of the 3D array. ::srcArray is ignored.
If ::srcMemoryType is CU_MEMORYTYPE_ARRAY, ::srcArray specifies the handle of the source data. ::srcHost, ::srcDevice, ::srcPitch and ::srcHeight are ignored.
If ::dstMemoryType is CU_MEMORYTYPE_HOST, ::dstHost, ::dstPitch and ::dstHeight specify the (host) base address of the destination data, the bytes per row, and the height of each 2D slice of the 3D array. ::dstArray is ignored.
If ::dstMemoryType is CU_MEMORYTYPE_DEVICE, ::dstDevice, ::dstPitch and ::dstHeight specify the (device) base address of the destination data, the bytes per row, and the height of each 2D slice of the 3D array. ::dstArray is ignored.
If ::dstMemoryType is CU_MEMORYTYPE_ARRAY, ::dstArray specifies the handle of the destination data. ::dstHost, ::dstDevice, ::dstPitch and ::dstHeight are ignored.
  • ::srcXInBytes, ::srcY and ::srcZ specify the base address of the source data for the copy.
For host pointers, the starting address is
  void* Start = (void*)((char*)srcHost+(srcZ*srcHeight+srcY)*srcPitch + srcXInBytes);
For device pointers, the starting address is
  CUdeviceptr Start = srcDevice+(srcZ*srcHeight+srcY)*srcPitch+srcXInBytes;
For CUDA arrays, ::srcXInBytes must be evenly divisible by the array element size.
  • ::dstXInBytes, ::dstY and ::dstZ specify the base address of the destination data for the copy.
For host pointers, the starting address is
  void* dstStart = (void*)((char*)dstHost+(dstZ*dstHeight+dstY)*dstPitch + dstXInBytes);
For device pointers, the starting address is
  CUdeviceptr dstStart = dstDevice+(dstZ*dstHeight+dstY)*dstPitch+dstXInBytes;
For CUDA arrays, ::dstXInBytes must be evenly divisible by the array element size.
  • ::WidthInBytes, ::Height and ::Depth specify the width (in bytes), height and depth of the 3D copy being performed.
  • If specified, ::srcPitch must be greater than or equal to ::WidthInBytes + ::srcXInBytes, and ::dstPitch must be greater than or equal to ::WidthInBytes + ::dstXInBytes.
  • If specified, ::srcHeight must be greater than or equal to ::Height + ::srcY, and ::dstHeight must be greater than or equal to ::Height + ::dstY.
cuMemcpy3DAsync() returns an error if any pitch is greater than the maximum allowed (CU_DEVICE_ATTRIBUTE_MAX_PITCH).

cuMemcpy3DAsync() is asynchronous and can optionally be associated with a stream by passing a non-zero hStream argument. It only works on page-locked host memory and returns an error if a pointer to pageable memory is passed as input.

The ::srcLOD and ::dstLOD members of the CUDA_MEMCPY3D structure must be set to 0.
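
The copy geometry described by the pitch, height and depth fields can be illustrated with a CPU-only reference loop. This is a minimal sketch under stated assumptions: copy3d_host() is a hypothetical helper, both buffers are ordinary host memory, the X/Y/Z offsets are taken as zero (a caller could pre-offset the pointers), and nothing here models streams or asynchronous execution.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical CPU reference for the 3D copy layout: copies
 * widthInBytes bytes from each of `height` rows in each of `depth`
 * slices. Rows are `pitch` bytes apart and slices are
 * `sliceHeight * pitch` bytes apart on each side. */
void copy3d_host(unsigned char *dst, size_t dstPitch, size_t dstHeight,
                 const unsigned char *src, size_t srcPitch, size_t srcHeight,
                 size_t widthInBytes, size_t height, size_t depth)
{
    assert(dst != NULL && src != NULL);
    for (size_t z = 0; z < depth; ++z) {
        for (size_t y = 0; y < height; ++y) {
            const unsigned char *s = src + (z * srcHeight + y) * srcPitch;
            unsigned char *d = dst + (z * dstHeight + y) * dstPitch;
            memcpy(d, s, widthInBytes); /* one row of the current slice */
        }
    }
}
```

Because source and destination carry their own pitch and height, the same loop packs a padded source volume into a tightly laid out destination, or the reverse.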

Parameters:
pCopy- Parameters for the memory copy
hStream- Stream identifier
Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE
See also:
cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D8Async, cuMemsetD2D16, cuMemsetD2D16Async, cuMemsetD2D32, cuMemsetD2D32Async, cuMemsetD8, cuMemsetD8Async, cuMemsetD16, cuMemsetD16Async, cuMemsetD32, cuMemsetD32Async
CUresult CUDAAPI cuMemcpyAtoA ( CUarray  dstArray,
size_t  dstOffset,
CUarray  srcArray,
size_t  srcOffset,
size_t  ByteCount 
)

Copies memory from Array to Array.

Copies from one 1D CUDA array to another. dstArray and srcArray specify the handles of the destination and source CUDA arrays for the copy, respectively. dstOffset and srcOffset specify the destination and source offsets in bytes into the CUDA arrays. ByteCount is the number of bytes to be copied. The elements in the two CUDA arrays need not have the same format, but they must be the same size, and ByteCount must be evenly divisible by that size.
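
The element-size rule can be expressed as a small pre-flight check. This is a sketch only: atoa_args_ok() is a hypothetical helper, and the element sizes are assumed to be known to the caller (for instance from the descriptors used to create the arrays).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical check: both arrays must use elements of the same size,
 * and the byte count must be a whole number of elements. */
int atoa_args_ok(size_t dstElemSize, size_t srcElemSize, size_t byteCount)
{
    assert(dstElemSize != 0 && srcElemSize != 0);
    return dstElemSize == srcElemSize && byteCount % srcElemSize == 0;
}
```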

Parameters:
dstArray- Destination array
dstOffset- Offset in bytes of destination array
srcArray- Source array
srcOffset- Offset in bytes of source array
ByteCount- Size of memory copy in bytes
Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE
See also:
cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D16, cuMemsetD2D32, cuMemsetD8, cuMemsetD16, cuMemsetD32
CUresult CUDAAPI cuMemcpyAtoD ( CUdeviceptr  dstDevice,
CUarray  srcArray,
size_t  srcOffset,
size_t  ByteCount 
)

Copies memory from Array to Device.

Copies from one 1D CUDA array to device memory. dstDevice specifies the base pointer of the destination and must be naturally aligned with the CUDA array elements. srcArray and srcOffset specify the CUDA array handle and the offset in bytes into the array where the copy is to begin. ByteCount specifies the number of bytes to copy and must be evenly divisible by the array element size.
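
The alignment and divisibility requirements can be sketched as a host-side check. The helper atod_args_ok() is hypothetical, uintptr_t stands in for CUdeviceptr, and "naturally aligned" is interpreted here as the address being a multiple of the array element size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check for an array-to-device copy: the destination
 * address must be a multiple of the element size, and the byte count
 * must be a whole number of elements. */
int atod_args_ok(uintptr_t dstDevice, size_t elemSize, size_t byteCount)
{
    assert(elemSize != 0);
    return dstDevice % elemSize == 0 && byteCount % elemSize == 0;
}
```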

Parameters:
dstDevice- Destination device pointer
srcArray- Source array
srcOffset- Offset in bytes of source array
ByteCount- Size of memory copy in bytes
Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE
See also:
cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D16, cuMemsetD2D32, cuMemsetD8, cuMemsetD16, cuMemsetD32
CUresult CUDAAPI cuMemcpyAtoH ( void *  dstHost,
CUarray  srcArray,
size_t  srcOffset,
size_t  ByteCount 
)

Copies memory from Array to Host.

Copies from one 1D CUDA array to host memory. dstHost specifies the base pointer of the destination. srcArray and srcOffset specify the CUDA array handle and starting offset in bytes of the source data. ByteCount specifies the number of bytes to copy.

Parameters:
dstHost- Destination host pointer
srcArray- Source array
srcOffset- Offset in bytes of source array
ByteCount- Size of memory copy in bytes
Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE
See also:
cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D16, cuMemsetD2D32, cuMemsetD8, cuMemsetD16, cuMemsetD32
CUresult CUDAAPI cuMemcpyAtoHAsync ( void *  dstHost,
CUarray  srcArray,
size_t  srcOffset,
size_t  ByteCount,
CUstream  hStream 
)

Copies memory from Array to Host.

Copies from one 1D CUDA array to host memory. dstHost specifies the base pointer of the destination. srcArray and srcOffset specify the CUDA array handle and starting offset in bytes of the source data. ByteCount specifies the number of bytes to copy.

cuMemcpyAtoHAsync() is asynchronous and can optionally be associated with a stream by passing a non-zero hStream argument. It only works on page-locked host memory and returns an error if a pointer to pageable memory is passed as input.

Parameters:
dstHost- Destination pointer
srcArray- Source array
srcOffset- Offset in bytes of source array
ByteCount- Size of memory copy in bytes
hStream- Stream identifier
Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE
See also:
cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D8Async, cuMemsetD2D16, cuMemsetD2D16Async, cuMemsetD2D32, cuMemsetD2D32Async, cuMemsetD8, cuMemsetD8Async, cuMemsetD16, cuMemsetD16Async, cuMemsetD32, cuMemsetD32Async
CUresult CUDAAPI cuMemcpyDtoA ( CUarray  dstArray,
size_t  dstOffset,
CUdeviceptr  srcDevice,
size_t  ByteCount 
)

Copies memory from Device to Array.

Copies from device memory to a 1D CUDA array. dstArray and dstOffset specify the CUDA array handle and starting offset in bytes of the destination data. srcDevice specifies the base pointer of the source. ByteCount specifies the number of bytes to copy.

Parameters:
dstArray- Destination array
dstOffset- Offset in bytes of destination array
srcDevice- Source device pointer
ByteCount- Size of memory copy in bytes
Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE
See also:
cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D16, cuMemsetD2D32, cuMemsetD8, cuMemsetD16, cuMemsetD32
CUresult CUDAAPI cuMemcpyDtoD ( CUdeviceptr  dstDevice,
CUdeviceptr  srcDevice,
size_t  ByteCount 
)

Copies memory from Device to Device.

Copies from device memory to device memory. dstDevice and srcDevice are the base pointers of the destination and source, respectively. ByteCount specifies the number of bytes to copy. Note that this function is asynchronous.

Parameters:
dstDevice- Destination device pointer
srcDevice- Source device pointer
ByteCount- Size of memory copy in bytes
Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE
See also:
cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D16, cuMemsetD2D32, cuMemsetD8, cuMemsetD16, cuMemsetD32
CUresult CUDAAPI cuMemcpyDtoDAsync ( CUdeviceptr  dstDevice,
CUdeviceptr  srcDevice,
size_t  ByteCount,
CUstream  hStream 
)

Copies memory from Device to Device.

Copies from device memory to device memory. dstDevice and srcDevice are the base pointers of the destination and source, respectively. ByteCount specifies the number of bytes to copy. Note that this function is asynchronous and can optionally be associated with a stream by passing a non-zero hStream argument.

Parameters:
dstDevice- Destination device pointer
srcDevice- Source device pointer
ByteCount- Size of memory copy in bytes
hStream- Stream identifier
Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE
See also:
cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D8Async, cuMemsetD2D16, cuMemsetD2D16Async, cuMemsetD2D32, cuMemsetD2D32Async, cuMemsetD8, cuMemsetD8Async, cuMemsetD16, cuMemsetD16Async, cuMemsetD32, cuMemsetD32Async
CUresult CUDAAPI cuMemcpyDtoH ( void *  dstHost,
CUdeviceptr  srcDevice,
size_t  ByteCount 
)

Copies memory from Device to Host.

Copies from device to host memory. dstHost and srcDevice specify the base pointers of the destination and source, respectively. ByteCount specifies the number of bytes to copy. Note that this function is synchronous.

Parameters:
dstHost- Destination host pointer
srcDevice- Source device pointer
ByteCount- Size of memory copy in bytes
Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE
See also:
cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D16, cuMemsetD2D32, cuMemsetD8, cuMemsetD16, cuMemsetD32
CUresult CUDAAPI cuMemcpyDtoHAsync ( void *  dstHost,
CUdeviceptr  srcDevice,
size_t  ByteCount,
CUstream  hStream 
)

Copies memory from Device to Host.

Copies from device to host memory. dstHost and srcDevice specify the base pointers of the destination and source, respectively. ByteCount specifies the number of bytes to copy.

cuMemcpyDtoHAsync() is asynchronous and can optionally be associated with a stream by passing a non-zero hStream argument. It only works on page-locked host memory and returns an error if a pointer to pageable memory is passed as input.

Parameters:
dstHost- Destination host pointer
srcDevice- Source device pointer
ByteCount- Size of memory copy in bytes
hStream- Stream identifier
Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE
See also:
cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D8Async, cuMemsetD2D16, cuMemsetD2D16Async, cuMemsetD2D32, cuMemsetD2D32Async, cuMemsetD8, cuMemsetD8Async, cuMemsetD16, cuMemsetD16Async, cuMemsetD32, cuMemsetD32Async
CUresult CUDAAPI cuMemcpyHtoA ( CUarray  dstArray,
size_t  dstOffset,
const void *  srcHost,
size_t  ByteCount 
)

Copies memory from Host to Array.

Copies from host memory to a 1D CUDA array. dstArray and dstOffset specify the CUDA array handle and starting offset in bytes of the destination data. srcHost specifies the base address of the source. ByteCount specifies the number of bytes to copy.

Parameters:
dstArray- Destination array
dstOffset- Offset in bytes of destination array
srcHost- Source host pointer
ByteCount- Size of memory copy in bytes
Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE
See also:
cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D16, cuMemsetD2D32, cuMemsetD8, cuMemsetD16, cuMemsetD32
CUresult CUDAAPI cuMemcpyHtoAAsync ( CUarray  dstArray,
size_t  dstOffset,
const void *  srcHost,
size_t  ByteCount,
CUstream  hStream 
)

Copies memory from Host to Array.

Copies from host memory to a 1D CUDA array. dstArray and dstOffset specify the CUDA array handle and starting offset in bytes of the destination data. srcHost specifies the base address of the source. ByteCount specifies the number of bytes to copy.

cuMemcpyHtoAAsync() is asynchronous and can optionally be associated with a stream by passing a non-zero hStream argument. It only works on page-locked host memory and returns an error if a pointer to pageable memory is passed as input.

Parameters:
dstArray- Destination array
dstOffset- Offset in bytes of destination array
srcHost- Source host pointer
ByteCount- Size of memory copy in bytes
hStream- Stream identifier
Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE
See also:
cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D8Async, cuMemsetD2D16, cuMemsetD2D16Async, cuMemsetD2D32, cuMemsetD2D32Async, cuMemsetD8, cuMemsetD8Async, cuMemsetD16, cuMemsetD16Async, cuMemsetD32, cuMemsetD32Async
CUresult CUDAAPI cuMemcpyHtoD ( CUdeviceptr  dstDevice,
const void *  srcHost,
size_t  ByteCount 
)

Copies memory from Host to Device.

Copies from host memory to device memory. dstDevice and srcHost are the base addresses of the destination and source, respectively. ByteCount specifies the number of bytes to copy. Note that this function is synchronous.

Parameters:
dstDevice- Destination device pointer
srcHost- Source host pointer
ByteCount- Size of memory copy in bytes
Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE
See also:
cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D16, cuMemsetD2D32, cuMemsetD8, cuMemsetD16, cuMemsetD32
CUresult CUDAAPI cuMemcpyHtoDAsync ( CUdeviceptr  dstDevice,
const void *  srcHost,
size_t  ByteCount,
CUstream  hStream 
)

Copies memory from Host to Device.

Copies from host memory to device memory. dstDevice and srcHost are the base addresses of the destination and source, respectively. ByteCount specifies the number of bytes to copy.

cuMemcpyHtoDAsync() is asynchronous and can optionally be associated with a stream by passing a non-zero hStream argument. It only works on page-locked host memory and returns an error if a pointer to pageable memory is passed as input.

Parameters:
dstDevice- Destination device pointer
srcHost- Source host pointer
ByteCount- Size of memory copy in bytes
hStream- Stream identifier
Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE
See also:
cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D8Async, cuMemsetD2D16, cuMemsetD2D16Async, cuMemsetD2D32, cuMemsetD2D32Async, cuMemsetD8, cuMemsetD8Async, cuMemsetD16, cuMemsetD16Async, cuMemsetD32, cuMemsetD32Async
CUresult CUDAAPI cuMemFree ( CUdeviceptr  dptr)
CUresult CUDAAPI cuMemFreeHost ( void *  p)
CUresult CUDAAPI cuMemGetAddressRange ( CUdeviceptr *  pbase,
size_t *  psize,
CUdeviceptr  dptr 
)

Get information on memory allocations.

Returns the base address in *pbase and size in *psize of the allocation by cuMemAlloc() or cuMemAllocPitch() that contains the input pointer dptr. Both parameters pbase and psize are optional. If one of them is NULL, it is ignored.
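
The lookup semantics, including the optional output parameters, can be modelled on the host. This is an illustrative sketch rather than how the driver tracks allocations: Allocation and find_range() are hypothetical, and uintptr_t stands in for CUdeviceptr.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one tracked allocation. */
typedef struct {
    uintptr_t base;
    size_t    size;
} Allocation;

/* Returns 1 and fills the outputs if ptr lies inside a tracked
 * allocation, 0 otherwise. Either output pointer may be NULL, in
 * which case that result is simply not written. */
int find_range(const Allocation *table, size_t count, uintptr_t ptr,
               uintptr_t *pbase, size_t *psize)
{
    assert(table != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        if (ptr >= table[i].base && ptr - table[i].base < table[i].size) {
            if (pbase) *pbase = table[i].base;
            if (psize) *psize = table[i].size;
            return 1;
        }
    }
    return 0;
}
```

A pointer one byte past the end of an allocation is treated as outside it, so the range test uses a strict comparison against the size.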

Parameters:
pbase- Returned base address
psize- Returned size of device memory allocation
dptr- Device pointer to query
Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE
See also:
cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D16, cuMemsetD2D32, cuMemsetD8, cuMemsetD16, cuMemsetD32
CUresult CUDAAPI cuMemGetInfo ( size_t *  free,
size_t *  total 
)
CUresult CUDAAPI cuMemHostAlloc ( void **  pp,
size_t  bytesize,
unsigned int  Flags 
)

Allocates page-locked host memory.

Allocates bytesize bytes of host memory that is page-locked and accessible to the device. The driver tracks the virtual memory ranges allocated with this function and automatically accelerates calls to functions such as cuMemcpyHtoD(). Since the memory can be accessed directly by the device, it can be read or written with much higher bandwidth than pageable memory obtained with functions such as malloc(). Allocating excessive amounts of pinned memory may degrade system performance, since it reduces the amount of memory available to the system for paging. As a result, this function is best used sparingly to allocate staging areas for data exchange between host and device.

The Flags parameter enables different options to be specified that affect the allocation, as follows.

  • CU_MEMHOSTALLOC_PORTABLE: The memory returned by this call will be considered as pinned memory by all CUDA contexts, not just the one that performed the allocation.
  • CU_MEMHOSTALLOC_DEVICEMAP: Maps the allocation into the CUDA address space. The device pointer to the memory may be obtained by calling cuMemHostGetDevicePointer().
  • CU_MEMHOSTALLOC_WRITECOMBINED: Allocates the memory as write-combined (WC). WC memory can be transferred across the PCI Express bus more quickly on some system configurations, but cannot be read efficiently by most CPUs. WC memory is a good option for buffers that will be written by the CPU and read by the GPU via mapped pinned memory or host->device transfers.

All of these flags are orthogonal to one another: a developer may allocate memory that is portable, mapped and/or write-combined with no restrictions.

The CUDA context must have been created with the CU_CTX_MAP_HOST flag in order for the CU_MEMHOSTALLOC_DEVICEMAP flag to have any effect.

The CU_MEMHOSTALLOC_DEVICEMAP flag may be specified on CUDA contexts for devices that do not support mapped pinned memory. The failure is deferred to cuMemHostGetDevicePointer() because the memory may be mapped into other CUDA contexts via the CU_MEMHOSTALLOC_PORTABLE flag.

The memory allocated by this function must be freed with cuMemFreeHost().
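
Because the allocation flags are independent bits, they can be combined with bitwise OR. The sketch below shows this in plain C without the driver headers. The SKETCH_* constants are stand-ins whose values are assumptions for illustration (real code would use the macros from cuda.h), and make_hostalloc_flags() and mapping_effective() are hypothetical helpers.

```c
#include <assert.h>

/* Stand-ins for the cuda.h flag macros. The values are assumptions
 * made for this sketch only. */
enum {
    SKETCH_PORTABLE      = 0x01,
    SKETCH_DEVICEMAP     = 0x02,
    SKETCH_WRITECOMBINED = 0x04
};

/* Builds a Flags word from three independent options. */
unsigned int make_hostalloc_flags(int portable, int mapped, int writeCombined)
{
    unsigned int flags = 0;
    if (portable)      flags |= SKETCH_PORTABLE;
    if (mapped)        flags |= SKETCH_DEVICEMAP;
    if (writeCombined) flags |= SKETCH_WRITECOMBINED;
    return flags;
}

/* The mapped flag only has an effect when the context was created
 * with host mapping enabled (ctxMapHost non-zero). */
int mapping_effective(unsigned int flags, int ctxMapHost)
{
    assert((flags & ~0x07u) == 0); /* only the three known bits */
    return (flags & SKETCH_DEVICEMAP) != 0 && ctxMapHost != 0;
}
```

With real driver headers, the resulting word would be passed as the Flags argument, and the returned pointer later released with cuMemFreeHost().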

Parameters:
pp- Returned host pointer to page-locked memory
bytesize- Requested allocation size in bytes
Flags- Flags for allocation request
Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_OUT_OF_MEMORY
See also:
cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D16, cuMemsetD2D32, cuMemsetD8, cuMemsetD16, cuMemsetD32
CUresult CUDAAPI cuMemHostGetDevicePointer ( CUdeviceptr *  pdptr,
void *  p,
unsigned int  Flags 
)

Passes back device pointer of mapped pinned memory.

Passes back the device pointer pdptr corresponding to the mapped, pinned host buffer p allocated by cuMemHostAlloc.

cuMemHostGetDevicePointer() will fail if the CU_MEMHOSTALLOC_DEVICEMAP flag was not specified at the time the memory was allocated, or if the function is called on a GPU that does not support mapped pinned memory.

Flags is reserved for future releases. For now, it must be set to 0.

Parameters:
pdptr- Returned device pointer
p- Host pointer
Flags- Options (must be 0)
Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE
See also:
cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemsetD2D8, cuMemsetD2D16, cuMemsetD2D32, cuMemsetD8, cuMemsetD16, cuMemsetD32
CUresult CUDAAPI cuMemHostGetFlags ( unsigned int *  pFlags,
void *  p 
)

Passes back flags that were used for a pinned allocation.

Passes back in pFlags the flags that were specified when the pinned host buffer p was allocated by cuMemHostAlloc.

cuMemHostGetFlags() will fail if the pointer does not reside in an allocation performed by cuMemAllocHost() or cuMemHostAlloc().

Parameters:
pFlags- Returned flags word
p- Host pointer
Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE
See also:
cuMemAllocHost, cuMemHostAlloc
CUresult CUDAAPI cuMemsetD16 ( CUdeviceptr  dstDevice,
unsigned short  us,
size_t  N 
)
CUresult CUDAAPI cuMemsetD16Async ( CUdeviceptr  dstDevice,
unsigned short  us,
size_t  N,
CUstream  hStream 
)
CUresult CUDAAPI cuMemsetD2D16 ( CUdeviceptr  dstDevice,
size_t  dstPitch,
unsigned short  us,
size_t  Width,
size_t  Height 
)

Initializes device memory.

Sets the 2D memory range of Width 16-bit values to the specified value us. Height specifies the number of rows to set, and dstPitch specifies the number of bytes between each row. This function performs fastest when the pitch is one that has been passed back by cuMemAllocPitch().
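The relationship between Width (counted in elements) and dstPitch (counted in bytes) can be sketched with a host-only reference model. The function ex_memset_d2d16 below is hypothetical: it operates on ordinary host memory rather than a CUdeviceptr, and it assumes that bytes between the end of one row and the start of the next are left untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Host-side model of a pitched 2D set with 16-bit elements.
 * dst    : start of the first row (host memory in this sketch)
 * pitch  : number of bytes between the starts of consecutive rows
 * value  : 16-bit value written to every element
 * width  : number of 16-bit elements per row
 * height : number of rows */
void ex_memset_d2d16(unsigned char *dst, size_t pitch,
                     unsigned short value, size_t width, size_t height)
{
    size_t row, col;

    /* Each row must be large enough to hold width elements. */
    assert(pitch >= width * sizeof value);

    for (row = 0; row < height; ++row) {
        unsigned char *row_start = dst + row * pitch;
        for (col = 0; col < width; ++col) {
            /* memcpy avoids any alignment assumption about dst. */
            memcpy(row_start + col * sizeof value, &value, sizeof value);
        }
    }
}
```

With pitch = 8, width = 3 and height = 2, the model writes bytes 0 to 5 and 8 to 13 and skips the two padding bytes at the end of each row.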

Parameters:
dstDevice- Destination device pointer
dstPitch- Pitch of destination device pointer
us- Value to set
Width- Width of row
Height- Number of rows
Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE
See also:
cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D8Async, cuMemsetD2D16Async, cuMemsetD2D32, cuMemsetD2D32Async, cuMemsetD8, cuMemsetD8Async, cuMemsetD16, cuMemsetD16Async, cuMemsetD32, cuMemsetD32Async
CUresult CUDAAPI cuMemsetD2D16Async ( CUdeviceptr  dstDevice,
size_t  dstPitch,
unsigned short  us,
size_t  Width,
size_t  Height,
CUstream  hStream 
)

Sets device memory.

Sets the 2D memory range of Width 16-bit values to the specified value us. Height specifies the number of rows to set, and dstPitch specifies the number of bytes between each row. This function performs fastest when the pitch is one that has been passed back by cuMemAllocPitch().

cuMemsetD2D16Async() is asynchronous and can optionally be associated with a stream by passing a non-zero stream argument.

Parameters:
dstDevice- Destination device pointer
dstPitch- Pitch of destination device pointer
us- Value to set
Width- Width of row
Height- Number of rows
hStream- Stream identifier
Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE
See also:
cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D8Async, cuMemsetD2D16, cuMemsetD2D32, cuMemsetD2D32Async, cuMemsetD8, cuMemsetD8Async, cuMemsetD16, cuMemsetD16Async, cuMemsetD32, cuMemsetD32Async
CUresult CUDAAPI cuMemsetD2D32 ( CUdeviceptr  dstDevice,
size_t  dstPitch,
unsigned int  ui,
size_t  Width,
size_t  Height 
)

Initializes device memory.

Sets the 2D memory range of Width 32-bit values to the specified value ui. Height specifies the number of rows to set, and dstPitch specifies the number of bytes between each row. This function performs fastest when the pitch is one that has been passed back by cuMemAllocPitch().
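To make the role of the pitch concrete, the sketch below computes a padded row pitch and the byte offset of a 32-bit element inside a pitched range. Both helpers are hypothetical. The alignment passed to ex_round_up_pitch is an assumed, illustrative value: the pitch that cuMemAllocPitch() actually passes back is chosen by the driver and should be used as returned.

```c
#include <assert.h>
#include <stddef.h>

/* Rounds width_bytes up to the next multiple of alignment.
 * The alignment is an assumption made for illustration only. */
size_t ex_round_up_pitch(size_t width_bytes, size_t alignment)
{
    assert(alignment != 0);
    return ((width_bytes + alignment - 1) / alignment) * alignment;
}

/* Byte offset of the 32-bit element at (row, col) in a range whose
 * rows start pitch bytes apart. */
size_t ex_element_offset32(size_t pitch, size_t row, size_t col)
{
    return row * pitch + col * 4u;
}
```

For example, a row of 75 32-bit values occupies 300 bytes; with an assumed alignment of 256 bytes the padded pitch is 512, and element (2, 3) then starts at byte offset 2 * 512 + 3 * 4 = 1036.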

Parameters:
dstDevice- Destination device pointer
dstPitch- Pitch of destination device pointer
ui- Value to set
Width- Width of row
Height- Number of rows
Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE
See also:
cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D8Async, cuMemsetD2D16, cuMemsetD2D16Async, cuMemsetD2D32Async, cuMemsetD8, cuMemsetD8Async, cuMemsetD16, cuMemsetD16Async, cuMemsetD32, cuMemsetD32Async
CUresult CUDAAPI cuMemsetD2D32Async ( CUdeviceptr  dstDevice,
size_t  dstPitch,
unsigned int  ui,
size_t  Width,
size_t  Height,
CUstream  hStream 
)

Sets device memory.

Sets the 2D memory range of Width 32-bit values to the specified value ui. Height specifies the number of rows to set, and dstPitch specifies the number of bytes between each row. This function performs fastest when the pitch is one that has been passed back by cuMemAllocPitch().

cuMemsetD2D32Async() is asynchronous and can optionally be associated with a stream by passing a non-zero stream argument.

Parameters:
dstDevice- Destination device pointer
dstPitch- Pitch of destination device pointer
ui- Value to set
Width- Width of row
Height- Number of rows
hStream- Stream identifier
Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE
See also:
cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D8Async, cuMemsetD2D16, cuMemsetD2D16Async, cuMemsetD2D32, cuMemsetD8, cuMemsetD8Async, cuMemsetD16, cuMemsetD16Async, cuMemsetD32, cuMemsetD32Async
CUresult CUDAAPI cuMemsetD2D8 ( CUdeviceptr  dstDevice,
size_t  dstPitch,
unsigned char  uc,
size_t  Width,
size_t  Height 
)

Initializes device memory.

Sets the 2D memory range of Width 8-bit values to the specified value uc. Height specifies the number of rows to set, and dstPitch specifies the number of bytes between each row. This function performs fastest when the pitch is one that has been passed back by cuMemAllocPitch().

Parameters:
dstDevice- Destination device pointer
dstPitch- Pitch of destination device pointer
uc- Value to set
Width- Width of row
Height- Number of rows
Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE
See also:
cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8Async, cuMemsetD2D16, cuMemsetD2D16Async, cuMemsetD2D32, cuMemsetD2D32Async, cuMemsetD8, cuMemsetD8Async, cuMemsetD16, cuMemsetD16Async, cuMemsetD32, cuMemsetD32Async
CUresult CUDAAPI cuMemsetD2D8Async ( CUdeviceptr  dstDevice,
size_t  dstPitch,
unsigned char  uc,
size_t  Width,
size_t  Height,
CUstream  hStream 
)

Sets device memory.

Sets the 2D memory range of Width 8-bit values to the specified value uc. Height specifies the number of rows to set, and dstPitch specifies the number of bytes between each row. This function performs fastest when the pitch is one that has been passed back by cuMemAllocPitch().

cuMemsetD2D8Async() is asynchronous and can optionally be associated with a stream by passing a non-zero stream argument.

Parameters:
dstDevice- Destination device pointer
dstPitch- Pitch of destination device pointer
uc- Value to set
Width- Width of row
Height- Number of rows
hStream- Stream identifier
Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE
See also:
cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D16, cuMemsetD2D16Async, cuMemsetD2D32, cuMemsetD2D32Async, cuMemsetD8, cuMemsetD8Async, cuMemsetD16, cuMemsetD16Async, cuMemsetD32, cuMemsetD32Async
CUresult CUDAAPI cuMemsetD32 ( CUdeviceptr  dstDevice,
unsigned int  ui,
size_t  N 
)
CUresult CUDAAPI cuMemsetD32Async ( CUdeviceptr  dstDevice,
unsigned int  ui,
size_t  N,
CUstream  hStream 
)
CUresult CUDAAPI cuMemsetD8 ( CUdeviceptr  dstDevice,
unsigned char  uc,
size_t  N 
)
CUresult CUDAAPI cuMemsetD8Async ( CUdeviceptr  dstDevice,
unsigned char  uc,
size_t  N,
CUstream  hStream 
)