GPUOcelot
Context Management

CUDA Driver API
Functions

CUresult CUDAAPI cuCtxDestroy (CUcontext ctx)
 Destroy the current context or a floating CUDA context.
CUresult CUDAAPI cuCtxAttach (CUcontext *pctx, unsigned int flags)
 Increment a context's usage-count.
CUresult CUDAAPI cuCtxDetach (CUcontext ctx)
 Decrement a context's usage-count.
CUresult CUDAAPI cuCtxPushCurrent (CUcontext ctx)
 Pushes a floating context on the current CPU thread.
CUresult CUDAAPI cuCtxPopCurrent (CUcontext *pctx)
 Pops the current CUDA context from the current CPU thread.
CUresult CUDAAPI cuCtxGetDevice (CUdevice *device)
 Returns the device ID for the current context.
CUresult CUDAAPI cuCtxSynchronize (void)
 Block for a context's tasks to complete.
CUresult CUDAAPI cuCtxSetLimit (CUlimit limit, size_t value)
 Set resource limits.
CUresult CUDAAPI cuCtxGetLimit (size_t *pvalue, CUlimit limit)
 Returns resource limits.
CUresult CUDAAPI cuCtxGetCacheConfig (CUfunc_cache *pconfig)
 Returns the preferred cache configuration for the current context.
CUresult CUDAAPI cuCtxSetCacheConfig (CUfunc_cache config)
 Sets the preferred cache configuration for the current context.
CUresult CUDAAPI cuCtxGetApiVersion (CUcontext ctx, unsigned int *version)
 Gets the context's API version.
CUresult CUDAAPI cuCtxCreate (CUcontext *pctx, unsigned int flags, CUdevice dev)
 Create a CUDA context.

Detailed Description

This section describes the context management functions of the low-level CUDA driver application programming interface.


Function Documentation

CUresult CUDAAPI cuCtxAttach ( CUcontext *  pctx,
unsigned int  flags 
)

Increment a context's usage-count.

Increments the usage count of the context and passes back a context handle in *pctx that must be passed to cuCtxDetach() when the application is done with the context. cuCtxAttach() fails if there is no context current to the thread.

Currently, the flags parameter must be 0.

Parameters:
pctx- Returned context handle of the current context
flags- Context attach flags (must be 0)
Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE
See also:
cuCtxCreate, cuCtxDestroy, cuCtxDetach, cuCtxGetApiVersion, cuCtxGetCacheConfig, cuCtxGetDevice, cuCtxGetLimit, cuCtxPopCurrent, cuCtxPushCurrent, cuCtxSetCacheConfig, cuCtxSetLimit, cuCtxSynchronize
CUresult CUDAAPI cuCtxCreate ( CUcontext *  pctx,
unsigned int  flags,
CUdevice  dev 
)

Create a CUDA context.

Creates a new CUDA context and associates it with the calling thread. The flags parameter is described below. The context is created with a usage count of 1 and the caller of cuCtxCreate() must call cuCtxDestroy() or cuCtxDetach() when done using the context. If a context is already current to the thread, it is supplanted by the newly created context and may be restored by a subsequent call to cuCtxPopCurrent().

The two LSBs of the flags parameter can be used to control how the OS thread, which owns the CUDA context at the time of an API call, interacts with the OS scheduler when waiting for results from the GPU.

  • CU_CTX_SCHED_AUTO: The default value if the flags parameter is zero. CUDA uses a heuristic based on the number of active CUDA contexts in the process, C, and the number of logical processors in the system, P. If C > P, CUDA yields to other OS threads when waiting for the GPU; otherwise CUDA does not yield while waiting for results and actively spins on the processor.
  • CU_CTX_SCHED_SPIN: Instruct CUDA to actively spin when waiting for results from the GPU. This can decrease latency when waiting for the GPU, but may lower the performance of CPU threads if they are performing work in parallel with the CUDA thread.
  • CU_CTX_SCHED_YIELD: Instruct CUDA to yield its thread when waiting for results from the GPU. This can increase latency when waiting for the GPU, but can increase the performance of CPU threads performing work in parallel with the GPU.
  • CU_CTX_BLOCKING_SYNC: Instruct CUDA to block the CPU thread on a synchronization primitive when waiting for the GPU to finish work.
  • CU_CTX_MAP_HOST: Instruct CUDA to support mapped pinned allocations. This flag must be set in order to allocate pinned host memory that is accessible to the GPU.
  • CU_CTX_LMEM_RESIZE_TO_MAX: Instruct CUDA to not reduce local memory after resizing local memory for a kernel. This can prevent thrashing by local memory allocations when launching many kernels with high local memory usage at the cost of potentially increased memory usage.
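
The scheduling rules above can be sketched as a small decision function. This is only an illustrative model, not driver code: the DEMO_ constants are assumed to mirror the corresponding cuda.h values for this API generation and should be checked against the real header, and demo_resolve_wait_mode and its parameters are hypothetical names. CU_CTX_BLOCKING_SYNC and the other non-scheduling flags are deliberately ignored.

```c
#include <assert.h>

/* Illustrative constants. The numeric values are an assumption and
   should be verified against the cuda.h that ships with the driver. */
enum {
    DEMO_CTX_SCHED_AUTO  = 0x0,
    DEMO_CTX_SCHED_SPIN  = 0x1,
    DEMO_CTX_SCHED_YIELD = 0x2,
    DEMO_CTX_SCHED_MASK  = 0x3   /* the two LSBs of the flags parameter */
};

typedef enum { DEMO_WAIT_SPIN, DEMO_WAIT_YIELD } demo_wait_mode;

/* Decide how a thread waits for the GPU in this toy model.
   An explicit SPIN or YIELD request wins; AUTO applies the
   "C > P means yield" heuristic described in the text. */
demo_wait_mode demo_resolve_wait_mode(unsigned int flags,
                                      unsigned int active_contexts,    /* C */
                                      unsigned int logical_processors) /* P */
{
    unsigned int sched = flags & DEMO_CTX_SCHED_MASK;

    /* Both scheduling bits set is not a policy this sketch defines. */
    assert(sched != DEMO_CTX_SCHED_MASK);

    if (sched == DEMO_CTX_SCHED_SPIN)
        return DEMO_WAIT_SPIN;
    if (sched == DEMO_CTX_SCHED_YIELD)
        return DEMO_WAIT_YIELD;

    /* DEMO_CTX_SCHED_AUTO: yield only when contexts outnumber processors. */
    return (active_contexts > logical_processors) ? DEMO_WAIT_YIELD
                                                  : DEMO_WAIT_SPIN;
}
```

Masking with the two low bits first keeps the decision independent of any other flags that may be OR-ed into the same word.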

Note to Linux users:

Context creation will fail with CUDA_ERROR_UNKNOWN if the compute mode of the device is CU_COMPUTEMODE_PROHIBITED. Similarly, context creation will also fail with CUDA_ERROR_UNKNOWN if the compute mode for the device is set to CU_COMPUTEMODE_EXCLUSIVE and there is already an active context on the device. The function cuDeviceGetAttribute() can be used with CU_DEVICE_ATTRIBUTE_COMPUTE_MODE to determine the compute mode of the device. The nvidia-smi tool can be used to set the compute mode for devices. Documentation for nvidia-smi can be obtained by passing a -h option to it.

Parameters:
pctx- Returned context handle of the new context
flags- Context creation flags
dev- Device to create context on
Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_DEVICE, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_OUT_OF_MEMORY, CUDA_ERROR_UNKNOWN
See also:
cuCtxAttach, cuCtxDestroy, cuCtxDetach, cuCtxGetApiVersion, cuCtxGetCacheConfig, cuCtxGetDevice, cuCtxGetLimit, cuCtxPopCurrent, cuCtxPushCurrent, cuCtxSetCacheConfig, cuCtxSetLimit, cuCtxSynchronize
CUresult CUDAAPI cuCtxDestroy ( CUcontext  ctx)

Destroy the current context or a floating CUDA context.

Destroys the CUDA context specified by ctx. If the context usage count is not equal to 1, or the context is current to any CPU thread other than the current one, this function fails. Floating contexts (detached from a CPU thread via cuCtxPopCurrent()) may be destroyed by this function.

Parameters:
ctx- Context to destroy
Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE
See also:
cuCtxAttach, cuCtxCreate, cuCtxDetach, cuCtxGetApiVersion, cuCtxGetCacheConfig, cuCtxGetDevice, cuCtxGetLimit, cuCtxPopCurrent, cuCtxPushCurrent, cuCtxSetCacheConfig, cuCtxSetLimit, cuCtxSynchronize
CUresult CUDAAPI cuCtxDetach ( CUcontext  ctx)

Decrement a context's usage-count.

Decrements the usage count of the context ctx, and destroys the context if the usage count goes to 0. The context must be a handle that was passed back by cuCtxCreate() or cuCtxAttach(), and must be current to the calling thread.
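
The usage-count rules for cuCtxAttach(), cuCtxDetach() and cuCtxDestroy() can be illustrated with a toy model. The sketch below does not call the driver; demo_context and the demo_ functions are hypothetical stand-ins that only track the counter, and the requirement that the context be current to the calling thread is left out.

```c
#include <assert.h>

/* Toy stand-in for a context: only the bookkeeping described in the text. */
typedef struct {
    int usage_count;  /* 1 immediately after creation */
    int destroyed;    /* nonzero once the context no longer exists */
} demo_context;

/* Models creation: the context starts with a usage count of 1. */
void demo_create(demo_context *c)
{
    c->usage_count = 1;
    c->destroyed = 0;
}

/* Models cuCtxAttach(): one more user of the same context. */
void demo_attach(demo_context *c)
{
    assert(!c->destroyed);
    c->usage_count++;
}

/* Models cuCtxDetach(): drop one user and destroy the context
   when the count reaches 0. */
void demo_detach(demo_context *c)
{
    assert(!c->destroyed && c->usage_count > 0);
    if (--c->usage_count == 0)
        c->destroyed = 1;
}

/* Models cuCtxDestroy(): fails unless the usage count is exactly 1.
   Returns 0 on success and -1 on failure. */
int demo_destroy(demo_context *c)
{
    if (c->destroyed || c->usage_count != 1)
        return -1;
    c->usage_count = 0;
    c->destroyed = 1;
    return 0;
}
```

In this model, a context that has been attached once more than it has been detached cannot be destroyed until the extra attachment is released.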

Parameters:
ctx- Context to destroy
Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT
See also:
cuCtxAttach, cuCtxCreate, cuCtxDestroy, cuCtxGetApiVersion, cuCtxGetCacheConfig, cuCtxGetDevice, cuCtxGetLimit, cuCtxPopCurrent, cuCtxPushCurrent, cuCtxSetCacheConfig, cuCtxSetLimit, cuCtxSynchronize
CUresult CUDAAPI cuCtxGetApiVersion ( CUcontext  ctx,
unsigned int *  version 
)

Gets the context's API version.

Returns the API version used to create ctx in version. If ctx is NULL, returns the API version used to create the currently bound context.

This will return the API version used to create a context (for example, 3010 or 3020), which library developers can use to direct callers to a specific API version. Note that this API version may not be the same as the one returned by cuDriverGetVersion.
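
As a small worked example, the values 3010 and 3020 can be split into major and minor parts. The sketch assumes the encoding 1000 * major + 10 * minor, which is the convention used by CUDA_VERSION; the text above does not state the encoding, so treat it as an assumption. demo_decode_api_version is a hypothetical helper, not part of the driver API.

```c
#include <assert.h>

/* Split a version number such as 3020 into major and minor parts.
   Assumes version == 1000 * major + 10 * minor. */
void demo_decode_api_version(unsigned int version,
                             unsigned int *major,
                             unsigned int *minor)
{
    /* Both output pointers are required in this sketch. */
    assert(major && minor);

    *major = version / 1000;
    *minor = (version % 1000) / 10;
}
```

Under that assumption, 3010 decodes to major 3, minor 1, and 3020 to major 3, minor 2.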

Parameters:
ctx- Context to check
version- Pointer to version
Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_UNKNOWN
See also:
cuCtxAttach, cuCtxCreate, cuCtxDestroy, cuCtxDetach, cuCtxGetDevice, cuCtxGetLimit, cuCtxPopCurrent, cuCtxPushCurrent, cuCtxSetCacheConfig, cuCtxSetLimit, cuCtxSynchronize
CUresult CUDAAPI cuCtxGetCacheConfig ( CUfunc_cache *  pconfig)

Returns the preferred cache configuration for the current context.

On devices where the L1 cache and shared memory use the same hardware resources, this returns through pconfig the preferred cache configuration for the current context. This is only a preference. The driver will use the requested configuration if possible, but it is free to choose a different configuration if required to execute functions.

This will return a pconfig of CU_FUNC_CACHE_PREFER_NONE on devices where the sizes of the L1 cache and shared memory are fixed.

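The supported cache configurations are:

  • CU_FUNC_CACHE_PREFER_NONE: no preference for shared memory or L1 (default)
  • CU_FUNC_CACHE_PREFER_SHARED: prefer larger shared memory and smaller L1 cache
  • CU_FUNC_CACHE_PREFER_L1: prefer larger L1 cache and smaller shared memory
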
Parameters:
pconfig- Returned cache configuration
Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE
See also:
cuCtxAttach, cuCtxCreate, cuCtxDestroy, cuCtxDetach, cuCtxGetApiVersion, cuCtxGetDevice, cuCtxGetLimit, cuCtxPopCurrent, cuCtxPushCurrent, cuCtxSetCacheConfig, cuCtxSetLimit, cuCtxSynchronize, cuFuncSetCacheConfig
CUresult CUDAAPI cuCtxGetDevice ( CUdevice *  device)

Returns the device ID for the current context.

Returns in *device the ordinal of the current context's device.

Parameters:
device- Returned device ID for the current context
Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE
See also:
cuCtxAttach, cuCtxCreate, cuCtxDestroy, cuCtxDetach, cuCtxGetApiVersion, cuCtxGetCacheConfig, cuCtxGetLimit, cuCtxPopCurrent, cuCtxPushCurrent, cuCtxSetCacheConfig, cuCtxSetLimit, cuCtxSynchronize
CUresult CUDAAPI cuCtxGetLimit ( size_t *  pvalue,
CUlimit  limit 
)

Returns resource limits.

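Returns in *pvalue the current size of limit. The supported CUlimit values are:

  • CU_LIMIT_STACK_SIZE: stack size of each GPU thread
  • CU_LIMIT_PRINTF_FIFO_SIZE: size of the FIFO used by the printf() device system call
  • CU_LIMIT_MALLOC_HEAP_SIZE: size of the heap used by the malloc() and free() device system calls
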
Parameters:
limit- Limit to query
pvalue- Returned size in bytes of limit
Returns:
CUDA_SUCCESS, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_UNSUPPORTED_LIMIT
See also:
cuCtxAttach, cuCtxCreate, cuCtxDestroy, cuCtxDetach, cuCtxGetApiVersion, cuCtxGetCacheConfig, cuCtxGetDevice, cuCtxPopCurrent, cuCtxPushCurrent, cuCtxSetCacheConfig, cuCtxSetLimit, cuCtxSynchronize
CUresult CUDAAPI cuCtxPopCurrent ( CUcontext *  pctx)

Pops the current CUDA context from the current CPU thread.

Pops the current CUDA context from the CPU thread. The CUDA context must have a usage count of 1. CUDA contexts have a usage count of 1 upon creation; the usage count may be incremented with cuCtxAttach() and decremented with cuCtxDetach().

If successful, cuCtxPopCurrent() passes back the old context handle in *pctx. That context may then be made current to a different CPU thread by calling cuCtxPushCurrent().

Floating contexts may be destroyed by calling cuCtxDestroy().

If a context was current to the CPU thread before cuCtxCreate() or cuCtxPushCurrent() was called, this function makes that context current to the CPU thread again.
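
The per-thread stack behaviour described for cuCtxPushCurrent() and cuCtxPopCurrent() can be sketched with a toy model. Nothing here calls the driver: demo_ctx, demo_thread, the fixed stack depth and the 0 / -1 return codes are all hypothetical choices made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_STACK_MAX 8  /* arbitrary depth for the sketch */

typedef struct {
    int usage_count;  /* 1 after creation */
    int attached;     /* nonzero while on some thread's stack */
} demo_ctx;

typedef struct {
    demo_ctx *items[DEMO_STACK_MAX];
    int depth;
} demo_thread;

/* The current context is the top of the thread's stack, or NULL. */
demo_ctx *demo_current(const demo_thread *t)
{
    assert(t != NULL);
    return t->depth > 0 ? t->items[t->depth - 1] : NULL;
}

/* Models cuCtxPushCurrent(): only a floating context may be pushed,
   and it becomes the thread's current context. */
int demo_push(demo_thread *t, demo_ctx *c)
{
    if (c == NULL || c->attached || t->depth == DEMO_STACK_MAX)
        return -1;
    c->attached = 1;
    t->items[t->depth++] = c;
    return 0;
}

/* Models cuCtxPopCurrent(): requires a current context with a usage
   count of 1. The popped context becomes floating and the context
   underneath it, if any, becomes current again. */
int demo_pop(demo_thread *t, demo_ctx **out)
{
    demo_ctx *top = demo_current(t);
    if (top == NULL || top->usage_count != 1)
        return -1;
    t->depth--;
    top->attached = 0;
    if (out != NULL)
        *out = top;
    return 0;
}
```

A context returned through out is floating in this model, so it could be pushed onto a different demo_thread, which mirrors handing a context from one CPU thread to another.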

Parameters:
pctx- Returned new context handle
Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT
See also:
cuCtxAttach, cuCtxCreate, cuCtxDestroy, cuCtxDetach, cuCtxGetApiVersion, cuCtxGetCacheConfig, cuCtxGetDevice, cuCtxGetLimit, cuCtxPushCurrent, cuCtxSetCacheConfig, cuCtxSetLimit, cuCtxSynchronize
CUresult CUDAAPI cuCtxPushCurrent ( CUcontext  ctx)

Pushes a floating context on the current CPU thread.

Pushes the given context ctx onto the CPU thread's stack of current contexts. The specified context becomes the CPU thread's current context, so all CUDA functions that operate on the current context are affected.

The previous current context may be made current again by calling cuCtxDestroy() or cuCtxPopCurrent().

The context must be "floating," i.e. not attached to any thread. Contexts are made to float by calling cuCtxPopCurrent().

Parameters:
ctx- Floating context to attach
Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE
See also:
cuCtxAttach, cuCtxCreate, cuCtxDestroy, cuCtxDetach, cuCtxGetApiVersion, cuCtxGetCacheConfig, cuCtxGetDevice, cuCtxGetLimit, cuCtxPopCurrent, cuCtxSetCacheConfig, cuCtxSetLimit, cuCtxSynchronize
CUresult CUDAAPI cuCtxSetCacheConfig ( CUfunc_cache  config)

Sets the preferred cache configuration for the current context.

On devices where the L1 cache and shared memory use the same hardware resources, this sets through config the preferred cache configuration for the current context. This is only a preference. The driver will use the requested configuration if possible, but it is free to choose a different configuration if required to execute the function. Any function preference set via cuFuncSetCacheConfig() will be preferred over this context-wide setting. Setting the context-wide cache configuration to CU_FUNC_CACHE_PREFER_NONE will cause subsequent kernel launches to prefer to not change the cache configuration unless required to launch the kernel.

This setting does nothing on devices where the sizes of the L1 cache and shared memory are fixed.

Launching a kernel with a different preference than the most recent preference setting may insert a device-side synchronization point.

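The supported cache configurations are:

  • CU_FUNC_CACHE_PREFER_NONE: no preference for shared memory or L1 (default)
  • CU_FUNC_CACHE_PREFER_SHARED: prefer larger shared memory and smaller L1 cache
  • CU_FUNC_CACHE_PREFER_L1: prefer larger L1 cache and smaller shared memory
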
Parameters:
config- Requested cache configuration
Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE
See also:
cuCtxAttach, cuCtxCreate, cuCtxDestroy, cuCtxDetach, cuCtxGetApiVersion, cuCtxGetCacheConfig, cuCtxGetDevice, cuCtxGetLimit, cuCtxPopCurrent, cuCtxPushCurrent, cuCtxSetLimit, cuCtxSynchronize, cuFuncSetCacheConfig
CUresult CUDAAPI cuCtxSetLimit ( CUlimit  limit,
size_t  value 
)

Set resource limits.

Setting limit to value is a request by the application to update the current limit maintained by the context. The driver is free to modify the requested value to meet hardware requirements (for example, by clamping to minimum or maximum values or rounding up to the nearest element size). The application can use cuCtxGetLimit() to find out exactly what the limit has been set to.
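
Because the driver may adjust the requested value, the value read back can differ from the one passed in. The sketch below shows one way such an adjustment could work. The demo_limit_rule fields and the clamp-then-round policy are hypothetical; the real constraints are hardware-specific and are not given in this section.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hardware constraints for a single limit. */
typedef struct {
    size_t min;      /* smallest value the device accepts */
    size_t max;      /* largest value the device accepts */
    size_t granule;  /* allocation unit; results are multiples of it */
} demo_limit_rule;

/* Turn a requested value into the value that would actually be stored:
   clamp into [min, max], then round up to a multiple of granule.
   If rounding up would exceed max, round down instead. */
size_t demo_apply_limit(size_t requested, demo_limit_rule r)
{
    size_t v = requested;
    size_t rem;

    assert(r.granule > 0 && r.min <= r.max);

    if (v < r.min) v = r.min;
    if (v > r.max) v = r.max;

    rem = v % r.granule;
    if (rem != 0) {
        size_t pad = r.granule - rem;
        /* Compare against the remaining headroom to avoid overflow. */
        v = (pad <= r.max - v) ? v + pad : v - rem;
    }
    return v;
}
```

In real code the equivalent step is to call cuCtxGetLimit() after cuCtxSetLimit() and use the value it returns rather than the value requested.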

Setting each CUlimit has its own specific restrictions, so each is discussed here.

  • CU_LIMIT_STACK_SIZE controls the stack size of each GPU thread. This limit is only applicable to devices of compute capability 2.0 and higher. Attempting to set this limit on devices of compute capability less than 2.0 will result in the error CUDA_ERROR_UNSUPPORTED_LIMIT being returned.

Parameters:
limit- Limit to set
value- Size in bytes of limit
Returns:
CUDA_SUCCESS, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_UNSUPPORTED_LIMIT
See also:
cuCtxAttach, cuCtxCreate, cuCtxDestroy, cuCtxDetach, cuCtxGetApiVersion, cuCtxGetCacheConfig, cuCtxGetDevice, cuCtxGetLimit, cuCtxPopCurrent, cuCtxPushCurrent, cuCtxSetCacheConfig, cuCtxSynchronize
CUresult CUDAAPI cuCtxSynchronize ( void  )

Block for a context's tasks to complete.

Blocks until the device has completed all preceding requested tasks. cuCtxSynchronize() returns an error if one of the preceding tasks failed. If the context was created with the CU_CTX_BLOCKING_SYNC flag, the CPU thread will block until the GPU context has finished its work.

Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT
See also:
cuCtxAttach, cuCtxCreate, cuCtxDestroy, cuCtxDetach, cuCtxGetApiVersion, cuCtxGetCacheConfig, cuCtxGetDevice, cuCtxGetLimit, cuCtxPopCurrent, cuCtxPushCurrent, cuCtxSetCacheConfig, cuCtxSetLimit