Emulator and Related Tools
GPU Ocelot’s PTX emulator executes CUDA applications on a functional simulator that computes the complete architectural state of a GPU for each dynamic instruction. The emulator may be augmented with user-defined trace generators that react to the dynamic instruction trace as the program executes, enabling real-time workload characterization and correctness checks. Existing trace analyzers provide memory access checks, race detection, an interactive debugger, and feedback for performance tuning.
Andrew Kerr, Gregory Diamos, and Sudhakar Yalamanchili. "GPU Application Development, Debugging, and Performance Tuning with GPU Ocelot." In GPU Computing Gems Jade Edition, 1st edition. September 2011. [paper]
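The trace-generator idea described above can be illustrated with a small, self-contained sketch. It is not Ocelot's API: `TraceEvent`, `TraceGenerator`, `MemoryChecker`, `OpcodeHistogram`, and `emulate` are hypothetical names chosen for illustration, and the "emulator" is a toy loop over a hand-written trace. The sketch only shows the general pattern, assuming the emulator hands each dynamic instruction to every registered observer, which can then perform a correctness check or collect workload statistics.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TraceEvent:
    """Hypothetical record of one dynamic instruction (not Ocelot's real type)."""
    pc: int                 # program counter of the instruction
    opcode: str             # e.g. "ld.global", "st.global", "add"
    thread_id: int          # thread that executed the instruction
    addresses: List[int] = field(default_factory=list)  # memory addresses touched


class TraceGenerator:
    """Hypothetical observer interface: called once per dynamic instruction."""

    def event(self, ev: TraceEvent) -> None:
        raise NotImplementedError


class MemoryChecker(TraceGenerator):
    """Records events whose addresses fall outside every known allocation."""

    def __init__(self, allocations: List[Tuple[int, int]]):
        self.allocations = allocations          # list of (base, size_in_bytes)
        self.violations: List[TraceEvent] = []

    def _in_bounds(self, addr: int) -> bool:
        return any(base <= addr < base + size for base, size in self.allocations)

    def event(self, ev: TraceEvent) -> None:
        if any(not self._in_bounds(a) for a in ev.addresses):
            self.violations.append(ev)


class OpcodeHistogram(TraceGenerator):
    """Counts how often each opcode executes (simple workload characterization)."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}

    def event(self, ev: TraceEvent) -> None:
        self.counts[ev.opcode] = self.counts.get(ev.opcode, 0) + 1


def emulate(trace: List[TraceEvent], generators: List[TraceGenerator]) -> None:
    """Toy stand-in for an emulator loop: dispatch each event to every generator."""
    for ev in trace:
        for gen in generators:
            gen.event(ev)


# Hand-written trace; one allocation of 256 bytes starting at 0x1000.
trace = [
    TraceEvent(pc=0, opcode="ld.global", thread_id=0, addresses=[0x1000]),
    TraceEvent(pc=1, opcode="add", thread_id=0),
    # 0x1100 is one byte past the end of the allocation, so this store is flagged.
    TraceEvent(pc=2, opcode="st.global", thread_id=0, addresses=[0x1100]),
]
checker = MemoryChecker(allocations=[(0x1000, 256)])
histogram = OpcodeHistogram()
emulate(trace, [checker, histogram])

for bad in checker.violations:
    print(f"out-of-bounds access at pc={bad.pc} by thread {bad.thread_id}")
print(histogram.counts)
```

Because the checks live in observers rather than in the emulation loop, a new analysis can be added without touching the loop itself, which is the property that makes this kind of design convenient for debugging and tuning tools.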