Nvidia · DSA · SDE-2 · Onsite – Coding 2
Matrix Transpose / Memory Coalescing
Problem
Given a large 2D matrix, write code to transpose it efficiently. Then discuss how a naive transpose causes uncoalesced memory access on a GPU and how tiling with shared memory fixes it.
Example
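A tile-based transpose stages each tile in shared memory so that both the global loads and the global stores are contiguous across a warp. The strided access pattern is confined to shared memory, where it is much cheaper.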
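To make the coalescing argument concrete, the following host-side C++ sketch counts how many memory segments one warp touches under a given access stride. It is a simplified model, not a GPU measurement. The warp size of 32, the 128-byte segment size, 4-byte floats, and the helper name `segments_touched` are all assumptions chosen for illustration.

```cpp
#include <cstddef>
#include <set>

// Model assumptions (illustrative, not taken from the problem statement):
// 32 threads per warp, 4-byte floats, 128-byte aligned memory segments.
constexpr int kWarpSize = 32;
constexpr std::size_t kSegmentBytes = 128;

// Hypothetical helper: thread t of a warp accesses float element
// base_elem + t * stride_elems. Returns how many distinct 128-byte
// segments the whole warp touches.
int segments_touched(std::size_t base_elem, std::size_t stride_elems) {
    std::set<std::size_t> segments;
    for (int t = 0; t < kWarpSize; ++t) {
        const std::size_t elem = base_elem + static_cast<std::size_t>(t) * stride_elems;
        const std::size_t byte_addr = elem * sizeof(float);
        segments.insert(byte_addr / kSegmentBytes);
    }
    return static_cast<int>(segments.size());
}
```

Under this model, a stride of 1 (threads walking along a row) touches a single segment. A stride equal to the row width, for example 4096 floats, touches 32 segments. That stride is the column-wise side of a naive transpose. A tiled kernel keeps both its global read and its global write at stride 1.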
Constraints
- Matrix may exceed cache
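For the CPU half of the question, one common answer when the matrix does not fit in cache is a blocked (tiled) transpose. The sketch below is a minimal version under stated assumptions: row-major storage, `float` elements, and a default block size of 32 that would need tuning for a real cache. The function name `transpose_blocked` is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Transpose a rows x cols row-major matrix into a cols x rows row-major
// matrix, one block x block tile at a time, so the source and destination
// tiles are small enough to stay cache-resident while they are processed.
// Assumes block > 0 and in.size() == rows * cols.
std::vector<float> transpose_blocked(const std::vector<float>& in,
                                     std::size_t rows, std::size_t cols,
                                     std::size_t block = 32) {
    std::vector<float> out(rows * cols);
    for (std::size_t i0 = 0; i0 < rows; i0 += block) {
        for (std::size_t j0 = 0; j0 < cols; j0 += block) {
            // Clamp the tile at the matrix edge so any shape is handled.
            const std::size_t i1 = std::min(i0 + block, rows);
            const std::size_t j1 = std::min(j0 + block, cols);
            for (std::size_t i = i0; i < i1; ++i) {
                for (std::size_t j = j0; j < j1; ++j) {
                    out[j * rows + i] = in[i * cols + j];
                }
            }
        }
    }
    return out;
}
```

The GPU version applies the same idea: the tile lives in shared memory instead of cache, and a thread block cooperates on it instead of an inner loop.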
What NVIDIA looks for
Reasoning about memory hierarchy, coalescing, and bank conflicts — not just the CPU algorithm.
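The bank-conflict part of the discussion can also be checked with a small host-side model. The sketch below assumes 32 shared-memory banks that are each 4 bytes wide, a 32x32 `float` tile, and a warp in which thread t reads `tile[t][col]`, which is the column read a tiled transpose performs. The helper name `max_bank_conflict` is hypothetical.

```cpp
#include <algorithm>
#include <array>

// Model assumptions (illustrative): 32 banks, 4-byte bank width, so the
// bank of a float is its word index modulo 32.
constexpr int kBanks = 32;
constexpr int kTile = 32;

// Hypothetical helper: thread t reads tile[t][col] from a tile whose rows
// are `pitch` floats long. Returns the largest number of threads that land
// in the same bank (1 means conflict-free, 32 means fully serialized).
int max_bank_conflict(int pitch, int col) {
    std::array<int, kBanks> hits{};
    for (int t = 0; t < kTile; ++t) {
        const int word = t * pitch + col;
        ++hits[word % kBanks];
    }
    return *std::max_element(hits.begin(), hits.end());
}
```

With a pitch of 32 every thread in the warp maps to the same bank, so the column read is a 32-way conflict. Padding each row by one float (pitch 33, i.e. declaring the tile as 32x33) shifts successive rows to successive banks and removes the conflict in this model.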