Nvidia · DSA · SDE-2 · Onsite – Coding 2

Matrix Transpose / Memory Coalescing

Problem

Given a large 2D matrix, write code to transpose it efficiently. Then discuss how a naive transpose causes uncoalesced memory access on a GPU and how tiling with shared memory fixes it.

Example

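A tile-based transpose stages each tile in shared memory so that both the global load and the global store are coalesced. The strided access that a naive transpose performs in global memory happens inside the shared-memory tile instead.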
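The two-phase structure of a tiled transpose can be sketched on the host. What follows is a plain C++ emulation of the indexing a GPU kernel would use, not device code: the loop variables bx, by, tx, and ty stand in for block and thread indices, the local array stands in for shared memory, and the function name and the tile width of 32 are assumptions chosen for illustration.

```cpp
#include <cstddef>
#include <vector>

constexpr int TILE = 32;  // assumed tile width, chosen to match a 32-thread warp

// Host-side emulation of a tiled transpose of a rows x cols row-major matrix.
// (bx, by) plays the role of the block index, (tx, ty) of the thread index.
std::vector<float> transpose_tiled_emulated(const std::vector<float>& in,
                                            int rows, int cols) {
    std::vector<float> out(static_cast<std::size_t>(rows) * cols);
    const int grid_x = (cols + TILE - 1) / TILE;
    const int grid_y = (rows + TILE - 1) / TILE;
    for (int by = 0; by < grid_y; ++by) {
        for (int bx = 0; bx < grid_x; ++bx) {
            // Stands in for a shared-memory tile; the extra column is padding.
            float tile[TILE][TILE + 1];
            // Phase 1: tx walks along an input row, so threads with
            // consecutive tx read consecutive input addresses.
            for (int ty = 0; ty < TILE; ++ty) {
                for (int tx = 0; tx < TILE; ++tx) {
                    const int x = bx * TILE + tx;  // input column
                    const int y = by * TILE + ty;  // input row
                    if (x < cols && y < rows)
                        tile[ty][tx] = in[static_cast<std::size_t>(y) * cols + x];
                }
            }
            // On a GPU, a block-wide barrier would separate the two phases.
            // Phase 2: the block indices are swapped, and tx now walks along
            // an output row, so the writes are consecutive as well.
            for (int ty = 0; ty < TILE; ++ty) {
                for (int tx = 0; tx < TILE; ++tx) {
                    const int x = by * TILE + tx;  // output column (input row)
                    const int y = bx * TILE + ty;  // output row (input column)
                    if (x < rows && y < cols)
                        out[static_cast<std::size_t>(y) * rows + x] = tile[tx][ty];
                }
            }
        }
    }
    return out;
}
```

In both phases the fastest-varying index tx maps to consecutive global addresses. The transposed access pattern, tile[tx][ty], only ever touches the staging tile.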

Constraints

  • The matrix may be larger than the available cache
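For the CPU half of the question, one common answer to a matrix that does not fit in cache is a blocked (cache-tiled) transpose. The sketch below is one possible version under stated assumptions: the function name transpose_blocked is hypothetical, the matrix is stored row-major in a flat vector, and the default block size of 32 is a guess that would need tuning on real hardware.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Transpose a rows x cols row-major matrix into a cols x rows row-major matrix.
// The work is done block by block so that the source and destination blocks
// are both small enough to stay in cache while they are being touched.
// `block` is an assumed tuning parameter, not a measured optimum.
std::vector<float> transpose_blocked(const std::vector<float>& in,
                                     std::size_t rows, std::size_t cols,
                                     std::size_t block = 32) {
    std::vector<float> out(rows * cols);
    for (std::size_t r0 = 0; r0 < rows; r0 += block) {
        for (std::size_t c0 = 0; c0 < cols; c0 += block) {
            // Clamp the block at the matrix edges.
            const std::size_t r1 = std::min(r0 + block, rows);
            const std::size_t c1 = std::min(c0 + block, cols);
            for (std::size_t r = r0; r < r1; ++r)
                for (std::size_t c = c0; c < c1; ++c)
                    out[c * rows + r] = in[r * cols + c];
        }
    }
    return out;
}
```

Calling transpose_blocked on the 2 x 3 matrix {1, 2, 3, 4, 5, 6} returns the 3 x 2 matrix {1, 4, 2, 5, 3, 6}. The result is the same as a plain double loop; only the traversal order changes.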

What NVIDIA looks for

Reasoning about memory hierarchy, coalescing, and bank conflicts — not just the CPU algorithm.
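A small host-side model can make the coalescing and bank-conflict arguments concrete. It is a deliberate simplification, and its constants are assumptions: a 32-thread warp, 128-byte global memory segments, and 32 shared-memory banks that are each 4 bytes wide. Real transaction sizes vary by GPU architecture, and both function names are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <set>

constexpr int WARP = 32;            // threads per warp (assumed)
constexpr int SEGMENT_BYTES = 128;  // assumed global memory segment size
constexpr int BANKS = 32;           // assumed number of 4-byte-wide banks

// Number of distinct segments touched when thread t of a warp accesses the
// float at element offset t * stride_elems. Fewer segments means better
// coalescing.
int segments_touched(std::size_t stride_elems) {
    std::set<std::size_t> segments;
    for (int t = 0; t < WARP; ++t)
        segments.insert(t * stride_elems * sizeof(float) / SEGMENT_BYTES);
    return static_cast<int>(segments.size());
}

// Largest number of threads in a warp that land on the same bank when
// thread t reads tile[t][0] from a tile whose rows hold row_pitch floats.
// A result of 1 means the access is conflict-free in this model.
int bank_conflict_degree(int row_pitch) {
    std::array<int, BANKS> hits{};
    for (int t = 0; t < WARP; ++t)
        ++hits[(t * row_pitch) % BANKS];
    return *std::max_element(hits.begin(), hits.end());
}
```

In this model, segments_touched(1) is 1 while segments_touched(1024) is 32, which is the gap between a row-wise and a column-wise access on a matrix with 1024 columns. Likewise, bank_conflict_degree(32) is 32 and bank_conflict_degree(33) is 1, which is why a shared-memory tile is often declared with one extra padding column.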

added 6 days ago