Motivation: Large models and matrix ops can produce large ephemeral tensors, hidden cross-device copies, and temporary allocations that drive up peak memory or cause OOMs. Adopt consistent, device-aware tensor handling, prefer built-in vectorized ops, and free large temporaries at clear boundaries.

Guidelines:

- Device-aware tensor handling: create tensors on the device where they will be used, and derive the device of helper tensors from their operands so no hidden cross-device copy is triggered (first sketch below).
- Prefer built-in ops: express reductions, elementwise math, and matrix products with the library's vectorized ops rather than Python-level loops that spawn many small intermediates (second sketch below).
- Free large temporaries at clear boundaries: confine big intermediates to a function scope, or drop the reference explicitly once the phase that needs them is done (third sketch below).
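
A minimal sketch of device-aware construction; the framework is not named above, so this assumes PyTorch, and `weights`/`mask` are just illustrative buffers:

```python
import torch

# Pick the target device once and thread it through explicitly.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Allocate directly on the target device; torch.zeros(...).to(device)
# would first materialize the buffer on the CPU and then copy it over.
weights = torch.zeros(4096, 4096, device=device)

# Derive helper tensors' device from their operands so the indexing
# below never triggers a hidden CPU<->GPU copy.
mask = torch.ones(weights.shape[0], device=weights.device, dtype=torch.bool)
selected = weights[mask]
```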
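
A sketch of preferring a built-in op over a Python-level loop (same PyTorch assumption; the row-norm computation is only an example):

```python
import torch

x = torch.randn(10_000, 512)

# Avoid: a Python loop that builds thousands of tiny intermediate tensors
# and stays on the slow interpreter path.
#   row_norms = torch.stack([row.pow(2).sum().sqrt() for row in x])

# Prefer: a single built-in reduction with one output allocation.
row_norms = torch.linalg.vector_norm(x, dim=1)
```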
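
A sketch of freeing a large temporary at a clear boundary (again assuming PyTorch; the attention-style functions and shapes are hypothetical):

```python
import torch

def softmax_scores(q: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    # The full score matrix lives only inside this function, so it is
    # released as soon as the function returns.
    scores = q @ k.transpose(-1, -2)
    return torch.softmax(scores, dim=-1)

def attention_step(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    probs = softmax_scores(q, k)
    out = probs @ v
    # Drop the large temporary explicitly before the next allocation-heavy
    # phase; empty_cache() only returns already-freed blocks to the driver
    # and mainly matters when other processes share the same GPU.
    del probs
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    return out
```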

When to use which pattern:

- Constructing tensors or helper buffers: pass the target device at creation time, or derive it from an existing operand, instead of building on the default device and copying.
- Expressing math over whole tensors: reach for the built-in reduction, elementwise, or matmul op before writing a Python loop over rows or elements.
- Working with large intermediates: keep them confined to a function scope, or drop the reference explicitly once the phase that needs them is over, as in the combined sketch below.
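
A short end-to-end sketch tying the three patterns together (PyTorch assumed; `topk_similar` and its shapes are illustrative):

```python
import torch

def topk_similar(queries: torch.Tensor, corpus: torch.Tensor, k: int = 5):
    # Pattern 1: keep every operand on one device; a single explicit .to()
    # (a no-op if already there) beats repeated implicit copies later on.
    corpus = corpus.to(queries.device)

    # Pattern 2: one matmul plus one built-in top-k instead of a per-query
    # Python loop with many small intermediates.
    scores = queries @ corpus.T
    values, indices = torch.topk(scores, k, dim=1)

    # Pattern 3: the full score matrix is the only big temporary and it is
    # confined to this function, so the caller only ever holds the small
    # (num_queries, k) result tensors.
    return values, indices
```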

Result: following these practices reduces peak/resident memory, avoids hidden cross-device allocations, and keeps heavy numeric code both readable and efficient.