When adding or changing model loading, quantization, LoRA/abliteration logic, or merging behavior, follow these explicit rules to ensure correctness, compatibility and reproducibility.

1) Explicit quantization and dtype handling

Example: quant_cfg = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16, …)
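
A fuller, hedged version of this example (the model id is a placeholder and the 4-bit parameters are illustrative): pass the quantization config and compute dtype explicitly instead of relying on library defaults.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # explicit compute dtype
    bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "base-model-id",                        # placeholder
    quantization_config=quant_cfg,
    torch_dtype=torch.bfloat16,             # explicit load dtype
)
```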

2) LoRA adapter configuration and scaling
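
A hedged configuration sketch, assuming PEFT (rank, alpha, and target modules are illustrative). Keep r and lora_alpha coupled: the effective adapter scale is lora_alpha / r, so changing one without the other silently rescales the delta.

```python
from peft import LoraConfig, get_peft_model

lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,   # effective adapter scale = lora_alpha / r = 2.0
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)  # `model` is an already-loaded base model
```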

3) Row normalization / magnitude preservation modes

Snippet (FULL flow outline):

```python
W = W_org.view(W_org.shape[0], -1)           # flatten to 2-D: (out_features, -1)
W = W + lora_B @ lora_A                      # fold the adapter delta into the weight
W = F.normalize(W, p=2, dim=1)               # unit-normalize every row
W = W * W_row_norms                          # restore the original per-row magnitudes
W = W - W_org                                # keep only the delta for re-factorization
U, S, V = torch.svd_lowrank(W, q=2 * r + 4)  # oversampled low-rank SVD of the delta

# Split sqrt(S) across U and V so neither factor carries the full magnitude:
sqrt_S = torch.sqrt(S[:r])
lora_B = U[:, :r] * sqrt_S                   # equals U[:, :r] @ diag(sqrt_S)
lora_A = sqrt_S[:, None] * V[:, :r].T        # equals diag(sqrt_S) @ V[:, :r].T
```
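
For reference, a self-contained version of the same flow as a minimal sketch (the helper name and the explicit row-norm computation are illustrative, not part of the source):

```python
import torch
import torch.nn.functional as F

def refactor_normalized_lora(W_org, lora_A, lora_B, r):
    """Hypothetical helper: merge a LoRA delta into W_org, restore each
    row's original L2 magnitude, then re-factorize the resulting delta
    back into rank-r (lora_A, lora_B) via oversampled low-rank SVD."""
    W2d = W_org.view(W_org.shape[0], -1)
    row_norms = W2d.norm(p=2, dim=1, keepdim=True)     # magnitudes to preserve
    W = F.normalize(W2d + lora_B @ lora_A, p=2, dim=1) * row_norms
    delta = W - W2d
    U, S, V = torch.svd_lowrank(delta, q=2 * r + 4)
    sqrt_S = S[:r].sqrt()
    return sqrt_S[:, None] * V[:, :r].T, U[:, :r] * sqrt_S  # (new_A, new_B)
```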

4) Compatibility & hybrid fallback
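
A minimal sketch of a compatibility probe, assuming PEFT and bitsandbytes (the helper name and fallback policy are illustrative): unwrap PEFT wrappers first, then detect quantized layers so callers can choose the quantization-aware path instead of failing mid-merge.

```python
import bitsandbytes as bnb
from peft import PeftModel

def unwrap_and_probe(model):
    """Hypothetical probe: return the underlying base model and whether
    any of its layers are bnb-quantized."""
    if isinstance(model, PeftModel):
        model = model.get_base_model()   # PEFT's accessor for the wrapped model
    quantized = any(
        isinstance(m, (bnb.nn.Linear4bit, bnb.nn.Linear8bitLt))
        for m in model.modules()
    )
    return model, quantized
```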

5) Safe merging for quantized models
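
One conservative pattern, sketched with placeholder model ids and paths: reload the base model unquantized in an explicit dtype and merge there, rather than merging into quantized weights in place.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Reload WITHOUT a quantization config so merge_and_unload() writes into
# real full/half-precision weight tensors.
base = AutoModelForCausalLM.from_pretrained(
    "base-model-id",                 # placeholder
    torch_dtype=torch.bfloat16,
)
merged = PeftModel.from_pretrained(base, "adapter-dir").merge_and_unload()
merged.save_pretrained("merged-model-dir")
```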

6) Data processing and diagnostics
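
The referenced discussions call out the winsorization axis; here is a sketch of axis-aware winsorization for activation statistics (the function name, quantile limits, and torch-based approach are assumptions):

```python
import torch

def winsorize(x: torch.Tensor, lower: float = 0.01, upper: float = 0.99, dim: int = 0):
    """Hypothetical diagnostic helper: clamp values to per-feature quantiles
    along `dim` so a few extreme activations cannot dominate direction
    estimates. Choose `dim` deliberately: clamping along the wrong axis
    mixes statistics across features."""
    lo = torch.quantile(x, lower, dim=dim, keepdim=True)
    hi = torch.quantile(x, upper, dim=dim, keepdim=True)
    return x.clamp(min=lo, max=hi)
```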

7) UX, tests, and diagnostics
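
As a concrete example of a cheap regression test (a sketch assuming pytest-style test discovery), this pins down the sqrt(S) split from the FULL flow above: an exactly rank-r delta must round-trip through the low-rank SVD to float32 precision.

```python
import torch

def test_sqrt_split_roundtrip():
    torch.manual_seed(0)
    r = 4
    delta = torch.randn(32, r) @ torch.randn(r, 16)   # exactly rank r
    U, S, V = torch.svd_lowrank(delta, q=2 * r + 4)
    sqrt_S = S[:r].sqrt()
    lora_B = U[:, :r] * sqrt_S
    lora_A = sqrt_S[:, None] * V[:, :r].T
    assert torch.allclose(lora_B @ lora_A, delta, atol=1e-4)
```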

Why this matters (motivation)

References: source discussions covering LoRA scaling, normalization, and merging; quantization config and compute dtype; hybrid compatibility and PEFT unwrapping; winsorization axis; and testing guidance.