Default rule: in performance-sensitive code, prevent hidden blowups and avoid expensive work being executed repeatedly in hot paths or on import.

Apply this checklist: 1) Replace inefficient primitives inside loops

2) Don’t do global scans per iteration in graph/ordering logic

3) Cache expensive introspection/metadata

4) Prefer lazy, memoized loading over eager initialization

5) Choose fast strategies by default (when the API is designed for it)

Example (dependency resolution):

from collections import deque

# Build reverse deps once: dep -> set(dependents)
reverse_deps = {}
for cls, deps in dep_graph.items():
    for dep in deps:
        reverse_deps.setdefault(dep, set()).add(cls)

queue = deque(sorted([c for c in instance_by_class if in_degree[c] == 0], key=lambda x: x.__name__))
while queue:
    current = queue.popleft()
    for dependent in reverse_deps.get(current, ()):  # only affected neighbors
        in_degree[dependent] -= 1
        if in_degree[dependent] == 0:
            queue.append(dependent)