When implementing AI model support or hardware backends, clearly document known limitations, compatibility constraints, and version requirements to prevent integration issues and set proper expectations.
AI systems often have specific constraints that can cause runtime failures or unexpected behavior if not properly communicated. This includes quantization limitations, hardware-specific requirements, version dependencies, and feature compatibility matrices.
Key areas to document:
- Quantization limitations (supported bit-widths, mixed-precision constraints)
- Hardware-specific requirements and backend support
- Version dependencies
- Feature compatibility matrices
Example documentation pattern:

```markdown
## Known Issues

Several limitations currently affect offline quantized model loading:

1. **Mixed-bit Quantization Limitations**

   Mixed-bit quantization is not fully supported. Due to vLLM's layer fusion
   (e.g., QKV fusion), applying different bit-widths to components within the
   same fused layer can lead to compatibility issues.

2. **Limited Support for Quantized MoE Models**

   Most quantized MoE models may encounter inference issues due to
   kernel-related limitations.
```
This prevents users from encountering undocumented failures and helps them choose appropriate alternatives or workarounds.
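Documented constraints can also be enforced programmatically, so users get a clear error or warning instead of an opaque runtime failure. Below is a minimal sketch of that idea; the names (`QuantConfig`, `validate_quant_config`, `FUSED_GROUPS`) are hypothetical and not part of any real library's API:

```python
# Hypothetical sketch: surface documented quantization limitations at
# config-validation time instead of failing deep inside a kernel.
from dataclasses import dataclass, field

@dataclass
class QuantConfig:
    # Per-component bit-widths, e.g. {"q_proj": 4, "k_proj": 4, "v_proj": 8}
    layer_bits: dict = field(default_factory=dict)
    is_moe: bool = False

# Components that engines such as vLLM fuse into a single layer/kernel;
# mixed bit-widths inside one group correspond to the documented limitation.
FUSED_GROUPS = [("q_proj", "k_proj", "v_proj")]

def validate_quant_config(cfg: QuantConfig) -> list:
    """Return human-readable warnings for configs that hit known issues."""
    warnings = []
    for group in FUSED_GROUPS:
        bits = {cfg.layer_bits[name] for name in group if name in cfg.layer_bits}
        if len(bits) > 1:
            warnings.append(
                f"Mixed bit-widths {sorted(bits)} within fused group {group}; "
                "see 'Known Issues: Mixed-bit Quantization Limitations'."
            )
    if cfg.is_moe:
        warnings.append(
            "Quantized MoE models may hit kernel-related inference issues; "
            "see 'Known Issues: Limited Support for Quantized MoE Models'."
        )
    return warnings
```

Emitting these warnings (or raising an error) at load time ties the runtime behavior directly back to the "Known Issues" section, so the documentation and the failure mode stay in sync.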