Always preserve raw network response data early in the processing pipeline, before applying any transformations or parsing. Once the raw bytes have been decoded with the wrong charset or otherwise transformed, information can be lost permanently, and no later stage can recover it.
This is particularly important for character-encoding conversion and for link extraction.
Example of the problem:
// BAD: Encoding conversion after string processing
function encodeRawHTML(document: Document): Document {
  // At this point, UTF-8 misencoding is "too destructive"
  // to recover original Shift_JIS content
  const decoder = new TextDecoder(charset);
  document.rawHtml = decoder.decode(stringData); // Information already lost
}
// BAD: Using processed HTML for link extraction
linksOnPage = extractLinks(html, urlToScrap); // May miss links from raw content
Move encoding and parsing operations to earlier parts of the stack where you still have access to the raw network buffers. This prevents permanent data loss and ensures all network content is available for processing.
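As a rough illustration of that advice, the sketch below keeps the untouched response bytes next to the Content-Type header and decodes them exactly once, using the charset the server declared. The `FetchedPage` type and both function names are hypothetical and exist only for this example; it also assumes the runtime's `TextDecoder` supports the declared charset and falls back to UTF-8 when it does not.

```typescript
// Hypothetical container for a fetched page; these names are illustrative only.
interface FetchedPage {
  rawBytes: Uint8Array;        // untouched network payload, kept for later stages
  contentType: string | null;  // Content-Type header value, if the server sent one
}

// Extract the charset label from a header such as "text/html; charset=Shift_JIS".
function charsetFromContentType(contentType: string | null): string | null {
  if (contentType === null) return null;
  const match = /charset\s*=\s*"?([^\s;"]+)/i.exec(contentType);
  return match ? match[1].toLowerCase() : null;
}

// Decode the raw bytes once, with the declared charset, while they are still available.
function decodeRawHtml(page: FetchedPage): string {
  const label = charsetFromContentType(page.contentType) ?? "utf-8";
  try {
    return new TextDecoder(label).decode(page.rawBytes);
  } catch {
    // Assumption: an unknown or unsupported label falls back to UTF-8 instead of failing.
    return new TextDecoder("utf-8").decode(page.rawBytes);
  }
}
```

Because `rawBytes` is never overwritten, a later stage can decode it again with a different charset, or run link extraction over the raw content, without depending on an earlier lossy conversion.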