Use the language parser/grammar (e.g., Langium/Jison) for diagram syntax instead of hand-rolled regex parsing. Regex is allowed only for small, well-bounded token-level helpers when (a) it precisely matches the intended grammar slice (e.g., remove only outer quotes), (b) it doesn’t destructively change user content or semantics, and (c) it preserves source positions/ranges by avoiding pre-processing that removes/reflows lines before parsing.
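To make point (c) concrete, here is a minimal sketch of how dropping comment lines before parsing shifts every later line number. The sample diagram text and the helper names `stripCommentLines` and `lineOf` are hypothetical, chosen only for illustration; they are not part of any parser API.

```typescript
// Hypothetical sample input: two comment lines around a small diagram.
const source = [
  "%% a leading comment",
  "flowchart TD",
  "%% another comment",
  "A --> B",
].join("\n");

// Anti-pattern (illustration only): remove comment lines before parsing.
function stripCommentLines(text: string): string {
  return text
    .split("\n")
    .filter((line) => !line.trim().startsWith("%%"))
    .join("\n");
}

// 1-based number of the first line containing `needle`, or -1 if absent.
function lineOf(text: string, needle: string): number {
  const index = text.split("\n").findIndex((line) => line.includes(needle));
  return index === -1 ? -1 : index + 1;
}

// In the user's file the edge is on line 4 ...
const originalLine = lineOf(source, "A --> B");
// ... but after pre-processing it appears to be on line 2.
const shiftedLine = lineOf(stripCommentLines(source), "A --> B");
```

Any diagnostic computed from the pre-processed text would point at line 2 while the user sees the edge on line 4, which is why comment handling belongs in the parser itself.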
Apply these rules:
- Implement diagram syntax in the .langium / grammar layer.
- Handle %% ... comments and directives in the parser so AST line/range data stays accurate.

Example: wrapper-quote removal must be outer-only (not global):
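function stripOuterQuotes(s: string): string {
  const t = s.trim();
  // A lone quote character both starts and ends with a quote, so require length >= 2.
  const wrapped =
    t.length >= 2 &&
    ((t.startsWith('"') && t.endsWith('"')) || (t.startsWith("'") && t.endsWith("'")));
  if (wrapped) {
    // remove only the outer quotes; inner quotes are left untouched
    return t.slice(1, -1);
  }
  return t;
}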
When you must match with regex (e.g., token-level extraction), keep it narrowly scoped and back it with tests for edge cases (escaped quotes, whitespace variations, delimiter presence, comments).
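As one possible shape for such a helper, the sketch below matches a single, complete double-quoted token and nothing else. The names `QUOTED_TOKEN` and `readQuotedToken` are hypothetical, and the escape rule (a backslash escapes the next character) is an assumption about the grammar slice, not something the actual grammar is known to define.

```typescript
// Hypothetical token-level pattern: one whole double-quoted token.
// Anchored with ^ and $ so it can never match part of a longer input.
// Inside the quotes: any character except " and \, or a backslash escape pair.
const QUOTED_TOKEN = /^"((?:[^"\\]|\\.)*)"$/;

// Returns the raw inner text (escapes left as written), or undefined
// when the input is not exactly one quoted token.
function readQuotedToken(token: string): string | undefined {
  const match = QUOTED_TOKEN.exec(token);
  return match ? match[1] : undefined;
}

readQuotedToken('"label"');        // "label"
readQuotedToken('""');             // "" (empty but valid)
readQuotedToken('"say \\"hi\\""'); // inner text keeps the escaped quotes
readQuotedToken('"a" "b"');        // undefined: two tokens, not one
readQuotedToken(' "a"');           // undefined: caller must trim first
readQuotedToken('"a\\"');          // undefined: the final quote is escaped
```

The helper rejects anything it does not fully understand instead of guessing, so malformed input still reaches the parser unchanged. Each call above corresponds to one edge case worth pinning in a unit test.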