When integrating LLMs/embeddings into production code, treat the model as unreliable: enforce a stable IO contract (especially JSON), validate and safely recover from format mismatches, and route model calls through the project’s unified abstractions with configurable parameters.

Apply this checklist:

1) Make LLM outputs contract-safe

Example (defensive JSON parsing):

import json
import re
from typing import Dict

def parse_llm_json(text: str) -> Dict:
    """Extract a JSON object from raw LLM output, returning {} on any failure."""
    # 1) Prefer an explicit ```json ... ``` fenced block
    code_block = re.search(r"```json\s*(.*?)\s*```", text, re.S | re.I)
    candidate = code_block.group(1) if code_block else None

    # 2) Fallback: greedily match from the first "{" to the last "}"
    #    (a non-greedy \{.*?\} would truncate nested objects)
    if not candidate:
        obj = re.search(r"\{.*\}", text, re.S)
        candidate = obj.group(0) if obj else ""

    if not candidate:
        return {}

    try:
        data = json.loads(candidate)
    except json.JSONDecodeError:
        return {}

    # 3) Validate the minimum contract: the top level must be an object
    if not isinstance(data, dict):
        return {}
    return data
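
A quick doctest-style check of the happy path and the failure path:

>>> parse_llm_json('```json {"score": 0.9} ```')
{'score': 0.9}
>>> parse_llm_json("no json here")
{}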

2) Never call provider SDKs directly in core business logic
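
Enforce this with a thin client protocol that core modules depend on, plus one adapter per provider. A minimal sketch, assuming a hypothetical LLMClient protocol; the adapter class, default model, and summarize helper are illustrative, though the call shape inside the adapter follows the openai v1 SDK:

from typing import Protocol

class LLMClient(Protocol):
    def complete(self, prompt: str, **params) -> str: ...

class OpenAIChatClient:
    """Adapter: the only place the provider SDK is imported."""
    def __init__(self, model: str = "gpt-4o-mini"):
        from openai import OpenAI  # provider SDK stays out of core modules
        self._client = OpenAI()
        self._model = model

    def complete(self, prompt: str, **params) -> str:
        resp = self._client.chat.completions.create(
            model=self._model,
            messages=[{"role": "user", "content": prompt}],
            **params,
        )
        return resp.choices[0].message.content or ""

def summarize(doc: str, llm: LLMClient) -> str:
    # Core business logic depends only on the LLMClient protocol.
    return llm.complete(f"Summarize:\n{doc}", temperature=0.2)

Swapping providers then means adding an adapter, not editing business logic.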

3) Keep embeddings/models configurable and extensible
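
One way to keep this configuration-driven is a small registry of embedder factories keyed by provider name. The register_embedder decorator, config keys, and the sentence-transformers backend below are an illustrative sketch, not a fixed API:

from typing import Callable, Dict, List

EmbedFn = Callable[[List[str]], List[List[float]]]
_EMBEDDERS: Dict[str, Callable[..., EmbedFn]] = {}

def register_embedder(name: str):
    def deco(factory):
        _EMBEDDERS[name] = factory
        return factory
    return deco

@register_embedder("sentence-transformers")
def _st_factory(model: str = "all-MiniLM-L6-v2") -> EmbedFn:
    from sentence_transformers import SentenceTransformer
    st = SentenceTransformer(model)
    return lambda texts: st.encode(texts).tolist()

def build_embedder(config: dict) -> EmbedFn:
    # e.g. config = {"provider": "sentence-transformers", "model": "all-MiniLM-L6-v2"}
    factory = _EMBEDDERS[config["provider"]]
    return factory(**{k: v for k, v in config.items() if k != "provider"})

New backends are then one registration away, and the model choice lives in configuration rather than code.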

4) Keep message/history handling consistent
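
Consistency is easiest when all call sites share one normalized message shape and convert to provider format only at the boundary. A minimal sketch, assuming a hypothetical Message dataclass and truncation policy:

from dataclasses import dataclass
from typing import List, Literal

@dataclass(frozen=True)
class Message:
    role: Literal["system", "user", "assistant"]
    content: str

def to_openai_messages(history: List[Message]) -> List[dict]:
    # Convert the internal shape to provider format at the boundary only.
    return [{"role": m.role, "content": m.content} for m in history]

def truncate_history(history: List[Message], max_messages: int = 20) -> List[Message]:
    # Keep any system prompts plus the most recent turns.
    system = [m for m in history if m.role == "system"]
    rest = [m for m in history if m.role != "system"]
    keep = max(max_messages - len(system), 0)
    return system + (rest[-keep:] if keep else [])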

5) Add fallback behavior for empty/failed vector sub-results
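
A sketch of graceful degradation for retrieval, assuming a hypothetical keyword_search as the secondary source; the point is that an exception or an empty sub-result never silently propagates an empty context:

import logging
from typing import Callable, List

logger = logging.getLogger(__name__)

def retrieve_with_fallback(
    query: str,
    vector_search: Callable[[str], List[str]],
    keyword_search: Callable[[str], List[str]],
) -> List[str]:
    try:
        hits = vector_search(query)
    except Exception:
        # Log and degrade rather than failing the whole request.
        logger.warning("vector search failed for %r; falling back", query)
        hits = []
    if not hits:
        # Empty sub-result: fall back to keyword search instead of returning nothing.
        hits = keyword_search(query)
    return hits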

This standard prevents production breakage from minor LLM formatting drift, improves portability across LLM providers, and makes embedding/model changes safe and configuration-driven.