embedding provider configuration

Unstructured-IO/unstructured
Based on 2 comments
Shell

Reviewer Prompt

When configuring embedding providers in AI integrations, specify the embedding dimension explicitly and use a dedicated test file for each provider. The dimension must match the output size of the chosen provider's model (e.g., 384 for huggingface embedders). Do not modify existing configurations for testing purposes.

Instead of changing embedding providers in existing files for testing, create separate test files for each provider. This prevents disruption to working configurations and provides clearer test isolation.

Example of proper configuration:

PYTHONPATH=. ./unstructured/ingest/main.py \
  --embedding-provider "langchain-huggingface" \
  --embedding-dimension 384 \
  astra \
  --token "$ASTRA_DB_APPLICATION_TOKEN" \
  --collection-name "$COLLECTION_NAME"

For testing new providers, create dedicated test files like test_unstructured_ingest/src/local-embed-vertexai.sh rather than modifying existing provider configurations. This approach maintains stability while enabling comprehensive testing of different AI embedding services.
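One way to keep the provider and its dimension in step inside a dedicated test script is a small lookup that fails loudly for any provider it does not know. This is a minimal sketch, not part of the repository: the function name `embedding_dimension_for` and the variables `EMBEDDING_PROVIDER` and `EMBEDDING_DIMENSION` are hypothetical, and only the 384 value for `langchain-huggingface` comes from the guidance above.

```shell
#!/usr/bin/env bash
# Hypothetical helper for a dedicated per-provider test script.
# Only the langchain-huggingface -> 384 mapping comes from the guidance above;
# add other providers only after confirming their model's output size.
set -euo pipefail

embedding_dimension_for() {
  case "$1" in
    langchain-huggingface) echo 384 ;;
    *)
      # Refuse to guess a dimension for an unlisted provider.
      echo "unknown embedding provider: $1" >&2
      return 1
      ;;
  esac
}

# Default to the provider used in the example configuration.
EMBEDDING_PROVIDER="${EMBEDDING_PROVIDER:-langchain-huggingface}"
EMBEDDING_DIMENSION="$(embedding_dimension_for "$EMBEDDING_PROVIDER")"

echo "provider=$EMBEDDING_PROVIDER dimension=$EMBEDDING_DIMENSION"
```

The resulting values could then be passed as `--embedding-provider "$EMBEDDING_PROVIDER"` and `--embedding-dimension "$EMBEDDING_DIMENSION"`, so a new provider's test file cannot run with a dimension left over from another provider.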

Comments Analyzed: 2
Primary Language: Shell
Category: AI

Source Discussions