Test algorithmic behavior

huggingface/tokenizers

Based on 2 comments

Python

When testing algorithms, verify their actual functionality rather than just checking for proper instantiation or property modification. Ensure your tests capture the core behaviors of the algorithm under various parameters.

Algorithms Python

Reviewer Prompt

For deterministic algorithms:

Test with different input parameters and verify the expected output
Include edge cases that might affect algorithm behavior

For non-deterministic algorithms:

Focus on testing invariant properties rather than exact outputs
Identify the stable parts of the output that should always be consistent

Example:

# Insufficient test - only checks properties
def test_fixed_length_weak():
    pretok = FixedLength(length=5)
    assert pretok.length == 5
    
    pretok.length = 10
    assert pretok.length == 10

# Better test - verifies actual algorithm behavior
def test_fixed_length_complete():
    test_string = "This is a test string for tokenization"
    
    # Test with length=5
    pretok = FixedLength(length=5)
    output_5 = pretok.pre_tokenize(test_string)
    assert len(output_5[0]) == 5  # First chunk should be 5 chars
    assert "".join([t for t, _ in output_5]) == test_string  # Original text preserved
    
    # Test with length=10
    pretok.length = 10
    output_10 = pretok.pre_tokenize(test_string)
    assert len(output_10[0]) == 10  # First chunk should be 10 chars
    assert len(output_10) < len(output_5)  # Fewer chunks with larger length

For algorithms with inherent variability (like probabilistic models), focus on testing stable properties rather than specific outputs that might change between runs.

Comments Analyzed

Python

Primary Language

Algorithms

Test algorithmic behavior

Reviewer Prompt

Source Discussions

Add Repository

Private Repository