Tests should validate actual system behavior rather than just verifying method calls. When writing tests, focus on asserting the expected outcomes and side effects rather than simply checking if methods were invoked.

Bad example:

def test_task_execution():
    with patch.object(Agent, "_execute", return_value="ok") as execute:
        result = task.execute()
        assert result.raw == "ok"
        execute.assert_called_once()  # Only validates the call

Good example:

@tool("test_tool", result_as_answer=True)
def delayed_tool():
    sleep(5)  # Actual behavior
    return "ok"

def test_task_execution():
    agent = Agent(tools=[delayed_tool])
    task = Task(description="test", agent=agent)
    
    result = task.execute()
    assert result.raw == "ok"
    # Test validates actual timing behavior
    assert task.execution_time >= 5

Key points: