Back to all reviewers

PEP 257 docstring compliance

Unstructured-IO/unstructured
Based on 11 comments
Python

All public modules, classes, methods, and functions must have docstrings that follow PEP 257 formatting standards. This ensures consistent, professional documentation across the codebase.

Documentation Python

Reviewer Prompt

All public modules, classes, methods, and functions must have docstrings that follow PEP 257 formatting standards. This ensures consistent, professional documentation across the codebase.

Required PEP 257 Format:

  • Use triple double-quotes (""")
  • First line must be a complete sentence that fits on one line
  • For multi-line docstrings, add a blank line after the summary
  • Closing triple-quotes on separate line for multi-line docstrings
  • Describe the contract/purpose, not implementation details

Examples:

Single-line docstring:

def get_nltk_data_dir() -> str | None:
    """The directory where `nltk` resources are located."""

Multi-line docstring:

def clean_pdfminer_inner_elements(document: "DocumentLayout") -> "DocumentLayout":
    """Move pdfminer elements from inside tables to the extra_info dictionary.

    Each element appears in the `extra_info` dictionary using its table id as the key.
    """

Module docstring:

"""NDJSON file utilities for document processing.

This module provides functionality for reading and writing newline-delimited JSON files.
Used primarily for batch processing of document elements during partitioning workflows.
"""

Coverage Requirements:

  • All public modules should explain their purpose and usage
  • All public classes should document their role and key methods
  • All public methods and functions require docstrings
  • Internal functions benefit from docstrings for maintainability

This standard improves code discoverability, reduces onboarding time, and maintains professional documentation quality throughout the codebase.

11
Comments Analyzed
Python
Primary Language
Documentation
Category

Source Discussions