When working with large datasets or complex matching operations, implement algorithms that use a two-phase approach: first filter candidates using efficient broad criteria, then verify matches with precise logic. This pattern significantly improves performance by reducing the number of expensive operations.
The approach is particularly effective when:

- the dataset is too large to load and check entirely in memory,
- a cheap, broad filter (such as an indexed database query or substring match) can eliminate most non-matches up front, and
- the precise verification is expensive enough that it should only run on a small candidate set.
Example implementation:
```python
from django.db.models import Q

# `Cohort` is the project's Django model whose JSON `filters` field may
# reference other cohorts; import it from wherever it lives in your codebase.

def get_dependent_cohorts_reverse(cohort: Cohort) -> list[Cohort]:
    # Phase 1: Database-level filtering using broad criteria.
    # Two variants cover cohort ids serialized as a number or a quoted string.
    filter_conditions = Q()
    filter_conditions |= Q(filters__icontains=f'"value": {cohort.id}')
    filter_conditions |= Q(filters__icontains=f'"value": "{cohort.id}"')
    candidate_cohorts = Cohort.objects.filter(
        filter_conditions, team=cohort.team, deleted=False
    ).exclude(id=cohort.id)

    dependent_cohorts = []
    # Phase 2: Precise verification of the filtered candidates.
    for candidate_cohort in candidate_cohorts:
        for prop in candidate_cohort.properties.flat:
            if prop.type == "cohort" and not isinstance(prop.value, list):
                try:
                    if int(prop.value) == cohort.id:
                        dependent_cohorts.append(candidate_cohort)
                        break
                except (ValueError, TypeError):
                    continue
    return dependent_cohorts
```
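Note that the substring prefilter can return false positives: `'"value": 1'` is also contained in `'"value": 12'`, which is exactly why the second phase re-checks each candidate with an exact integer comparison. As a usage sketch (the cohort id here is hypothetical):

```python
# Hypothetical usage: find every cohort whose filters reference cohort 42.
cohort = Cohort.objects.get(id=42)
dependents = get_dependent_cohorts_reverse(cohort)
print([c.id for c in dependents])
```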
This pattern avoids loading all records into memory and performing expensive operations on irrelevant data, instead using the database’s indexing and filtering capabilities first.
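The same shape applies outside Django. Below is a minimal, framework-free sketch of the two-phase pattern, assuming a hypothetical list of records that each carry a `raw_json` string with a `properties` array:

```python
import json

def find_referencing_records(rows: list[dict], target_id: int) -> list[dict]:
    """Two-phase match: cheap substring prefilter, then exact JSON check."""
    needle = f'"value": {target_id}'
    # Phase 1: broad, inexpensive scan eliminates most rows without parsing.
    candidates = [row for row in rows if needle in row["raw_json"]]

    # Phase 2: precise verification runs only on the small candidate set.
    matches = []
    for row in candidates:
        props = json.loads(row["raw_json"]).get("properties", [])
        if any(p.get("value") == target_id for p in props):
            matches.append(row)
    return matches
```

The phase-1 predicate only needs to be conservative (never drop a true match); any false positives it lets through are caught by the exact check in phase 2.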