Overview

The leaderboard scoring system turns GitHub activity into ranked lists. It normalizes events into signals, scores them against a configurable ruleset, and aggregates results by entity type — contributor, repository, or team.

Architecture


GitHub GraphQL API                  Signals DB              Computed Results
─────────────────────           ────────────────          ─────────────────
fetchOrgScoringDataGraphQL()
  ──> normalizeGitHubData()
    ──> upsertSignals() ──> signals (partitioned)
                                      │
                                      ▼
                              computeScores() ────> Contributor Leaderboard
                              aggregateByRepository() ──> Repository Leaderboard
                              aggregateByTeam() ──> Team Leaderboard
                                      │
                                      ▼
                              leaderboard_materializations
                              computed_scores (per preset × time_period)

Pipeline flow

The orchestration lives in lib/leaderboard/pipeline.ts:

Resolve repos for the requested scopeType: org-wide, a single repo, or a team’s repos
Fetch GitHub data via GraphQL with ETag caching (commits, PRs, issues, reviews, comments)
Normalize raw data into Signal[] with content-hash dedup
Upsert signals to the partitioned signals table (idempotent)
Score via the entity-type-specific aggregation function
Write results to leaderboard_materializations and computed_scores for fast reads
Cache the response in Redis and the in-memory LRU for subsequent requests
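The dedup and upsert steps above depend on each signal having a stable identity. The sketch below assumes a hypothetical Signal shape, with illustrative field names and signal type strings, and hashes the identifying fields with SHA-256. The real normalizeGitHubData() and upsertSignals() may hash different fields, and they write to the partitioned signals table rather than to an in-memory Map. The names dedupSignals and upsertSignalsInMemory are hypothetical.

```typescript
import { createHash } from "node:crypto";

// Hypothetical signal shape: field names are illustrative, not the real Signal type.
interface Signal {
  type: string;        // e.g. "pr_merged" (hypothetical type names)
  user: string;        // GitHub login
  repo: string;        // "owner/name"
  sourceId: string;    // ID of the underlying GitHub object
  occurredAt: string;  // ISO-8601 timestamp
  contentHash?: string;
}

// Hash only the identifying fields so the same event always yields the same key.
function contentHash(s: Signal): string {
  const key = [s.type, s.user, s.repo, s.sourceId, s.occurredAt].join("|");
  return createHash("sha256").update(key).digest("hex");
}

// Normalization step: attach hashes and drop duplicates within one fetch.
function dedupSignals(raw: Signal[]): Signal[] {
  const seen = new Map<string, Signal>();
  for (const s of raw) {
    const hash = contentHash(s);
    if (!seen.has(hash)) seen.set(hash, { ...s, contentHash: hash });
  }
  return Array.from(seen.values());
}

// In-memory stand-in for the signals table, keyed by content hash. Identical
// content means an identical hash, so skipping existing keys (like
// INSERT ... ON CONFLICT DO NOTHING) makes re-runs idempotent.
function upsertSignalsInMemory(table: Map<string, Signal>, signals: Signal[]): number {
  let inserted = 0;
  for (const s of signals) {
    const hash = s.contentHash ?? contentHash(s);
    if (!table.has(hash)) {
      table.set(hash, { ...s, contentHash: hash });
      inserted++;
    }
  }
  return inserted;
}

const pr: Signal = { type: "pr_merged", user: "alice", repo: "acme/api", sourceId: "PR_1", occurredAt: "2024-05-01T10:00:00Z" };
const review: Signal = { type: "review_submitted", user: "bob", repo: "acme/api", sourceId: "PRR_7", occurredAt: "2024-05-01T11:00:00Z" };

const table = new Map<string, Signal>();
const batch = dedupSignals([pr, pr, review]);
console.log(batch.length);                         // 2
console.log(upsertSignalsInMemory(table, batch));  // 2
console.log(upsertSignalsInMemory(table, batch));  // 0: nothing new on a re-run
```

Keying on a content hash lets re-fetches of overlapping time windows run safely. The same event hashes to the same key, so it lands in the table at most once.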

Entity-type aggregation

Three entity types, each with its own aggregation strategy:

Entity Type	Aggregation Function	Scoring Logic
`contributor`	`computeScores()`	Per-user — chronological, daily quotas, diminishing returns, multipliers
`repository`	`aggregateByRepository()`	Per-repo — base points + multipliers, no quotas/diminishing returns
`team`	`aggregateByTeam()`	Per-team — sum of member user scores

Contributor (default)

Each user’s signals are processed chronologically with full per-user scoring context. This is the most detailed mode and the default view.
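You can picture the per-user pass as one chronological loop that keeps per-day counters. This is a sketch under assumed semantics. The Rule shape, the UTC-day quota window, and the geometric decay used for diminishing returns are all hypothetical stand-ins, and computeScoresSketch is not the real computeScores(). The actual algorithm and rule fields are described on the Scoring Engine page.

```typescript
// Hypothetical rule shape; the real fields live in the scoring presets.
interface Rule {
  basePoints: number;
  dailyQuota: number;  // max signals of this type that score per user per UTC day
  decay: number;       // diminishing returns: the n-th signal that day (n from 0) earns base * decay^n
  multiplier: number;  // applied after decay
}

interface UserSignal {
  type: string;
  user: string;
  occurredAt: string; // ISO-8601 UTC timestamp, e.g. "2024-05-01T09:00:00Z"
}

function computeScoresSketch(signals: UserSignal[], rules: Record<string, Rule>): Map<string, number> {
  const totals = new Map<string, number>();
  const counts = new Map<string, number>(); // per user, per day, per type
  // Chronological order matters: earlier signals consume the daily quota first.
  const ordered = [...signals].sort((a, b) => Date.parse(a.occurredAt) - Date.parse(b.occurredAt));
  for (const s of ordered) {
    const rule = rules[s.type];
    if (!rule) continue; // unknown signal types score nothing
    const key = `${s.user}|${s.occurredAt.slice(0, 10)}|${s.type}`;
    const n = counts.get(key) ?? 0;
    counts.set(key, n + 1);
    if (n >= rule.dailyQuota) continue; // quota used up for today
    const points = rule.basePoints * Math.pow(rule.decay, n) * rule.multiplier;
    totals.set(s.user, (totals.get(s.user) ?? 0) + points);
  }
  return totals;
}

const rules: Record<string, Rule> = {
  pr_merged: { basePoints: 10, dailyQuota: 3, decay: 0.5, multiplier: 1 },
  review_submitted: { basePoints: 4, dailyQuota: 10, decay: 1, multiplier: 1.5 },
};
const signals: UserSignal[] = [
  ...["09", "10", "11", "12"].map((h) => ({ type: "pr_merged", user: "alice", occurredAt: `2024-05-01T${h}:00:00Z` })),
  { type: "review_submitted", user: "bob", occurredAt: "2024-05-01T13:00:00Z" },
];
const totals = computeScoresSketch(signals, rules);
// alice: 10 + 5 + 2.5 = 17.5 (her 4th PR exceeds the quota of 3); bob: 4 * 1.5 = 6
console.log(totals.get("alice"), totals.get("bob")); // 17.5 6
```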

Repository

Signals are grouped by signal.repo. Each repo gets a fresh scoring context with skipQuota: true, so daily quotas and diminishing returns don’t apply (those are per-user concepts). Zero-point conditions and penalties still apply.
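Repository mode reuses per-signal scoring but switches quotas off. The sketch below rests on several assumptions. It uses a hypothetical ScoringContext that carries skipQuota, models penalties as rules with negative basePoints, and takes a caller-supplied zero-point predicate (here, one that ignores bot accounts). None of these shapes come from the codebase, and aggregateByRepositorySketch is not the real aggregateByRepository().

```typescript
// Hypothetical shapes; only skipQuota is named in the docs.
interface Rule { basePoints: number; dailyQuota: number; decay: number; multiplier: number; }
interface RepoSignal { type: string; user: string; repo: string; occurredAt: string; }
interface ScoringContext { skipQuota: boolean; counts: Map<string, number>; }

// Shared per-signal scoring: with skipQuota the quota and decay steps are bypassed.
function scoreSignal(ctx: ScoringContext, s: RepoSignal, rule: Rule): number {
  if (ctx.skipQuota) return rule.basePoints * rule.multiplier;
  const key = `${s.user}|${s.occurredAt.slice(0, 10)}|${s.type}`;
  const n = ctx.counts.get(key) ?? 0;
  ctx.counts.set(key, n + 1);
  if (n >= rule.dailyQuota) return 0;
  return rule.basePoints * Math.pow(rule.decay, n) * rule.multiplier;
}

function aggregateByRepositorySketch(
  signals: RepoSignal[],
  rules: Record<string, Rule>,
  isZeroPoint: (s: RepoSignal) => boolean, // hypothetical zero-point condition
): Map<string, number> {
  // Group signals by repo.
  const byRepo = new Map<string, RepoSignal[]>();
  for (const s of signals) {
    const list = byRepo.get(s.repo) ?? [];
    list.push(s);
    byRepo.set(s.repo, list);
  }
  const totals = new Map<string, number>();
  byRepo.forEach((repoSignals, repo) => {
    const ctx: ScoringContext = { skipQuota: true, counts: new Map() }; // fresh context per repo
    let total = 0;
    for (const s of repoSignals) {
      const rule = rules[s.type];
      if (!rule || isZeroPoint(s)) continue; // zero-point conditions still apply
      total += scoreSignal(ctx, s, rule);    // penalties: rules with negative basePoints
    }
    totals.set(repo, total);
  });
  return totals;
}

const rules: Record<string, Rule> = {
  pr_merged: { basePoints: 10, dailyQuota: 3, decay: 0.5, multiplier: 1 },
  pr_reverted: { basePoints: -5, dailyQuota: 10, decay: 1, multiplier: 1 }, // hypothetical penalty
};
const at = "2024-05-01T12:00:00Z";
const signals: RepoSignal[] = [
  ...Array.from({ length: 4 }, () => ({ type: "pr_merged", user: "alice", repo: "acme/api", occurredAt: at })),
  { type: "pr_merged", user: "dependabot[bot]", repo: "acme/api", occurredAt: at },
  { type: "pr_merged", user: "carol", repo: "acme/web", occurredAt: at },
  { type: "pr_reverted", user: "bob", repo: "acme/web", occurredAt: at },
];
const repoTotals = aggregateByRepositorySketch(signals, rules, (s) => s.user.endsWith("[bot]"));
console.log(repoTotals.get("acme/api"), repoTotals.get("acme/web")); // 40 5
```

All four of alice's PRs count in acme/api because the quota of 3 is bypassed. The bot PR scores zero, and the penalty offsets carol's PR in acme/web.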

Team

User scores are computed first via computeScores() (per-user with full rules). Then aggregateByTeam() maps users to teams via GitHub team membership and sums scores per team.

Key behaviors:

A user in multiple teams contributes their full score to each team
Users are deduplicated within a single team
Team memberships are fetched from GitHub via fetchOrgTeamsDataGraphQL() (GraphQL, paginated)
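The team rules above reduce to a sum over a deduplicated member set. This minimal sketch assumes user scores have already been computed and that team memberships arrive as a plain team → logins record. That record shape is hypothetical; the real data comes from fetchOrgTeamsDataGraphQL(), and aggregateByTeamSketch is not the real aggregateByTeam().

```typescript
// userScores: output of the per-user pass; teamMembers: team slug -> member logins.
// Both shapes are hypothetical stand-ins for the real pipeline data.
function aggregateByTeamSketch(
  userScores: Map<string, number>,
  teamMembers: Record<string, string[]>,
): Map<string, number> {
  const totals = new Map<string, number>();
  for (const team of Object.keys(teamMembers)) {
    // A Set counts each user once per team, even if listed twice.
    const members = new Set(teamMembers[team]);
    let sum = 0;
    // No cross-team bookkeeping: a user in several teams adds their full score to each.
    // Members with no computed score contribute 0.
    members.forEach((user) => {
      sum += userScores.get(user) ?? 0;
    });
    totals.set(team, sum);
  }
  return totals;
}

const userScores = new Map<string, number>([["alice", 17.5], ["bob", 6], ["carol", 3]]);
const teamTotals = aggregateByTeamSketch(userScores, {
  backend: ["alice", "bob", "alice"], // duplicate entry for alice
  platform: ["alice", "carol"],
});
console.log(teamTotals.get("backend"), teamTotals.get("platform")); // 23.5 20.5
```

Alice's full score counts toward both teams. Her duplicate entry in backend is counted only once.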

Caching layers

Three caching layers cut down on recomputation and GitHub API traffic:

In-memory LRU cache (lib/leaderboard/request-cache.ts): holds at most 64 entries and evicts the oldest when full, so memory stays bounded. On read, a fresh entry is returned immediately; a stale entry is returned as-is while a background refresh runs; a miss (cold start) triggers a full compute.
Redis cache: leaderboard responses are cached in Redis with a configurable TTL. Reads fall back gracefully if Redis is unavailable.
ETag-based conditional requests: per-endpoint ETags (commits, PRs, issues) are stored in repository_sync_state and sent as If-None-Match headers. When the data is unchanged, the server answers 304 Not Modified with no payload.
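The sketch below shows the in-memory layer's read path. It assumes freshness is a fixed TTL and uses a Map's insertion order as the LRU list. The class name, the ttlMs parameter, and the coalescing of concurrent refreshes are illustrative choices, not taken from request-cache.ts; only the 64-entry cap comes from the description above.

```typescript
interface Entry<T> {
  value: T;
  storedAt: number; // ms timestamp of the last successful compute
}

// Sketch of a stale-while-revalidate LRU. A Map iterates in insertion order,
// so re-inserting a key on access keeps the least recently used key first.
class StaleWhileRevalidateLRU<T> {
  private entries = new Map<string, Entry<T>>();
  private inflight = new Map<string, Promise<T>>();

  constructor(
    private maxEntries = 64,
    private ttlMs = 60000, // hypothetical freshness window
    private now: () => number = () => Date.now(),
  ) {}

  async get(key: string, compute: () => Promise<T>): Promise<T> {
    const hit = this.entries.get(key);
    if (!hit) return this.refresh(key, compute); // cold start: full compute
    this.entries.delete(key); // mark as most recently used
    this.entries.set(key, hit);
    if (this.now() - hit.storedAt >= this.ttlMs) {
      // Stale: serve the old value now and refresh in the background.
      // A failed refresh leaves the stale entry in place.
      this.refresh(key, compute).catch(() => undefined);
    }
    return hit.value;
  }

  // Concurrent callers for the same key share one in-flight compute.
  private refresh(key: string, compute: () => Promise<T>): Promise<T> {
    const pending = this.inflight.get(key);
    if (pending) return pending;
    const p = compute()
      .then((value) => {
        this.store(key, value);
        return value;
      })
      .finally(() => {
        this.inflight.delete(key);
      });
    this.inflight.set(key, p);
    return p;
  }

  private store(key: string, value: T): void {
    this.entries.delete(key);
    this.entries.set(key, { value, storedAt: this.now() });
    while (this.entries.size > this.maxEntries) {
      const lru = this.entries.keys().next().value; // least recently used key
      if (lru === undefined) break;
      this.entries.delete(lru);
    }
  }
}
```

Passing a fake clock as the third constructor argument lets you test freshness without waiting on real time.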

Related

Scoring Engine: signal types, algorithm, multipliers, quotas
Configuration & Presets: default ruleset, custom presets, API
Reference: DB schema, types, module map
