Backend Performance Audit Report — Project Prism
Date: 2026-03-31
Scope: go-backend/ — all handlers, services, repositories, middleware, caching, database schema
Method: Static analysis of source code + schema. No runtime profiling or query plans available.
Author: Claude (automated audit)
Table of Contents
- Executive Summary
- Ranked Findings
- Concrete Potential Changes
- Recommended Next Steps
- What Could Not Be Verified
1. Executive Summary
The Go backend is well-architected with clean layering (handler → service → repository), proper org isolation, Redis caching with singleflight coalescing, async bounded queues, and Prometheus observability. The codebase is production-quality.
However, there are several concrete optimization opportunities, mostly in:
- Database query patterns — COUNT queries carrying unnecessary JOINs, unbounded result sets, ILIKE on unindexed columns
- Bulk/batch operations — sequential per-row DB calls where batching is possible (imports, scorecard calculation)
- Cache invalidation — `DeletePattern` collects all keys into memory before deletion
- Middleware contention — rate limiter lock contention under high concurrency
- Missing composite indexes — filters that hit separate single-column indexes instead of compound ones
No critical correctness bugs found. No N+1 query loops in hot paths. Auth and org isolation are solid.
Codebase at a Glance
| Metric | Count |
|---|---|
| Handlers | 27 |
| Services | 29 |
| Repositories | 37 |
| Middleware | 9 |
| Domain entities | 32 |
| Database tables | 66 |
| Explicit indexes | 120+ |
| DB pool max conns | 25 |
| Cache layer | Redis (JSON, 1min–1hr TTL) |
| Auth | Clerk JWT + API keys (SHA-256) |
2. Ranked Findings
F-01: COUNT queries carry full JOIN chains unnecessarily
| Severity | 🔴 High |
| Impact | High |
| Confidence | High |
| Files | go-backend/internal/repository/postgres/comp_repo.go (lines 269–274); go-backend/internal/repository/postgres/building_repo.go (lines 428–433) |
What is happening:
List() runs a COUNT query with the same 5–6 LEFT JOINs as the data query, even when filters only reference the base table (organization_id, lease_type, square_feet, lease_date_signed).
For comps:
```sql
SELECT COUNT(*)
FROM comp_leases c
LEFT JOIN buildings b ON ...
LEFT JOIN building_addresses ba ON ...
LEFT JOIN zip_codes z ON ...
LEFT JOIN tenant_lookup t ON ...
LEFT JOIN owner_lookup o ON ...
LEFT JOIN contacts ct ON ...
WHERE c.organization_id = $1 AND c.lease_type = ANY($2)
```
For buildings:
```sql
SELECT COUNT(*)
FROM buildings b
LEFT JOIN building_addresses ba ON ...
LEFT JOIN zip_codes z ON ...
LEFT JOIN building_electrical be ON ...
LEFT JOIN building_metrics_current m ON ...
LEFT JOIN scorecard_profiles sp ON ...
LEFT JOIN building_scores bs ON ...
WHERE b.organization_id = $1
```
Why it may be slow:
PostgreSQL must execute all JOINs to count rows, even though the WHERE clause only references c.organization_id and c.lease_type — columns on the base table. The 5–6 LEFT JOINs multiply planning time and I/O for zero benefit in the count.
Suggested fix:
Build a minimal COUNT query that only joins tables referenced by the active WHERE filters. If no filter touches z.city, don't join zip_codes in the count.
Risk assessment: Safe. Only changes which JOINs appear in the count — no behavioral change.
F-02: ListByBuildingID / ListByTenantID return unbounded result sets
| Severity | 🔴 High |
| Impact | High |
| Confidence | High |
| Files | go-backend/internal/repository/postgres/comp_repo.go (lines 334–387) |
What is happening: These queries have no LIMIT:
```sql
SELECT [30+ columns with 6 JOINs]
FROM comp_leases c
LEFT JOIN buildings b ON ...
LEFT JOIN building_addresses ba ON ...
LEFT JOIN zip_codes z ON ...
LEFT JOIN tenant_lookup t ON ...
LEFT JOIN owner_lookup o ON ...
LEFT JOIN contacts ct ON ...
WHERE c.organization_id = $1 AND c.building_id = $2
ORDER BY c.lease_date_signed DESC NULLS LAST
-- NO LIMIT
```
A building with 5,000 comps returns all 5,000 rows with 6 JOINs each. Same issue exists in tim_repo.ListByStatus.
Why it may be slow: Unbounded result set + heavy JOIN chain. Memory scales linearly with rows. A single building detail page could trigger this.
Suggested fix: Add pagination (LIMIT/OFFSET or cursor) with a sensible default (e.g., 100).
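A minimal sketch of the default/cap logic — the helper name and values are illustrative; the real repo methods would thread the result into the SQL as `LIMIT $n OFFSET $m`:

```go
package main

import "fmt"

// clampLimit applies a sensible default to a missing page size and caps
// oversized requests, so no caller can pull an unbounded result set.
func clampLimit(requested, def, max int) int {
	if requested <= 0 {
		return def // caller didn't specify — use the default
	}
	if requested > max {
		return max // hard cap
	}
	return requested
}

func main() {
	fmt.Println(clampLimit(0, 100, 1000))    // default applied
	fmt.Println(clampLimit(5000, 100, 1000)) // capped
	fmt.Println(clampLimit(50, 100, 1000))   // honored as-is
}
```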
Risk assessment: Safe, but frontend may need adjustment if it expects all results at once.
F-03: BulkCalculate (scorecard) is fully sequential
| Severity | 🔴 High |
| Impact | High |
| Confidence | High |
| Files | go-backend/internal/service/building_scorecard_service.go |
What is happening:
BulkCalculate() loops through buildings and calls CalculateForBuilding() sequentially. Each call makes 4–5 DB queries:
- `scoreRepo.GetActiveProfile()` — fetches the active profile (same every time)
- `buildingRepo.GetByID()` — fetches building data
- `distRepo.ListByBuildingID()` — fetches all distances
- `laborRepo.ListPropertiesNearPoint()` — spatial query with radius
- `scoreRepo.Upsert()` — persists the score
Why it may be slow: For 500 buildings: 2,500 sequential DB round-trips with zero parallelism. Each round-trip is ~1–2ms, so 500 buildings takes ~5–10 seconds minimum, purely from DB latency.
Suggested fix:
- Pre-fetch the scorecard profile once and pass it to each calculation
- Use a bounded worker pool (`errgroup` with a limit of 5–10) to parallelize
- Consider batch-fetching distances for multiple buildings in one query
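The bounded-pool shape can be sketched with only the stdlib (the report suggests `errgroup`; a buffered-channel semaphore is equivalent for this purpose). `bulkCalculate` and `calcFn` are hypothetical names — `calcFn` stands in for `CalculateForBuilding` with the profile pre-fetched:

```go
package main

import (
	"fmt"
	"sync"
)

// bulkCalculate runs calcFn for each building with at most `workers`
// calls in flight, keeping well under the 25-connection pool limit.
func bulkCalculate(buildingIDs []int64, workers int, calcFn func(int64) error) []error {
	sem := make(chan struct{}, workers) // semaphore: bounds concurrency
	errs := make([]error, len(buildingIDs))
	var wg sync.WaitGroup
	for i, id := range buildingIDs {
		wg.Add(1)
		sem <- struct{}{} // acquire a slot
		go func(i int, id int64) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			errs[i] = calcFn(id)
		}(i, id)
	}
	wg.Wait()
	return errs
}

func main() {
	ids := []int64{1, 2, 3, 4, 5}
	errs := bulkCalculate(ids, 2, func(id int64) error { return nil })
	fmt.Println(len(errs))
}
```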
Risk assessment: Needs benchmarking — must respect connection pool limits (25 max conns).
F-04: Building import does 4 DB calls per row (no batching)
| Severity | 🔴 High |
| Impact | High |
| Confidence | High |
| Files | go-backend/internal/service/import_job_service.go |
What is happening: For each building row in a CSV import:
- `buildingParkRepo.GetOrCreateByName()` — upsert by name
- `ownerRepo.GetOrCreateByName()` — upsert by name, org-scoped
- `buildingRepo.Create()` — multi-table transaction (buildings + addresses + electrical)
- `buildingMetricsRepo.Upsert()` — upsert metrics
Comps and TIMs use BulkCreate with pgx.Batch (batch size 500), but buildings don't.
Why it may be slow: 1,000 buildings = 4,000 DB round-trips. At ~1–2ms each, that's 4–8 seconds of pure latency, not counting the CSV parsing overhead.
Suggested fix:
- Pre-resolve all parks and owners in bulk (deduplicate names, batch `GetOrCreateByName`)
- Batch-insert buildings using `pgx.Batch`, the same pattern already used for comps
- Process in batches of 500 (matching the comp pattern)
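The batching skeleton can be sketched as below. The `pgx.Batch` calls themselves are elided as comments (they would need a live connection); `chunk` is a generic helper, not an existing function in the codebase:

```go
package main

import "fmt"

// chunk splits rows into batches of n — the shape the comp importer
// already uses with pgx.Batch (batch size 500).
func chunk[T any](rows []T, n int) [][]T {
	var out [][]T
	for n < len(rows) {
		out = append(out, rows[:n])
		rows = rows[n:]
	}
	return append(out, rows)
}

func main() {
	rows := make([]int, 1201) // e.g., 1,201 parsed CSV rows
	for _, batch := range chunk(rows, 500) {
		// In the real import each chunk would become one round-trip:
		//   b := &pgx.Batch{}
		//   for _, r := range batch { b.Queue(insertSQL, ...) }
		//   conn.SendBatch(ctx, b).Close()
		fmt.Println(len(batch))
	}
}
```

The multi-table insert (buildings + addresses + electrical) still needs a transaction per chunk, which is why the review notes rate this as High effort.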
Risk assessment: Moderate — requires transactional batch logic for multi-table inserts.
F-05: ILIKE search on unindexed columns (sequential scans)
| Severity | 🔴 High |
| Impact | High |
| Confidence | High |
| Files | go-backend/internal/repository/postgres/building_repo.go (lines 311–317); go-backend/internal/repository/postgres/comp_repo.go (lines 218–223) |
What is happening:
Search queries use ILIKE '%term%' on multiple columns OR'd together:
```sql
-- Building search
WHERE (b.building_name ILIKE $1 OR ba.street_name ILIKE $1 OR z.city ILIKE $1)

-- Comp search
WHERE (b.building_name ILIKE $1 OR t.tenant_name ILIKE $1 OR c.comments ILIKE $1)
```
No trigram (pg_trgm) or full-text search indexes exist on any of these columns.
Why it may be slow:
ILIKE '%x%' (leading wildcard) cannot use B-tree indexes — it always performs a sequential scan. The OR across multiple joined tables forces the planner into worst-case join strategies.
Suggested fix:
Add pg_trgm GIN indexes on search columns:
```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX buildings_name_trgm_idx ON buildings USING gin (building_name gin_trgm_ops);
CREATE INDEX building_addresses_street_trgm_idx ON building_addresses USING gin (street_name gin_trgm_ops);
CREATE INDEX tenant_lookup_name_trgm_idx ON tenant_lookup USING gin (tenant_name gin_trgm_ops);
```
Risk assessment: Safe. Trigram indexes are purely additive — no schema change, no query syntax change needed. ILIKE automatically uses pg_trgm GIN indexes when available.
F-06: Missing composite indexes for common query patterns
| Severity | 🟡 Medium |
| Impact | Medium |
| Confidence | Medium (needs EXPLAIN ANALYZE to confirm) |
| Files | database/PRISM_vNext_FINAL.sql |
What is happening:
Several high-frequency queries filter on (org_id + column) but only have separate single-column indexes. PostgreSQL may bitmap-combine them, but a composite index is faster.
Specific gaps:
| Missing Index | Used By | Current Coverage |
|---|---|---|
| `comp_leases(organization_id, lease_date_signed DESC)` | Dashboard absorption, timeline, recent comps | Separate indexes on org_id and lease_date_signed |
| `tims(organization_id, status, date_added DESC)` | TIM list, dashboard TIM metrics | Separate indexes on org_id, status, date_added |
| `operations_issue_state(organization_id, queue_type, entity_type, entity_id, issue_code)` | Upsert identity lookup | Only (org, queue, status) index — doesn't cover entity lookup |
| `comp_leases(organization_id, building_id)` | ListByBuildingID | Separate indexes |
Suggested fix:
```sql
CREATE INDEX comp_leases_org_signed_idx
    ON comp_leases(organization_id, lease_date_signed DESC);
CREATE INDEX tims_org_status_date_idx
    ON tims(organization_id, status, date_added DESC);
CREATE INDEX operations_issue_state_identity_idx
    ON operations_issue_state(organization_id, queue_type, entity_type, entity_id, issue_code);
```
Risk assessment: Safe — additive indexes. Monitor for write overhead on high-write tables.
F-07: DeletePattern collects all matching keys into memory before deletion
| Severity | 🟡 Medium |
| Impact | Medium |
| Confidence | High |
| Files | go-backend/internal/cache/cache.go (lines 84–103) |
What is happening:
```go
func (c *RedisCache) DeletePattern(ctx context.Context, pattern string) error {
	iter := c.client.Scan(ctx, 0, c.key(pattern), 100).Iterator()
	var keys []string // unbounded slice
	for iter.Next(ctx) {
		keys = append(keys, iter.Val()) // grows without limit
	}
	// ... then delete all at once
	c.client.Del(ctx, keys...)
}
```
Why it may be slow:
- Called on every building/comp/TIM write (cache invalidation)
- If an org has 10,000 cached list variants, this allocates a 10,000-element slice
- No pre-allocation, so Go's `append` doubles capacity repeatedly, causing GC pressure
- A single `DEL` call with 10,000 keys may time out
Suggested fix: Delete in batches of 100–500 keys during SCAN iteration using Redis pipelines:
```go
pipe := c.client.Pipeline()
count := 0
for iter.Next(ctx) {
	pipe.Del(ctx, iter.Val())
	count++
	if count >= 500 {
		pipe.Exec(ctx)
		pipe = c.client.Pipeline()
		count = 0
	}
}
if count > 0 {
	pipe.Exec(ctx)
}
```
Risk assessment: Safe.
F-08: Rate limiter cleanup holds write lock during full map iteration
| Severity | 🟡 Medium |
| Impact | Medium |
| Confidence | Medium |
| Files | go-backend/internal/middleware/rate_limit.go |
What is happening:
The cleanup goroutine acquires a write lock on the entire bucket map and iterates all entries to find stale ones. During this window, all Allow() calls are blocked (they need at least a read lock).
Additionally, the Allow() path itself has nested locking:
- RLock to check if bucket exists
- If miss → WLock to create bucket
- Bucket-level Mutex to check/update tokens
Why it may be slow: With thousands of distinct API keys/orgs, cleanup blocks all rate limiting for the full iteration duration. The nested lock pattern also creates contention under high concurrency.
Suggested fix:
- Collect stale keys under a read lock, then delete them under a write lock (two-phase cleanup)
- Or use `sync.Map` for lock-free reads
- Consider sharding the bucket map by a hash of the key
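The sharding option can be sketched with the stdlib. Shard count, `bucket` shape, and the `Allow` signature are illustrative — the real limiter also does token refill, which is omitted here:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

type bucket struct{ tokens int }

// shard pairs one mutex with one slice of the key space, so cleanup of
// one shard never blocks Allow() calls hitting the other shards.
type shard struct {
	mu sync.Mutex
	m  map[string]*bucket
}

type shardedBuckets struct{ shards [16]shard }

func newShardedBuckets() *shardedBuckets {
	s := &shardedBuckets{}
	for i := range s.shards {
		s.shards[i].m = make(map[string]*bucket)
	}
	return s
}

func (s *shardedBuckets) shardFor(key string) *shard {
	h := fnv.New32a()
	h.Write([]byte(key))
	return &s.shards[h.Sum32()%uint32(len(s.shards))]
}

// Allow takes only the owning shard's lock to create/decrement a bucket.
func (s *shardedBuckets) Allow(key string, burst int) bool {
	sh := s.shardFor(key)
	sh.mu.Lock()
	defer sh.mu.Unlock()
	b, ok := sh.m[key]
	if !ok {
		b = &bucket{tokens: burst}
		sh.m[key] = b
	}
	if b.tokens <= 0 {
		return false
	}
	b.tokens--
	return true
}

func main() {
	rl := newShardedBuckets()
	fmt.Println(rl.Allow("org-1", 2), rl.Allow("org-1", 2), rl.Allow("org-1", 2))
}
```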
Risk assessment: Needs benchmarking — sync.Map has different performance characteristics.
F-09: SHA-256 hash + 2 DB lookups on every API key auth request
| Severity | 🟡 Medium |
| Impact | Medium |
| Confidence | High |
| Files | go-backend/internal/middleware/auth.go |
What is happening: Every request authenticated via API key:
- SHA-256 hash of the raw key (~200ns)
- DB lookup: `api_keys` table by hash (~1–2ms)
- DB lookup: `organizations` table by org_id (~1–2ms)
- Async: `UPDATE api_keys SET last_used_at = NOW()` (bounded queue)
Why it may be slow: 2–4ms of auth overhead per request, purely from DB queries. JWT auth (Clerk) doesn't have this problem because JWKS are cached locally.
Suggested fix:
Cache validated (key_hash → org, scopes, expiry, revoked_at) in Redis or a local LRU with 30–60s TTL. Invalidate on revoke.
Risk assessment: Low risk. Must handle key revocation correctly — check revoked_at on cache refresh.
F-10: Distance calculation fetches ALL key points regardless of proximity
| Severity | 🟡 Medium |
| Impact | Medium |
| Confidence | Medium |
| Files | go-backend/internal/service/distance_calculation_service.go |
What is happening:
CalculateForBuilding() fetches up to 1,000 key points and makes one routing API call per key point (OSRM/Mapbox), with a 100ms delay between calls for rate limiting.
Why it may be slow:
- A building in Texas doesn't need driving distance to a key point in New York
- 100 key points × 100ms delay = 10 seconds minimum per building
- External API calls are the bottleneck, not local compute
- Wastes routing API quota on irrelevant distances
Suggested fix:
Pre-filter key points by bounding box or haversine distance (e.g., within 200 miles) before calling the routing API. The key_points table has a GIST spatial index — use ST_DWithin to filter.
Risk assessment: Safe — reduces unnecessary external calls without changing results for relevant key points.
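The haversine pre-filter is straightforward to compute locally. A sketch (`haversineMiles` is a hypothetical helper; the production fix could equally use `ST_DWithin` against the GIST index instead):

```go
package main

import (
	"fmt"
	"math"
)

// haversineMiles returns the great-circle distance between two points.
// 3959 mi is the mean Earth radius — plenty accurate for a coarse filter.
func haversineMiles(lat1, lon1, lat2, lon2 float64) float64 {
	const r = 3959.0
	toRad := func(d float64) float64 { return d * math.Pi / 180 }
	dLat := toRad(lat2 - lat1)
	dLon := toRad(lon2 - lon1)
	a := math.Sin(dLat/2)*math.Sin(dLat/2) +
		math.Cos(toRad(lat1))*math.Cos(toRad(lat2))*math.Sin(dLon/2)*math.Sin(dLon/2)
	return 2 * r * math.Asin(math.Sqrt(a))
}

func main() {
	// A Dallas building vs. a New York key point: far outside a 200-mile
	// radius, so the routing API call would be skipped entirely.
	d := haversineMiles(32.7767, -96.7970, 40.7128, -74.0060)
	fmt.Printf("%.0f miles\n", d)
}
```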
F-11: CSV is read 2–3 times during import
| Severity | 🟡 Medium |
| Impact | Low–Medium |
| Confidence | High |
| Files | go-backend/internal/service/import_job_service.go |
What is happening:
- `readCSVMeta()` — reads the full file for row count + headers
- `Validate()` — reads the full file again, validates each row
- `Process()` — reads the full file a third time, inserts valid rows
Header-mapping logic is duplicated across all three functions.
Why it may be slow: Triple file I/O. For a 100MB CSV, that's 300MB of disk reads. Also duplicated code increases maintenance burden.
Suggested fix:
- Read once, cache metadata from first pass
- Combine validation and processing into a single streaming pass (validate row → insert if valid → accumulate errors)
- Extract shared CSV mapping setup to a helper
Risk assessment: Moderate — requires refactoring the import pipeline. Current two-phase approach (validate then process) may be intentional UX.
F-12: Building list cache key is hyper-specific (likely low hit rate)
| Severity | 🟢 Low–Medium |
| Impact | Low–Medium |
| Confidence | Medium (needs Redis metrics) |
| Files | go-backend/internal/service/building_service.go |
What is happening: The list cache key includes: org ID, page number, page size, sort column, sort order, and all filter values (submarket IDs, corridor IDs, cities, states, property types, building statuses, classes, RBA ranges, score ranges, vacancy filters, etc.).
Any change in any filter = complete cache miss.
Why it may be slow: Users apply many different filter combinations. The cache is still written to on every miss (Redis SET), burning Redis bandwidth without corresponding hits. With 5-minute TTL and diverse queries, most keys expire unused.
Suggested fix:
- Check Redis hit/miss metrics to validate this hypothesis
- If confirmed: either cache at coarser granularity (unfiltered first page per org) or skip list caching entirely and rely on DB performance + singleflight
- The singleflight already handles thundering herd — the cache may be redundant for lists
Risk assessment: Needs measurement before changing.
F-13: Scorecard profile fetched from DB on every single calculation
| Severity | 🟢 Low |
| Impact | Low |
| Confidence | High |
| Files | go-backend/internal/service/building_scorecard_service.go |
What is happening:
CalculateForBuilding() calls scoreRepo.GetActiveProfile() on every invocation. The active profile is application-wide and rarely changes. In BulkCalculate for 500 buildings, that's 500 identical queries.
Suggested fix:
Cache the active profile in-memory with 1–5 min TTL, or fetch once in BulkCalculate and pass as parameter.
Risk assessment: Safe.
F-14: HTTP WriteTimeout of 11 minutes applies to ALL endpoints
| Severity | 🟢 Low |
| Impact | Low |
| Confidence | Medium |
| Files | go-backend/cmd/api/main.go |
What is happening:
```go
&http.Server{
	WriteTimeout: 11 * time.Minute, // Must exceed export timeouts (10m)
}
```
This is set to accommodate long-running export streams. However, it applies to all endpoints. A slow client on a non-export endpoint holds a goroutine + connection for up to 11 minutes.
Suggested fix:
Use per-route timeouts via http.TimeoutHandler wrapper:
- CRUD endpoints: 30 seconds
- Dashboard/search: 60 seconds
- Export endpoints: 11 minutes (current behavior)
Risk assessment: Safe, but requires route-level middleware changes.
F-15: Sentry sample rate at 20% with no per-endpoint control
| Severity | 🟢 Low |
| Impact | Low |
| Confidence | Low |
| Files | go-backend/cmd/api/main.go |
What is happening:
TracesSampleRate: 0.2 applies uniformly. High-traffic endpoints (building list, dashboard) are sampled at the same rate as rare admin endpoints.
Suggested fix:
Use TracesSampler function to apply dynamic rates:
- Health checks, metrics: 0% (skip)
- High-traffic reads: 5%
- Mutations: 20%
- Errors/slow requests: 100%
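The rate tiers above reduce to a pure function over the request; a sketch, with routes and tiers as assumptions (the real wiring would hand this to sentry-go's `TracesSampler` option, which is elided here):

```go
package main

import (
	"fmt"
	"strings"
)

// sampleRate maps a request to a trace sample rate by tier.
func sampleRate(method, path string) float64 {
	switch {
	case path == "/healthz" || strings.HasPrefix(path, "/metrics"):
		return 0 // never trace health checks or metrics scrapes
	case method == "GET":
		return 0.05 // high-traffic reads: 5%
	default:
		return 0.2 // mutations: keep the current 20%
	}
}

func main() {
	fmt.Println(sampleRate("GET", "/healthz"))
	fmt.Println(sampleRate("GET", "/api/buildings"))
	fmt.Println(sampleRate("POST", "/api/buildings"))
}
```

Sampling errors and slow requests at 100% would need request-outcome context and is better handled in the Sentry middleware itself.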
Risk assessment: Safe.
3. Concrete Potential Changes
Ordered by impact-to-effort ratio (best ROI first):
| Priority | Change | Files | Effort | Impact | Risk | Status |
|---|---|---|---|---|---|---|
| 1 | Add pg_trgm GIN indexes on search columns | migrations/add_performance_indexes.sql | Low | High | Safe | Done |
| 2 | Add composite indexes (org+date, org+status+date) | migrations/add_performance_indexes.sql | Low | Medium | Safe | Done |
| 3 | Add LIMIT to ListByBuildingID, ListByTenantID | comp_repo.go | Low | High | Safe | Done (1000 cap; ListByStatus left unbounded — single-table, no JOINs) |
| 4 | Cache scorecard profile in BulkCalculate | building_scorecard_service.go | Low | Low | Safe | Done |
| 5 | Batch DeletePattern during SCAN iteration | cache.go | Low | Medium | Safe | Done |
| 6 | Minimal COUNT queries (join only referenced tables) | comp_repo.go, building_repo.go | Medium | High | Safe | Done |
| 7 | Cache API key validation in Redis (30s TTL) | auth.go | Medium | Medium | Low risk | Deferred — crosses middleware→cache boundary |
| 8 | Parallelize BulkCalculate with bounded worker pool | building_scorecard_service.go | Medium | High | Needs benchmarking | Deferred — needs pool saturation data |
| 9 | Pre-filter key points by proximity | distance_calculation_service.go | Medium | Medium | Safe | Done (300mi haversine pre-filter) |
| 10 | Per-route write timeouts | main.go | Medium | Low | Safe | Deferred |
| 11 | Batch building imports (pre-resolve + batch insert) | import_job_service.go | High | High | Needs testing | Deferred — import loop refactor |
| 12 | Single-pass CSV import | import_job_service.go | High | Low–Med | Moderate | Deferred — intentional UX split |
| 13 | Fix cache key stability (sort slices, compact format) | building_service.go, comp_service.go, tim_service.go | Low | Low–Med | Safe | Done (F-17) |
| 14 | Dynamic Sentry trace sampling | main.go | Low | Low | Safe | Done (F-15: 0% health, 5% GET, 20% mutations) |
| 15 | Batch spatial queries in BulkCreate | building_service.go, geography_repo.go | Medium | High | Safe | Done (F-16: 2 queries instead of 2N) |
4. Implementation Status & Next Steps
Implemented (2026-03-31)
All Phase 1 quick wins plus several Phase 3 items shipped in a single pass:
- F-01: Minimal COUNT queries — comp and building list counts only JOIN tables referenced by active filters
- F-02: LIMIT 1000 on `ListByBuildingID` and `ListByTenantID` (comp queries with 6 JOINs). `ListByStatus` left unbounded (single-table, lightweight rows, caller expects all results)
- F-05: `pg_trgm` GIN indexes on `building_name`, `street_name`, `city`, `tenant_name`, `comments`
- F-06: Composite indexes on `(org, date)`, `(org, status, date)`, `(org, building)`, `(org, tenant)`, operations identity
- F-07: `DeletePattern` now deletes in batches of 500 via Redis pipelines
- F-10: Haversine pre-filter (300mi radius) before routing API calls — skips distant key points
- F-13: `BulkCalculate` pre-fetches the active scorecard profile once instead of N times
- F-15: Dynamic Sentry trace sampling — 0% health/metrics, 5% GET reads, 20% mutations
- F-16: Batch spatial queries in `BulkCreate` — 2 queries total instead of 2 per building
- F-17: Cache key stability — slices sorted and comma-delimited for consistent keys
Remaining: Measure First
Before tackling these, gather production data:
- Enable `pg_stat_statements` — top 20 slowest queries by total time
- Check Redis `INFO stats` — `keyspace_hits` vs `keyspace_misses` to validate F-12 (list cache hit rate)
- Check Prometheus — `http_request_duration_seconds` p95/p99 by endpoint
- Check `db_pool_active_connections` gauge during peak bulk operations
Remaining: Data-Dependent
| Item | Trigger | Effort |
|---|---|---|
| F-03: Parallelize BulkCalculate with errgroup | Pool not saturating during bulk ops | Medium |
| F-04: Batch building imports (pre-resolve parks/owners) | Import speed is a measured problem | High |
| F-08: Rate limiter lock contention (sharded mutexes) | High concurrent API key traffic | Medium |
| F-09: Cache API key validation in Redis | API key auth latency is measurable | Medium |
| F-12: Evaluate list cache utility | Redis metrics show low hit rate | Low |
Remaining: Low Priority
- F-11: Single-pass CSV import — validate-then-process is intentional UX
- F-14: Per-route write timeouts — low impact
5. What Could Not Be Verified
| Area | Why | How to Verify |
|---|---|---|
| Actual query plans | No access to EXPLAIN ANALYZE on production data. Index recommendations are based on schema + query structure, not measured plans. | Run EXPLAIN (ANALYZE, BUFFERS) on the queries from F-01, F-05, F-06 against production-scale data. |
| Cache hit rates | No access to Redis metrics. F-12 (list cache utility) is an educated guess. | Check redis-cli INFO stats — compare keyspace_hits to keyspace_misses. Also check per-key-pattern hit rates if available. |
| Real traffic patterns | No access to Prometheus or Sentry dashboards. Endpoint frequency and p95 latency unknown. | Query Prometheus: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) grouped by path. |
| Export stored procedures | export.refresh_* functions are defined in the DB, not in repo code. Their internal performance is a black box. | Run EXPLAIN ANALYZE on the refresh functions. Check if export tables have proper indexes for the refresh queries. |
| `buildings_with_metrics_v` view | Dashboard queries reference this view. Whether it's materialized or a regular view affects performance significantly. | Check: SELECT relkind FROM pg_class WHERE relname = 'buildings_with_metrics_v' — v = view, m = materialized view. |
| Connection pool saturation | Pool is 25 max conns. Under concurrent BulkCalculate or import, this could saturate. | Check Prometheus: db_pool_active_connections gauge during peak bulk operations. |
| Import worker concurrency | The standalone import worker's actual concurrency setting in production is unknown. | Check deployment config or env var IMPORT_WORKER_CONCURRENCY. |
| Rate limiter bucket count | Can't assess real lock contention without knowing how many distinct orgs/API keys exist. | Add a Prometheus gauge for rate_limit_bucket_count or log it periodically. |
6. Code-Verified Review Notes
Reviewer: Claude (code-checked review, 2026-03-31)
Method: Spot-checked findings F-01, F-02, F-03, F-05, F-07, F-13 against source code. All descriptions matched.
Findings confirmed as-described
- F-01: COUNT query at `comp_repo.go:273` unconditionally uses `compJoinClause()`. Accurate.
- F-02: `ListByBuildingID`/`ListByTenantID` at `comp_repo.go:334–387` have no LIMIT. Accurate.
- F-03: `BulkCalculate` at `building_scorecard_service.go:129` is a pure sequential loop. Accurate.
- F-05: Leading-wildcard ILIKE at `comp_repo.go:218–220` with no trigram index. Accurate.
- F-07: `DeletePattern` at `cache.go:84–103` matches the audit description verbatim. Accurate.
- F-13: `GetActiveProfile()` called inside `CalculateForBuilding` at `building_scorecard_service.go:86`, re-executed per building in `BulkCalculate`. Accurate.
Pushback / nuance on specific findings
F-02 — Risk is understated. Adding LIMIT to ListByBuildingID/ListByTenantID is a behavioral change, not just a performance fix. If the building detail page or any aggregation logic relies on receiving all comps, a hard LIMIT silently truncates data. Callers must be audited before adding a limit. A soft limit with a warning log may be safer initially.
F-04 — Effort is understated. The per-building transaction spans multiple tables (buildings + addresses + electrical). The existing comp BulkCreate using pgx.Batch is single-table, so the same pattern doesn't directly apply. This is closer to "High" effort than the table suggests.
F-08 — sync.Map is the wrong suggestion. sync.Map is optimized for read-heavy, append-only workloads. The rate limiter mutates on every Allow() call (decrementing tokens), which is the worst case for sync.Map. Sharded sync.Mutex maps (partition by hash of key) would be a better fit for reducing contention without changing the locking model.
F-11 — Intentional UX deserves more weight. The validate-then-process split lets users see all validation errors before any data is written. Merging into a single streaming pass changes that contract. This should not be prioritized unless import speed is a measured problem.
F-12 — Measure before acting. Singleflight already handles thundering herd. The Redis writes are cheap even if most keys expire unused. Measure hit/miss rates before changing anything here.
Additional findings missed by the original audit
F-16: BulkCreate runs sequential spatial queries for geography derivation
| Severity | 🔴 High |
| Impact | High |
| Confidence | High |
| Files | go-backend/internal/service/building_service.go (lines 264–268, 354–375) |
BulkCreate calls deriveGeography() per building in a loop (lines 265–268). Each call makes 2 spatial DB queries (GetSubmarketByPoint, GetCorridorByPoint). For 1,000 buildings that's 2,000 sequential round-trips — the same class of problem as F-03 and F-04. A batch spatial lookup (e.g., ST_Within against all points in one query) would reduce this to 2 queries total. This should be added to Phase 3 alongside the other bulk-operation improvements.
F-17: Cache key generation produces unstable, verbose keys
| Severity | 🟢 Low |
| Impact | Low |
| Confidence | High |
| Files | go-backend/internal/service/building_service.go (lines 450–475) |
generateListCacheKey uses fmt.Sprintf(":sm%v", f.SubmarketIDs) which produces keys like :sm[1 2 3]. These are order-dependent (same filter set in different order = cache miss) and verbose. Reinforces the F-12 hypothesis that list cache hit rates are low. If list caching is kept, keys should sort slice values and use a compact delimiter (e.g., :sm1,2,3) or hash the filter struct.
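The stable-key fix (sorted, comma-delimited, per §3) reduces to a small helper. A sketch — `stableIDKey` is a hypothetical name, not the service's actual function:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// stableIDKey renders a slice filter as a sorted, comma-delimited key
// fragment, so [3 1 2] and [1 2 3] hit the same cache entry.
func stableIDKey(prefix string, ids []int64) string {
	if len(ids) == 0 {
		return "" // absent filters contribute nothing to the key
	}
	sorted := append([]int64(nil), ids...) // copy: don't mutate the caller's slice
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	parts := make([]string, len(sorted))
	for i, id := range sorted {
		parts[i] = fmt.Sprintf("%d", id)
	}
	return prefix + strings.Join(parts, ",")
}

func main() {
	fmt.Println(stableIDKey(":sm", []int64{3, 1, 2})) // :sm1,2,3
}
```

Hashing the whole filter struct (e.g., SHA-256 of a canonical encoding) is an alternative when keys grow long.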
Adjusted priority recommendation
All items below priority 7 have been implemented. See §3 and §4 for full status.
| Priority | Change | Status |
|---|---|---|
| 1 | pg_trgm GIN indexes (F-05) | Done |
| 2 | Composite indexes (F-06) | Done |
| 3 | Minimal COUNT queries (F-01) | Done |
| 4 | Cache scorecard profile (F-13) | Done |
| 5 | Batch DeletePattern (F-07) | Done |
| 6 | LIMIT on comp queries (F-02) | Done (1000 cap on comp queries; ListByStatus left unbounded) |
| 7 | Pre-filter key points (F-10) | Done (300mi haversine) |
| 8 | Batch spatial queries (F-16) | Done (LATERAL UNNEST, 2 queries total) |
| 9 | Cache key stability (F-17) | Done (sorted, comma-delimited) |
| 10 | Dynamic Sentry sampling (F-15) | Done |
| 11+ | F-03, F-04, F-08, F-09, F-11, F-12, F-14 | Deferred — see §4 |
Appendix: Architecture Overview
For reference, the backend follows this structure:
```
HTTP Request
  → RealIP → Sentry → Recoverer → Compress → CORS → Logging
  → Prometheus → AuditLogger → MaxBody → SecurityHeaders
  → RateLimiter → Auth (JWT or API Key)
  → Handler (parse HTTP, validate input)
  → Service (business logic, caching, singleflight)
  → Repository (SQL, parameterized queries, org-scoped)
  → PostgreSQL (pgx/v5, pool of 25 conns)
  → Redis Cache (JSON, TTL 1min–1hr)
  → Response (JSON)
```
Background processes:
- Import worker (standalone binary, polls `import_jobs` table)
- JWKS refresh (hourly, in-process)
- DB pool metrics collector (every 15s, in-process)
- Async bounded queues: audit logging, error capture, API key `last_used_at` updates