Backend Performance Audit Report — Project Prism

Updated Apr 3, 2026

Date: 2026-03-31
Scope: go-backend/ — all handlers, services, repositories, middleware, caching, database schema
Method: Static analysis of source code + schema. No runtime profiling or query plans available.
Author: Claude (automated audit)


Table of Contents

  1. Executive Summary
  2. Ranked Findings
  3. Concrete Potential Changes
  4. Implementation Status & Next Steps
  5. What Could Not Be Verified
  6. Code-Verified Review Notes

1. Executive Summary

The Go backend is well-architected with clean layering (handler → service → repository), proper org isolation, Redis caching with singleflight coalescing, async bounded queues, and Prometheus observability. The codebase is production-quality.

However, there are several concrete optimization opportunities, mostly in:

  • Database query patterns — COUNT queries carrying unnecessary JOINs, unbounded result sets, ILIKE on unindexed columns
  • Bulk/batch operations — sequential per-row DB calls where batching is possible (imports, scorecard calculation)
  • Cache invalidation — DeletePattern collects all keys into memory before deletion
  • Middleware contention — rate limiter lock contention under high concurrency
  • Missing composite indexes — filters that hit separate single-column indexes instead of compound ones

No critical correctness bugs found. No N+1 query loops in hot paths. Auth and org isolation are solid.

Codebase at a Glance

Metric | Count
------ | -----
Handlers | 27
Services | 29
Repositories | 37
Middleware | 9
Domain entities | 32
Database tables | 66
Explicit indexes | 120+
DB pool max conns | 25
Cache layer | Redis (JSON, 1min–1hr TTL)
Auth | Clerk JWT + API keys (SHA-256)

2. Ranked Findings

F-01: COUNT queries carry full JOIN chains unnecessarily

Severity: 🔴 High
Impact: High
Confidence: High
Files: go-backend/internal/repository/postgres/comp_repo.go (lines 269–274),
go-backend/internal/repository/postgres/building_repo.go (lines 428–433)

What is happening: List() runs a COUNT query with the same 5–6 LEFT JOINs as the data query, even when filters only reference the base table (organization_id, lease_type, square_feet, lease_date_signed).

For comps:

SELECT COUNT(*)
FROM comp_leases c
LEFT JOIN buildings b ON ...
LEFT JOIN building_addresses ba ON ...
LEFT JOIN zip_codes z ON ...
LEFT JOIN tenant_lookup t ON ...
LEFT JOIN owner_lookup o ON ...
LEFT JOIN contacts ct ON ...
WHERE c.organization_id = $1 AND c.lease_type = ANY($2)

For buildings:

SELECT COUNT(*)
FROM buildings b
LEFT JOIN building_addresses ba ON ...
LEFT JOIN zip_codes z ON ...
LEFT JOIN building_electrical be ON ...
LEFT JOIN building_metrics_current m ON ...
LEFT JOIN scorecard_profiles sp ON ...
LEFT JOIN building_scores bs ON ...
WHERE b.organization_id = $1

Why it may be slow: PostgreSQL can only eliminate a LEFT JOIN when it can prove the join cannot change the row count (unique join key, no referenced columns). The one-to-many joins here survive that check, so the count executes most of the 5–6 JOINs even though the WHERE clause only references c.organization_id and c.lease_type — columns on the base table. The extra JOINs multiply planning time and I/O for zero benefit in the count.

Suggested fix: Build a minimal COUNT query that only joins tables referenced by the active WHERE filters. If no filter touches z.city, don't join zip_codes in the count.

Risk assessment: Safe. Only changes which JOINs appear in the count — no behavioral change.


F-02: ListByBuildingID / ListByTenantID return unbounded result sets

Severity: 🔴 High
Impact: High
Confidence: High
Files: go-backend/internal/repository/postgres/comp_repo.go (lines 334–387)

What is happening: These queries have no LIMIT:

SELECT [30+ columns with 6 JOINs]
FROM comp_leases c
LEFT JOIN buildings b ON ...
LEFT JOIN building_addresses ba ON ...
LEFT JOIN zip_codes z ON ...
LEFT JOIN tenant_lookup t ON ...
LEFT JOIN owner_lookup o ON ...
LEFT JOIN contacts ct ON ...
WHERE c.organization_id = $1 AND c.building_id = $2
ORDER BY c.lease_date_signed DESC NULLS LAST
-- NO LIMIT

A building with 5,000 comps returns all 5,000 rows with 6 JOINs each. Same issue exists in tim_repo.ListByStatus.

Why it may be slow: Unbounded result set + heavy JOIN chain. Memory scales linearly with rows. A single building detail page could trigger this.

Suggested fix: Add pagination (LIMIT/OFFSET or cursor) with a sensible default (e.g., 100).
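A minimal sketch of the default/cap logic (the helper name and bounds are illustrative; the audit suggests a default around 100):

```go
package main

import "fmt"

// clampLimit is a hypothetical helper: apply a default when the caller passes
// no limit, and cap runaway requests at a hard maximum.
func clampLimit(requested, def, max int) int {
	if requested <= 0 {
		return def
	}
	if requested > max {
		return max
	}
	return requested
}

func main() {
	limit := clampLimit(0, 100, 1000) // no limit requested → default of 100
	fmt.Printf("... ORDER BY c.lease_date_signed DESC NULLS LAST LIMIT %d\n", limit)
}
```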

Risk assessment: Safe, but frontend may need adjustment if it expects all results at once.


F-03: BulkCalculate (scorecard) is fully sequential

Severity: 🔴 High
Impact: High
Confidence: High
Files: go-backend/internal/service/building_scorecard_service.go

What is happening: BulkCalculate() loops through buildings and calls CalculateForBuilding() sequentially. Each call makes 4–5 DB queries:

  1. scoreRepo.GetActiveProfile() — fetches the active profile (same every time)
  2. buildingRepo.GetByID() — fetches building data
  3. distRepo.ListByBuildingID() — fetches all distances
  4. laborRepo.ListPropertiesNearPoint() — spatial query with radius
  5. scoreRepo.Upsert() — persists the score

Why it may be slow: For 500 buildings: 2,500 sequential DB round-trips with zero parallelism. Each round-trip is ~1–2ms, so 500 buildings takes ~5–10 seconds minimum, purely from DB latency.

Suggested fix:

  • Pre-fetch the scorecard profile once and pass it to each calculation
  • Use a bounded worker pool (errgroup with limit of 5–10) to parallelize
  • Consider batch-fetching distances for multiple buildings in one query

Risk assessment: Needs benchmarking — must respect connection pool limits (25 max conns).


F-04: Building import does 4 DB calls per row (no batching)

Severity: 🔴 High
Impact: High
Confidence: High
Files: go-backend/internal/service/import_job_service.go

What is happening: For each building row in a CSV import:

  1. buildingParkRepo.GetOrCreateByName() — upsert by name
  2. ownerRepo.GetOrCreateByName() — upsert by name, org-scoped
  3. buildingRepo.Create() — multi-table transaction (buildings + addresses + electrical)
  4. buildingMetricsRepo.Upsert() — upsert metrics

Comps and TIMs use BulkCreate with pgx.Batch (batch size 500), but buildings don't.

Why it may be slow: 1,000 buildings = 4,000 DB round-trips. At ~1–2ms each, that's 4–8 seconds of pure latency, not counting the CSV parsing overhead.

Suggested fix:

  • Pre-resolve all parks and owners in bulk (deduplicate names, batch GetOrCreateByName)
  • Batch-insert buildings using pgx.Batch, same pattern already used for comps
  • Process in batches of 500 (matching the comp pattern)
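The pre-resolve step amounts to deduplicating names before a single batched lookup, so each park and owner is resolved once per import instead of once per row. A sketch with hypothetical types:

```go
package main

import (
	"fmt"
	"sort"
)

// buildingRow is a hypothetical import row; only the fields relevant here.
type buildingRow struct {
	ParkName  string
	OwnerName string
}

// uniqueNames deduplicates non-empty names so a single batched
// GetOrCreateByName round-trip can resolve them all up front.
func uniqueNames(rows []buildingRow, pick func(buildingRow) string) []string {
	seen := map[string]bool{}
	var out []string
	for _, r := range rows {
		if n := pick(r); n != "" && !seen[n] {
			seen[n] = true
			out = append(out, n)
		}
	}
	sort.Strings(out) // deterministic order for the batch query
	return out
}

func main() {
	rows := []buildingRow{
		{ParkName: "North Park", OwnerName: "Acme"},
		{ParkName: "North Park", OwnerName: "Beta"},
	}
	fmt.Println(uniqueNames(rows, func(r buildingRow) string { return r.ParkName }))  // [North Park]
	fmt.Println(uniqueNames(rows, func(r buildingRow) string { return r.OwnerName })) // [Acme Beta]
}
```

The resolved name→ID maps then feed the pgx.Batch insert, mirroring the existing comp pattern.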

Risk assessment: Moderate — requires transactional batch logic for multi-table inserts.


F-05: ILIKE search on unindexed columns (sequential scans)

Severity: 🔴 High
Impact: High
Confidence: High
Files: go-backend/internal/repository/postgres/building_repo.go (lines 311–317),
go-backend/internal/repository/postgres/comp_repo.go (lines 218–223)

What is happening: Search queries use ILIKE '%term%' on multiple columns OR'd together:

-- Building search
WHERE (b.building_name ILIKE $1 OR ba.street_name ILIKE $1 OR z.city ILIKE $1)

-- Comp search
WHERE (b.building_name ILIKE $1 OR t.tenant_name ILIKE $1 OR c.comments ILIKE $1)

No trigram (pg_trgm) or full-text search indexes exist on any of these columns.

Why it may be slow: ILIKE '%x%' (leading wildcard) cannot use B-tree indexes — it always performs a sequential scan. The OR across multiple joined tables forces the planner into worst-case join strategies.

Suggested fix: Add pg_trgm GIN indexes on search columns:

CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX buildings_name_trgm_idx ON buildings USING gin (building_name gin_trgm_ops);
CREATE INDEX building_addresses_street_trgm_idx ON building_addresses USING gin (street_name gin_trgm_ops);
CREATE INDEX tenant_lookup_name_trgm_idx ON tenant_lookup USING gin (tenant_name gin_trgm_ops);

Risk assessment: Safe. Trigram indexes are purely additive — no schema change, no query syntax change needed. ILIKE automatically uses pg_trgm GIN indexes when available.


F-06: Missing composite indexes for common query patterns

Severity: 🟡 Medium
Impact: Medium
Confidence: Medium (needs EXPLAIN ANALYZE to confirm)
Files: database/PRISM_vNext_FINAL.sql

What is happening: Several high-frequency queries filter on (org_id + column) but only have separate single-column indexes. PostgreSQL may bitmap-combine them, but a composite index is faster.

Specific gaps:

Missing Index | Used By | Current Coverage
------------- | ------- | ----------------
comp_leases(organization_id, lease_date_signed DESC) | Dashboard absorption, timeline, recent comps | Separate indexes on org_id and lease_date_signed
tims(organization_id, status, date_added DESC) | TIM list, dashboard TIM metrics | Separate indexes on org_id, status, date_added
operations_issue_state(organization_id, queue_type, entity_type, entity_id, issue_code) | Upsert identity lookup | Only (org, queue, status) index — doesn't cover entity lookup
comp_leases(organization_id, building_id) | ListByBuildingID | Separate indexes

Suggested fix:

CREATE INDEX comp_leases_org_signed_idx
  ON comp_leases(organization_id, lease_date_signed DESC);

CREATE INDEX tims_org_status_date_idx
  ON tims(organization_id, status, date_added DESC);

CREATE INDEX operations_issue_state_identity_idx
  ON operations_issue_state(organization_id, queue_type, entity_type, entity_id, issue_code);

Risk assessment: Safe — additive indexes. Monitor for write overhead on high-write tables.


F-07: DeletePattern collects all matching keys into memory before deletion

Severity: 🟡 Medium
Impact: Medium
Confidence: High
Files: go-backend/internal/cache/cache.go (lines 84–103)

What is happening:

func (c *RedisCache) DeletePattern(ctx context.Context, pattern string) error {
    iter := c.client.Scan(ctx, 0, c.key(pattern), 100).Iterator()
    var keys []string          // unbounded slice
    for iter.Next(ctx) {
        keys = append(keys, iter.Val())  // grows without limit
    }
    // ... then delete all at once
    c.client.Del(ctx, keys...)
}

Why it may be slow:

  • Called on every building/comp/TIM write (cache invalidation)
  • If an org has 10,000 cached list variants, this allocates a 10,000-element slice
  • No pre-allocation, so Go's append doubles capacity repeatedly, causing GC pressure
  • Single DEL call with 10,000 keys may timeout

Suggested fix: Delete in batches of 100–500 keys during SCAN iteration using Redis pipelines:

pipe := c.client.Pipeline()
count := 0
for iter.Next(ctx) {
    pipe.Del(ctx, iter.Val())
    count++
    if count >= 500 {
        if _, err := pipe.Exec(ctx); err != nil {
            return err
        }
        pipe = c.client.Pipeline()
        count = 0
    }
}
if count > 0 {
    if _, err := pipe.Exec(ctx); err != nil {
        return err
    }
}
return iter.Err()

Risk assessment: Safe.


F-08: Rate limiter cleanup holds write lock during full map iteration

Severity: 🟡 Medium
Impact: Medium
Confidence: Medium
Files: go-backend/internal/middleware/rate_limit.go

What is happening: The cleanup goroutine acquires a write lock on the entire bucket map and iterates all entries to find stale ones. During this window, all Allow() calls are blocked (they need at least a read lock).

Additionally, the Allow() path itself has nested locking:

  1. RLock to check if bucket exists
  2. If miss → WLock to create bucket
  3. Bucket-level Mutex to check/update tokens

Why it may be slow: With thousands of distinct API keys/orgs, cleanup blocks all rate limiting for the full iteration duration. The nested lock pattern also creates contention under high concurrency.

Suggested fix:

  • Collect stale keys under a read lock, then delete them under a write lock (two-phase cleanup)
  • Or use sync.Map for lock-free reads
  • Consider sharding the bucket map by hash of key
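A sketch of the sharding idea (shard count and bucket shape are illustrative): each shard carries its own lock, so cleanup of one shard never blocks Allow() calls that hash to other shards.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

const numShards = 32

type bucket struct {
	mu     sync.Mutex
	tokens float64
}

// shardedBuckets spreads contention across numShards independent maps.
type shardedBuckets struct {
	shards [numShards]struct {
		mu sync.RWMutex
		m  map[string]*bucket
	}
}

func newShardedBuckets() *shardedBuckets {
	s := &shardedBuckets{}
	for i := range s.shards {
		s.shards[i].m = make(map[string]*bucket)
	}
	return s
}

func (s *shardedBuckets) shard(key string) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % numShards)
}

// get returns the bucket for key, creating it on first use
// (double-checked under the shard's write lock).
func (s *shardedBuckets) get(key string) *bucket {
	sh := &s.shards[s.shard(key)]
	sh.mu.RLock()
	b := sh.m[key]
	sh.mu.RUnlock()
	if b != nil {
		return b
	}
	sh.mu.Lock()
	defer sh.mu.Unlock()
	if b = sh.m[key]; b == nil {
		b = &bucket{tokens: 10} // initial token count is illustrative
		sh.m[key] = b
	}
	return b
}

func main() {
	s := newShardedBuckets()
	b := s.get("org-123")
	b.mu.Lock()
	b.tokens--
	b.mu.Unlock()
	fmt.Println(s.get("org-123").tokens) // 9
}
```

Cleanup then iterates one shard at a time, bounding the blocking window to 1/numShards of the key space.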

Risk assessment: Needs benchmarking — sync.Map has different performance characteristics.


F-09: SHA-256 hash + 2 DB lookups on every API key auth request

Severity: 🟡 Medium
Impact: Medium
Confidence: High
Files: go-backend/internal/middleware/auth.go

What is happening: Every request authenticated via API key:

  1. SHA-256 hash of the raw key (~200ns)
  2. DB lookup: api_keys table by hash (~1–2ms)
  3. DB lookup: organizations table by org_id (~1–2ms)
  4. Async: UPDATE api_keys SET last_used_at = NOW() (bounded queue)

Why it may be slow: 2–4ms of auth overhead per request, purely from DB queries. JWT auth (Clerk) doesn't have this problem because JWKS are cached locally.

Suggested fix: Cache validated (key_hash → org, scopes, expiry, revoked_at) in Redis or a local LRU with 30–60s TTL. Invalidate on revoke.

Risk assessment: Low risk. Must handle key revocation correctly — check revoked_at on cache refresh.


F-10: Distance calculation fetches ALL key points regardless of proximity

Severity: 🟡 Medium
Impact: Medium
Confidence: Medium
Files: go-backend/internal/service/distance_calculation_service.go

What is happening: CalculateForBuilding() fetches up to 1,000 key points and makes one routing API call per key point (OSRM/Mapbox), with a 100ms delay between calls for rate limiting.

Why it may be slow:

  • A building in Texas doesn't need driving distance to a key point in New York
  • 100 key points × 100ms delay = 10 seconds minimum per building
  • External API calls are the bottleneck, not local compute
  • Wastes routing API quota on irrelevant distances

Suggested fix: Pre-filter key points by bounding box or haversine distance (e.g., within 200 miles) before calling the routing API. The key_points table has a GIST spatial index — use ST_DWithin to filter.
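If the pre-filter is done in Go rather than via ST_DWithin, the haversine check is a few lines (the 200-mile threshold mirrors the suggestion above):

```go
package main

import (
	"fmt"
	"math"
)

const earthRadiusMiles = 3958.8

// haversineMiles returns the great-circle distance between two lat/lon points.
// It overestimates nothing relative to driving distance, so it is safe as a
// skip filter: anything beyond the cutoff can't be within driving range.
func haversineMiles(lat1, lon1, lat2, lon2 float64) float64 {
	toRad := func(d float64) float64 { return d * math.Pi / 180 }
	dLat := toRad(lat2 - lat1)
	dLon := toRad(lon2 - lon1)
	a := math.Sin(dLat/2)*math.Sin(dLat/2) +
		math.Cos(toRad(lat1))*math.Cos(toRad(lat2))*math.Sin(dLon/2)*math.Sin(dLon/2)
	return 2 * earthRadiusMiles * math.Asin(math.Sqrt(a))
}

func main() {
	// Dallas → NYC is roughly 1,370 miles — far outside a 200-mile cutoff,
	// so the routing API call (and its 100ms delay) is skipped entirely.
	d := haversineMiles(32.7767, -96.7970, 40.7128, -74.0060)
	fmt.Printf("%.0f miles, skip=%v\n", d, d > 200)
}
```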

Risk assessment: Safe — reduces unnecessary external calls without changing results for relevant key points.


F-11: CSV is read 2–3 times during import

Severity: 🟡 Medium
Impact: Low–Medium
Confidence: High
Files: go-backend/internal/service/import_job_service.go

What is happening:

  1. readCSVMeta() — reads full file for row count + headers
  2. Validate() — reads full file again, validates each row
  3. Process() — reads full file a third time, inserts valid rows

Header-mapping logic is duplicated across all three functions.

Why it may be slow: Triple file I/O. For a 100MB CSV, that's 300MB of disk reads. Also duplicated code increases maintenance burden.

Suggested fix:

  • Read once, cache metadata from first pass
  • Combine validation and processing into a single streaming pass (validate row → insert if valid → accumulate errors)
  • Extract shared CSV mapping setup to a helper

Risk assessment: Moderate — requires refactoring the import pipeline. Current two-phase approach (validate then process) may be intentional UX.


F-12: Building list cache key is hyper-specific (likely low hit rate)

Severity: 🟢 Low–Medium
Impact: Low–Medium
Confidence: Medium (needs Redis metrics)
Files: go-backend/internal/service/building_service.go

What is happening: The list cache key includes: org ID, page number, page size, sort column, sort order, and all filter values (submarket IDs, corridor IDs, cities, states, property types, building statuses, classes, RBA ranges, score ranges, vacancy filters, etc.).

Any change in any filter = complete cache miss.

Why it may be slow: Users apply many different filter combinations. The cache is still written to on every miss (Redis SET), burning Redis bandwidth without corresponding hits. With 5-minute TTL and diverse queries, most keys expire unused.

Suggested fix:

  • Check Redis hit/miss metrics to validate this hypothesis
  • If confirmed: either cache at coarser granularity (unfiltered first page per org) or skip list caching entirely and rely on DB performance + singleflight
  • The singleflight already handles thundering herd — the cache may be redundant for lists

Risk assessment: Needs measurement before changing.


F-13: Scorecard profile fetched from DB on every single calculation

Severity: 🟢 Low
Impact: Low
Confidence: High
Files: go-backend/internal/service/building_scorecard_service.go

What is happening: CalculateForBuilding() calls scoreRepo.GetActiveProfile() on every invocation. The active profile is application-wide and rarely changes. In BulkCalculate for 500 buildings, that's 500 identical queries.

Suggested fix: Cache the active profile in-memory with 1–5 min TTL, or fetch once in BulkCalculate and pass as parameter.

Risk assessment: Safe.


F-14: HTTP WriteTimeout of 11 minutes applies to ALL endpoints

Severity: 🟢 Low
Impact: Low
Confidence: Medium
Files: go-backend/cmd/api/main.go

What is happening:

&http.Server{
    WriteTimeout: 11 * time.Minute,  // Must exceed export timeouts (10m)
}

This is set to accommodate long-running export streams. However, it applies to all endpoints. A slow client on a non-export endpoint holds a goroutine + connection for up to 11 minutes.

Suggested fix: Use per-route timeouts via http.TimeoutHandler wrapper:

  • CRUD endpoints: 30 seconds
  • Dashboard/search: 60 seconds
  • Export endpoints: 11 minutes (current behavior)

Risk assessment: Safe, but requires route-level middleware changes.


F-15: Sentry sample rate at 20% with no per-endpoint control

Severity: 🟢 Low
Impact: Low
Confidence: Low
Files: go-backend/cmd/api/main.go

What is happening: TracesSampleRate: 0.2 applies uniformly. High-traffic endpoints (building list, dashboard) are sampled at the same rate as rare admin endpoints.

Suggested fix: Use TracesSampler function to apply dynamic rates:

  • Health checks, metrics: 0% (skip)
  • High-traffic reads: 5%
  • Mutations: 20%
  • Errors/slow requests: 100%
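The rate table above reduces to a small pure function that could be wired into sentry-go's TracesSampler option (paths are illustrative). One nuance: head-based sampling decides at request start, so the 100% tier for errors/slow requests needs a different mechanism than this function:

```go
package main

import (
	"fmt"
	"strings"
)

// sampleRate maps a request to a trace sample rate. Wiring it into the
// Sentry SDK's sampler callback is left to the real integration.
func sampleRate(method, path string) float64 {
	switch {
	case path == "/healthz" || strings.HasPrefix(path, "/metrics"):
		return 0 // never trace health checks or metric scrapes
	case method == "GET":
		return 0.05 // high-traffic reads
	default:
		return 0.2 // mutations keep the current rate
	}
}

func main() {
	fmt.Println(sampleRate("GET", "/healthz"))        // 0
	fmt.Println(sampleRate("GET", "/api/buildings"))  // 0.05
	fmt.Println(sampleRate("POST", "/api/buildings")) // 0.2
}
```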

Risk assessment: Safe.


3. Concrete Potential Changes

Ordered by impact-to-effort ratio (best ROI first):

Priority | Change | Files | Effort | Impact | Risk | Status
-------- | ------ | ----- | ------ | ------ | ---- | ------
1 | Add pg_trgm GIN indexes on search columns | migrations/add_performance_indexes.sql | Low | High | Safe | Done
2 | Add composite indexes (org+date, org+status+date) | migrations/add_performance_indexes.sql | Low | Medium | Safe | Done
3 | Add LIMIT to ListByBuildingID, ListByTenantID | comp_repo.go | Low | High | Safe | Done (1000 cap; ListByStatus left unbounded — single-table, no JOINs)
4 | Cache scorecard profile in BulkCalculate | building_scorecard_service.go | Low | Low | Safe | Done
5 | Batch DeletePattern during SCAN iteration | cache.go | Low | Medium | Safe | Done
6 | Minimal COUNT queries (join only referenced tables) | comp_repo.go, building_repo.go | Medium | High | Safe | Done
7 | Cache API key validation in Redis (30s TTL) | auth.go | Medium | Medium | Low risk | Deferred — crosses middleware→cache boundary
8 | Parallelize BulkCalculate with bounded worker pool | building_scorecard_service.go | Medium | High | Needs benchmarking | Deferred — needs pool saturation data
9 | Pre-filter key points by proximity | distance_calculation_service.go | Medium | Medium | Safe | Done (300mi haversine pre-filter)
10 | Per-route write timeouts | main.go | Medium | Low | Safe | Deferred
11 | Batch building imports (pre-resolve + batch insert) | import_job_service.go | High | High | Needs testing | Deferred — import loop refactor
12 | Single-pass CSV import | import_job_service.go | High | Low–Med | Moderate | Deferred — intentional UX split
13 | Fix cache key stability (sort slices, compact format) | building_service.go, comp_service.go, tim_service.go | Low | Low–Med | Safe | Done (F-17)
14 | Dynamic Sentry trace sampling | main.go | Low | Low | Safe | Done (F-15: 0% health, 5% GET, 20% mutations)
15 | Batch spatial queries in BulkCreate | building_service.go, geography_repo.go | Medium | High | Safe | Done (F-16: 2 queries instead of 2N)

4. Implementation Status & Next Steps

Implemented (2026-03-31)

All Phase 1 quick wins plus several Phase 3 items shipped in a single pass:

  • F-01: Minimal COUNT queries — comp and building list counts only JOIN tables referenced by active filters
  • F-02: LIMIT 1000 on ListByBuildingID and ListByTenantID (comp queries with 6 JOINs). ListByStatus left unbounded (single-table, lightweight rows, caller expects all results)
  • F-05: pg_trgm GIN indexes on building_name, street_name, city, tenant_name, comments
  • F-06: Composite indexes on (org, date), (org, status, date), (org, building), (org, tenant), operations identity
  • F-07: DeletePattern now deletes in batches of 500 via Redis pipelines
  • F-10: Haversine pre-filter (300mi radius) before routing API calls — skips distant key points
  • F-13: BulkCalculate pre-fetches the active scorecard profile once instead of N times
  • F-15: Dynamic Sentry trace sampling — 0% health/metrics, 5% GET reads, 20% mutations
  • F-16: Batch spatial queries in BulkCreate — 2 queries total instead of 2 per building
  • F-17: Cache key stability — slices sorted and comma-delimited for consistent keys

Remaining: Measure First

Before tackling these, gather production data:

  1. Enable pg_stat_statements — top 20 slowest queries by total time
  2. Check Redis INFO stats — keyspace_hits vs keyspace_misses to validate F-12 (list cache hit rate)
  3. Check Prometheus — http_request_duration_seconds p95/p99 by endpoint
  4. Check db_pool_active_connections gauge during peak bulk operations

Remaining: Data-Dependent

Item | Trigger | Effort
---- | ------- | ------
F-03: Parallelize BulkCalculate with errgroup | Pool not saturating during bulk ops | Medium
F-04: Batch building imports (pre-resolve parks/owners) | Import speed is a measured problem | High
F-08: Rate limiter lock contention (sharded mutexes) | High concurrent API key traffic | Medium
F-09: Cache API key validation in Redis | API key auth latency is measurable | Medium
F-12: Evaluate list cache utility | Redis metrics show low hit rate | Low

Remaining: Low Priority

  • F-11: Single-pass CSV import — validate-then-process is intentional UX
  • F-14: Per-route write timeouts — low impact

5. What Could Not Be Verified

Area | Why | How to Verify
---- | --- | -------------
Actual query plans | No access to EXPLAIN ANALYZE on production data. Index recommendations are based on schema + query structure, not measured plans. | Run EXPLAIN (ANALYZE, BUFFERS) on the queries from F-01, F-05, F-06 against production-scale data.
Cache hit rates | No access to Redis metrics. F-12 (list cache utility) is an educated guess. | Check redis-cli INFO stats — compare keyspace_hits to keyspace_misses. Also check per-key-pattern hit rates if available.
Real traffic patterns | No access to Prometheus or Sentry dashboards. Endpoint frequency and p95 latency unknown. | Query Prometheus: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) grouped by path.
Export stored procedures | export.refresh_* functions are defined in the DB, not in repo code. Their internal performance is a black box. | Run EXPLAIN ANALYZE on the refresh functions. Check if export tables have proper indexes for the refresh queries.
buildings_with_metrics_v view | Dashboard queries reference this view. Whether it's materialized or a regular view affects performance significantly. | Check: SELECT relkind FROM pg_class WHERE relname = 'buildings_with_metrics_v' — v = view, m = materialized view.
Connection pool saturation | Pool is 25 max conns. Under concurrent BulkCalculate or import, this could saturate. | Check Prometheus: db_pool_active_connections gauge during peak bulk operations.
Import worker concurrency | The standalone import worker's actual concurrency setting in production is unknown. | Check deployment config or env var IMPORT_WORKER_CONCURRENCY.
Rate limiter bucket count | Can't assess real lock contention without knowing how many distinct orgs/API keys exist. | Add a Prometheus gauge for rate_limit_bucket_count or log it periodically.

6. Code-Verified Review Notes

Reviewer: Claude (code-checked review, 2026-03-31)
Method: Spot-checked findings F-01, F-02, F-03, F-05, F-07, F-13 against source code. All descriptions matched.

Findings confirmed as-described

  • F-01: COUNT query at comp_repo.go:273 unconditionally uses compJoinClause(). Accurate.
  • F-02: ListByBuildingID / ListByTenantID at comp_repo.go:334–387 have no LIMIT. Accurate.
  • F-03: BulkCalculate at building_scorecard_service.go:129 is a pure sequential loop. Accurate.
  • F-05: Leading-wildcard ILIKE at comp_repo.go:218–220 with no trigram index. Accurate.
  • F-07: DeletePattern at cache.go:84–103 matches the audit description verbatim. Accurate.
  • F-13: GetActiveProfile() called inside CalculateForBuilding at building_scorecard_service.go:86, re-executed per building in BulkCalculate. Accurate.

Pushback / nuance on specific findings

F-02 — Risk is understated. Adding LIMIT to ListByBuildingID/ListByTenantID is a behavioral change, not just a performance fix. If the building detail page or any aggregation logic relies on receiving all comps, a hard LIMIT silently truncates data. Callers must be audited before adding a limit. A soft limit with a warning log may be safer initially.

F-04 — Effort is understated. The per-building transaction spans multiple tables (buildings + addresses + electrical). The existing comp BulkCreate using pgx.Batch is single-table, so the same pattern doesn't directly apply. This is closer to "High" effort than the table suggests.

F-08 — sync.Map is the wrong suggestion. sync.Map is optimized for read-heavy, append-only workloads. The rate limiter mutates on every Allow() call (decrementing tokens), which is the worst case for sync.Map. Sharded sync.Mutex maps (partition by hash of key) would be a better fit for reducing contention without changing the locking model.

F-11 — Intentional UX deserves more weight. The validate-then-process split lets users see all validation errors before any data is written. Merging into a single streaming pass changes that contract. This should not be prioritized unless import speed is a measured problem.

F-12 — Measure before acting. Singleflight already handles thundering herd. The Redis writes are cheap even if most keys expire unused. Measure hit/miss rates before changing anything here.

Additional findings missed by the original audit

F-16: BulkCreate runs sequential spatial queries for geography derivation

Severity: 🔴 High
Impact: High
Confidence: High
Files: go-backend/internal/service/building_service.go (lines 264–268, 354–375)

BulkCreate calls deriveGeography() per building in a loop (lines 265–268). Each call makes 2 spatial DB queries (GetSubmarketByPoint, GetCorridorByPoint). For 1,000 buildings that's 2,000 sequential round-trips — the same class of problem as F-03 and F-04. A batch spatial lookup (e.g., ST_Within against all points in one query) would reduce this to 2 queries total. This should be added to Phase 3 alongside the other bulk-operation improvements.

F-17: Cache key generation produces unstable, verbose keys

Severity: 🟢 Low
Impact: Low
Confidence: High
Files: go-backend/internal/service/building_service.go (lines 450–475)

generateListCacheKey uses fmt.Sprintf(":sm%v", f.SubmarketIDs) which produces keys like :sm[1 2 3]. These are order-dependent (same filter set in different order = cache miss) and verbose. Reinforces the F-12 hypothesis that list cache hit rates are low. If list caching is kept, keys should sort slice values and use a compact delimiter (e.g., :sm1,2,3) or hash the filter struct.
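A sketch of the stable key fragment (the helper name is illustrative): sorting a copy of the slice and joining with commas makes order irrelevant, so `{3,1,2}` and `{2,1,3}` hit the same cache entry.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// idsKey renders a filter slice as a sorted, comma-delimited key fragment,
// e.g. :sm1,2,3, so the same filter set always yields the same cache key.
func idsKey(prefix string, ids []int64) string {
	if len(ids) == 0 {
		return ""
	}
	sorted := append([]int64(nil), ids...) // copy; don't mutate the caller's slice
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	parts := make([]string, len(sorted))
	for i, id := range sorted {
		parts[i] = strconv.FormatInt(id, 10)
	}
	return ":" + prefix + strings.Join(parts, ",")
}

func main() {
	fmt.Println(idsKey("sm", []int64{3, 1, 2})) // :sm1,2,3
	fmt.Println(idsKey("sm", []int64{2, 1, 3})) // :sm1,2,3 — same key, different input order
}
```

For very long filter lists, hashing the sorted fragment would keep keys short while preserving stability.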

Adjusted priority recommendation

Priorities 1–10 below have been implemented; the items grouped under 11+ remain deferred. See §3 and §4 for full status.

Priority | Change | Status
-------- | ------ | ------
1 | pg_trgm GIN indexes (F-05) | Done
2 | Composite indexes (F-06) | Done
3 | Minimal COUNT queries (F-01) | Done
4 | Cache scorecard profile (F-13) | Done
5 | Batch DeletePattern (F-07) | Done
6 | LIMIT on comp queries (F-02) | Done (1000 cap on comp queries; ListByStatus left unbounded)
7 | Pre-filter key points (F-10) | Done (300mi haversine)
8 | Batch spatial queries (F-16) | Done (LATERAL UNNEST, 2 queries total)
9 | Cache key stability (F-17) | Done (sorted, comma-delimited)
10 | Dynamic Sentry sampling (F-15) | Done
11+ | F-03, F-04, F-08, F-09, F-11, F-12, F-14 | Deferred — see §4

Appendix: Architecture Overview

For reference, the backend follows this structure:

HTTP Request
  → RealIP → Sentry → Recoverer → Compress → CORS → Logging
  → Prometheus → AuditLogger → MaxBody → SecurityHeaders
  → RateLimiter → Auth (JWT or API Key)
  → Handler (parse HTTP, validate input)
    → Service (business logic, caching, singleflight)
      → Repository (SQL, parameterized queries, org-scoped)
        → PostgreSQL (pgx/v5, pool of 25 conns)
      → Redis Cache (JSON, TTL 1min–1hr)
  → Response (JSON)

Background processes:

  • Import worker (standalone binary, polls import_jobs table)
  • JWKS refresh (hourly, in-process)
  • DB pool metrics collector (every 15s, in-process)
  • Async bounded queues: audit logging, error capture, API key last_used_at updates