Backend Performance Audit Report — Project Prism
Date: 2026-03-31
Scope: go-backend/ — all handlers, services, repositories, middleware, caching, database schema
Method: Static analysis of source code + schema. No runtime profiling or query plans available.
Author: Claude (automated audit)
Table of Contents
- Executive Summary
- Ranked Findings
- Concrete Potential Changes
- Recommended Next Steps
- What Could Not Be Verified
1. Executive Summary
The Go backend is well-architected with clean layering (handler → service → repository), proper org isolation, Redis caching with singleflight coalescing, async bounded queues, and Prometheus observability. The codebase is production-quality.
However, there are several concrete optimization opportunities, mostly in:
- Database query patterns — COUNT queries carrying unnecessary JOINs, unbounded result sets, ILIKE on unindexed columns
- Bulk/batch operations — sequential per-row DB calls where batching is possible (imports, scorecard calculation)
- Cache invalidation — `DeletePattern` collects all keys into memory before deletion
- Middleware contention — rate limiter lock contention under high concurrency
- Missing composite indexes — filters that hit separate single-column indexes instead of compound ones
No critical correctness bugs found. No N+1 query loops in hot paths. Auth and org isolation are solid.
Codebase at a Glance
| Metric | Count |
|---|---|
| Handlers | 27 |
| Services | 29 |
| Repositories | 37 |
| Middleware | 9 |
| Domain entities | 32 |
| Database tables | 66 |
| Explicit indexes | 120+ |
| DB pool max conns | 25 |
| Cache layer | Redis (JSON, 1min–1hr TTL) |
| Auth | Clerk JWT + API keys (SHA-256) |
2. Ranked Findings
F-01: COUNT queries carry full JOIN chains unnecessarily
| Severity | 🔴 High |
| Impact | High |
| Confidence | High |
| Files | go-backend/internal/repository/postgres/comp_repo.go (lines 269–274); go-backend/internal/repository/postgres/building_repo.go (lines 428–433) |
What is happening:
List() runs a COUNT query with the same 5–6 LEFT JOINs as the data query, even when filters only reference the base table (organization_id, lease_type, square_feet, lease_date_signed).
For comps:
```sql
SELECT COUNT(*)
FROM comp_leases c
LEFT JOIN buildings b ON ...
LEFT JOIN building_addresses ba ON ...
LEFT JOIN zip_codes z ON ...
LEFT JOIN tenant_lookup t ON ...
LEFT JOIN owner_lookup o ON ...
LEFT JOIN contacts ct ON ...
WHERE c.organization_id = $1 AND c.lease_type = ANY($2)
```
For buildings:
```sql
SELECT COUNT(*)
FROM buildings b
LEFT JOIN building_addresses ba ON ...
LEFT JOIN zip_codes z ON ...
LEFT JOIN building_electrical be ON ...
LEFT JOIN building_metrics_current m ON ...
LEFT JOIN scorecard_profiles sp ON ...
LEFT JOIN building_scores bs ON ...
WHERE b.organization_id = $1
```
Why it may be slow:
PostgreSQL must execute all JOINs to count rows, even though the WHERE clause only references c.organization_id and c.lease_type — columns on the base table. The 5–6 LEFT JOINs multiply planning time and I/O for zero benefit in the count.
Suggested fix:
Build a minimal COUNT query that only joins tables referenced by the active WHERE filters. If no filter touches z.city, don't join zip_codes in the count.
Risk assessment: Safe. Only changes which JOINs appear in the count — no behavioral change.
F-02: ListByBuildingID / ListByTenantID return unbounded result sets
| Severity | 🔴 High |
| Impact | High |
| Confidence | High |
| Files | go-backend/internal/repository/postgres/comp_repo.go (lines 334–387) |
What is happening: These queries have no LIMIT:
```sql
SELECT [30+ columns with 6 JOINs]
FROM comp_leases c
LEFT JOIN buildings b ON ...
LEFT JOIN building_addresses ba ON ...
LEFT JOIN zip_codes z ON ...
LEFT JOIN tenant_lookup t ON ...
LEFT JOIN owner_lookup o ON ...
LEFT JOIN contacts ct ON ...
WHERE c.organization_id = $1 AND c.building_id = $2
ORDER BY c.lease_date_signed DESC NULLS LAST
-- NO LIMIT
```
A building with 5,000 comps returns all 5,000 rows with 6 JOINs each. Same issue exists in tim_repo.ListByStatus.
Why it may be slow: Unbounded result set + heavy JOIN chain. Memory scales linearly with rows. A single building detail page could trigger this.
Suggested fix: Add pagination (LIMIT/OFFSET or cursor) with a sensible default (e.g., 100).
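A minimal sketch of the default/cap logic — the helper name and values are illustrative; the real repo methods would thread the result into the SQL as `LIMIT $n OFFSET $m`:

```go
package main

import "fmt"

// clampLimit applies a sensible default to a missing page size and caps
// oversized requests, so no caller can pull an unbounded result set.
func clampLimit(requested, def, max int) int {
	if requested <= 0 {
		return def // caller didn't specify — use the default
	}
	if requested > max {
		return max // hard cap
	}
	return requested
}

func main() {
	fmt.Println(clampLimit(0, 100, 1000))    // default applied
	fmt.Println(clampLimit(5000, 100, 1000)) // capped
	fmt.Println(clampLimit(50, 100, 1000))   // honored as-is
}
```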
Risk assessment: Safe, but frontend may need adjustment if it expects all results at once.
F-03: BulkCalculate (scorecard) is fully sequential
| Severity | 🔴 High |
| Impact | High |
| Confidence | High |
| Files | go-backend/internal/service/building_scorecard_service.go |
What is happening:
BulkCalculate() loops through buildings and calls CalculateForBuilding() sequentially. Each call makes 4–5 DB queries:
- `scoreRepo.GetActiveProfile()` — fetches the active profile (same every time)
- `buildingRepo.GetByID()` — fetches building data
- `distRepo.ListByBuildingID()` — fetches all distances
- `laborRepo.ListPropertiesNearPoint()` — spatial query with radius
- `scoreRepo.Upsert()` — persists the score
Why it may be slow: For 500 buildings: 2,500 sequential DB round-trips with zero parallelism. Each round-trip is ~1–2ms, so 500 buildings takes ~5–10 seconds minimum, purely from DB latency.
Suggested fix:
- Pre-fetch the scorecard profile once and pass it to each calculation
- Use a bounded worker pool (`errgroup` with a limit of 5–10) to parallelize
- Consider batch-fetching distances for multiple buildings in one query
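The bounded-pool shape can be sketched with only the stdlib (the report suggests `errgroup`; a buffered-channel semaphore is equivalent for this purpose). `bulkCalculate` and `calcFn` are hypothetical names — `calcFn` stands in for `CalculateForBuilding` with the profile pre-fetched:

```go
package main

import (
	"fmt"
	"sync"
)

// bulkCalculate runs calcFn for each building with at most `workers`
// calls in flight, keeping well under the 25-connection pool limit.
func bulkCalculate(buildingIDs []int64, workers int, calcFn func(int64) error) []error {
	sem := make(chan struct{}, workers) // semaphore: bounds concurrency
	errs := make([]error, len(buildingIDs))
	var wg sync.WaitGroup
	for i, id := range buildingIDs {
		wg.Add(1)
		sem <- struct{}{} // acquire a slot
		go func(i int, id int64) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			errs[i] = calcFn(id)
		}(i, id)
	}
	wg.Wait()
	return errs
}

func main() {
	ids := []int64{1, 2, 3, 4, 5}
	errs := bulkCalculate(ids, 2, func(id int64) error { return nil })
	fmt.Println(len(errs))
}
```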
Risk assessment: Needs benchmarking — must respect connection pool limits (25 max conns).
F-04: Building import does 4 DB calls per row (no batching)
| Severity | 🔴 High |
| Impact | High |
| Confidence | High |
| Files | go-backend/internal/service/import_job_service.go |
What is happening: For each building row in a CSV import:
- `buildingParkRepo.GetOrCreateByName()` — upsert by name
- `ownerRepo.GetOrCreateByName()` — upsert by name, org-scoped
- `buildingRepo.Create()` — multi-table transaction (buildings + addresses + electrical)
- `buildingMetricsRepo.Upsert()` — upsert metrics
Comps and TIMs use BulkCreate with pgx.Batch (batch size 500), but buildings don't.
Why it may be slow: 1,000 buildings = 4,000 DB round-trips. At ~1–2ms each, that's 4–8 seconds of pure latency, not counting the CSV parsing overhead.
Suggested fix:
- Pre-resolve all parks and owners in bulk (deduplicate names, batch `GetOrCreateByName`)
- Batch-insert buildings using `pgx.Batch`, the same pattern already used for comps
- Process in batches of 500 (matching the comp pattern)
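The batching skeleton can be sketched as below. The `pgx.Batch` calls themselves are elided as comments (they would need a live connection); `chunk` is a generic helper, not an existing function in the codebase:

```go
package main

import "fmt"

// chunk splits rows into batches of n — the shape the comp importer
// already uses with pgx.Batch (batch size 500).
func chunk[T any](rows []T, n int) [][]T {
	var out [][]T
	for n < len(rows) {
		out = append(out, rows[:n])
		rows = rows[n:]
	}
	return append(out, rows)
}

func main() {
	rows := make([]int, 1201) // e.g., 1,201 parsed CSV rows
	for _, batch := range chunk(rows, 500) {
		// In the real import each chunk would become one round-trip:
		//   b := &pgx.Batch{}
		//   for _, r := range batch { b.Queue(insertSQL, ...) }
		//   conn.SendBatch(ctx, b).Close()
		fmt.Println(len(batch))
	}
}
```

The multi-table insert (buildings + addresses + electrical) still needs a transaction per chunk, which is why the review notes rate this as High effort.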
Risk assessment: Moderate — requires transactional batch logic for multi-table inserts.
F-05: ILIKE search on unindexed columns (sequential scans)
| Severity | 🔴 High |
| Impact | High |
| Confidence | High |
| Files | go-backend/internal/repository/postgres/building_repo.go (lines 311–317); go-backend/internal/repository/postgres/comp_repo.go (lines 218–223) |
What is happening:
Search queries use ILIKE '%term%' on multiple columns OR'd together:
```sql
-- Building search
WHERE (b.building_name ILIKE $1 OR ba.street_name ILIKE $1 OR z.city ILIKE $1)

-- Comp search
WHERE (b.building_name ILIKE $1 OR t.tenant_name ILIKE $1 OR c.comments ILIKE $1)
```
No trigram (pg_trgm) or full-text search indexes exist on any of these columns.
Why it may be slow:
ILIKE '%x%' (leading wildcard) cannot use B-tree indexes — it always performs a sequential scan. The OR across multiple joined tables forces the planner into worst-case join strategies.
Suggested fix:
Add pg_trgm GIN indexes on search columns:
```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX buildings_name_trgm_idx ON buildings USING gin (building_name gin_trgm_ops);
CREATE INDEX building_addresses_street_trgm_idx ON building_addresses USING gin (street_name gin_trgm_ops);
CREATE INDEX tenant_lookup_name_trgm_idx ON tenant_lookup USING gin (tenant_name gin_trgm_ops);
```
Risk assessment: Safe. Trigram indexes are purely additive — no schema change, no query syntax change needed. ILIKE automatically uses pg_trgm GIN indexes when available.
F-06: Missing composite indexes for common query patterns
| Severity | 🟡 Medium |
| Impact | Medium |
| Confidence | Medium (needs EXPLAIN ANALYZE to confirm) |
| Files | database/PRISM_vNext_FINAL.sql |
What is happening:
Several high-frequency queries filter on (org_id + column) but only have separate single-column indexes. PostgreSQL may bitmap-combine them, but a composite index is faster.
Specific gaps:
| Missing Index | Used By | Current Coverage |
|---|---|---|
| `comp_leases(organization_id, lease_date_signed DESC)` | Dashboard absorption, timeline, recent comps | Separate indexes on org_id and lease_date_signed |
| `tims(organization_id, status, date_added DESC)` | TIM list, dashboard TIM metrics | Separate indexes on org_id, status, date_added |
| `operations_issue_state(organization_id, queue_type, entity_type, entity_id, issue_code)` | Upsert identity lookup | Only (org, queue, status) index — doesn't cover entity lookup |
| `comp_leases(organization_id, building_id)` | ListByBuildingID | Separate indexes |
Suggested fix:
```sql
CREATE INDEX comp_leases_org_signed_idx
    ON comp_leases(organization_id, lease_date_signed DESC);
CREATE INDEX tims_org_status_date_idx
    ON tims(organization_id, status, date_added DESC);
CREATE INDEX operations_issue_state_identity_idx
    ON operations_issue_state(organization_id, queue_type, entity_type, entity_id, issue_code);
```
Risk assessment: Safe — additive indexes. Monitor for write overhead on high-write tables.
F-07: DeletePattern collects all matching keys into memory before deletion
| Severity | 🟡 Medium |
| Impact | Medium |
| Confidence | High |
| Files | go-backend/internal/cache/cache.go (lines 84–103) |
What is happening:
```go
func (c *RedisCache) DeletePattern(ctx context.Context, pattern string) error {
	iter := c.client.Scan(ctx, 0, c.key(pattern), 100).Iterator()
	var keys []string // unbounded slice
	for iter.Next(ctx) {
		keys = append(keys, iter.Val()) // grows without limit
	}
	// ... then delete all at once
	c.client.Del(ctx, keys...)
}
```
Why it may be slow:
- Called on every building/comp/TIM write (cache invalidation)
- If an org has 10,000 cached list variants, this allocates a 10,000-element slice
- No pre-allocation, so Go's `append` doubles capacity repeatedly, causing GC pressure
- A single `DEL` call with 10,000 keys may time out
Suggested fix: Delete in batches of 100–500 keys during SCAN iteration using Redis pipelines:
```go
pipe := c.client.Pipeline()
count := 0
for iter.Next(ctx) {
	pipe.Del(ctx, iter.Val())
	count++
	if count >= 500 {
		pipe.Exec(ctx)
		pipe = c.client.Pipeline()
		count = 0
	}
}
if count > 0 {
	pipe.Exec(ctx)
}
```
Risk assessment: Safe.
F-08: Rate limiter cleanup holds write lock during full map iteration
| Severity | 🟡 Medium |
| Impact | Medium |
| Confidence | Medium |
| Files | go-backend/internal/middleware/rate_limit.go |
What is happening:
The cleanup goroutine acquires a write lock on the entire bucket map and iterates all entries to find stale ones. During this window, all Allow() calls are blocked (they need at least a read lock).
Additionally, the Allow() path itself has nested locking:
- RLock to check if bucket exists
- If miss → WLock to create bucket
- Bucket-level Mutex to check/update tokens
Why it may be slow: With thousands of distinct API keys/orgs, cleanup blocks all rate limiting for the full iteration duration. The nested lock pattern also creates contention under high concurrency.
Suggested fix:
- Collect stale keys under a read lock, then delete them under a write lock (two-phase cleanup)
- Or use `sync.Map` for lock-free reads
- Consider sharding the bucket map by a hash of the key
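The sharding option can be sketched with the stdlib. Shard count, `bucket` shape, and the `Allow` signature are illustrative — the real limiter also does token refill, which is omitted here:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

type bucket struct{ tokens int }

// shard pairs one mutex with one slice of the key space, so cleanup of
// one shard never blocks Allow() calls hitting the other shards.
type shard struct {
	mu sync.Mutex
	m  map[string]*bucket
}

type shardedBuckets struct{ shards [16]shard }

func newShardedBuckets() *shardedBuckets {
	s := &shardedBuckets{}
	for i := range s.shards {
		s.shards[i].m = make(map[string]*bucket)
	}
	return s
}

func (s *shardedBuckets) shardFor(key string) *shard {
	h := fnv.New32a()
	h.Write([]byte(key))
	return &s.shards[h.Sum32()%uint32(len(s.shards))]
}

// Allow takes only the owning shard's lock to create/decrement a bucket.
func (s *shardedBuckets) Allow(key string, burst int) bool {
	sh := s.shardFor(key)
	sh.mu.Lock()
	defer sh.mu.Unlock()
	b, ok := sh.m[key]
	if !ok {
		b = &bucket{tokens: burst}
		sh.m[key] = b
	}
	if b.tokens <= 0 {
		return false
	}
	b.tokens--
	return true
}

func main() {
	rl := newShardedBuckets()
	fmt.Println(rl.Allow("org-1", 2), rl.Allow("org-1", 2), rl.Allow("org-1", 2))
}
```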
Risk assessment: Needs benchmarking — sync.Map has different performance characteristics.
F-09: SHA-256 hash + 2 DB lookups on every API key auth request
| Severity | 🟡 Medium |
| Impact | Medium |
| Confidence | High |
| Files | go-backend/internal/middleware/auth.go |
What is happening: Every request authenticated via API key:
- SHA-256 hash of the raw key (~200ns)
- DB lookup: `api_keys` table by hash (~1–2ms)
- DB lookup: `organizations` table by org_id (~1–2ms)
- Async: `UPDATE api_keys SET last_used_at = NOW()` (bounded queue)
Why it may be slow: 2–4ms of auth overhead per request, purely from DB queries. JWT auth (Clerk) doesn't have this problem because JWKS are cached locally.
Suggested fix:
Cache validated (key_hash → org, scopes, expiry, revoked_at) in Redis or a local LRU with 30–60s TTL. Invalidate on revoke.
Risk assessment: Low risk. Must handle key revocation correctly — check revoked_at on cache refresh.
F-10: Distance calculation fetches ALL key points regardless of proximity
| Severity | 🟡 Medium |
| Impact | Medium |
| Confidence | Medium |
| Files | go-backend/internal/service/distance_calculation_service.go |
What is happening:
CalculateForBuilding() fetches up to 1,000 key points and makes one routing API call per key point (OSRM/Mapbox), with a 100ms delay between calls for rate limiting.
Why it may be slow:
- A building in Texas doesn't need driving distance to a key point in New York
- 100 key points × 100ms delay = 10 seconds minimum per building
- External API calls are the bottleneck, not local compute
- Wastes routing API quota on irrelevant distances
Suggested fix:
Pre-filter key points by bounding box or haversine distance (e.g., within 200 miles) before calling the routing API. The key_points table has a GIST spatial index — use ST_DWithin to filter.
Risk assessment: Safe — reduces unnecessary external calls without changing results for relevant key points.
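The haversine pre-filter is straightforward to compute locally. A sketch (`haversineMiles` is a hypothetical helper; the production fix could equally use `ST_DWithin` against the GIST index instead):

```go
package main

import (
	"fmt"
	"math"
)

// haversineMiles returns the great-circle distance between two points.
// 3959 mi is the mean Earth radius — plenty accurate for a coarse filter.
func haversineMiles(lat1, lon1, lat2, lon2 float64) float64 {
	const r = 3959.0
	toRad := func(d float64) float64 { return d * math.Pi / 180 }
	dLat := toRad(lat2 - lat1)
	dLon := toRad(lon2 - lon1)
	a := math.Sin(dLat/2)*math.Sin(dLat/2) +
		math.Cos(toRad(lat1))*math.Cos(toRad(lat2))*math.Sin(dLon/2)*math.Sin(dLon/2)
	return 2 * r * math.Asin(math.Sqrt(a))
}

func main() {
	// A Dallas building vs. a New York key point: far outside a 200-mile
	// radius, so the routing API call would be skipped entirely.
	d := haversineMiles(32.7767, -96.7970, 40.7128, -74.0060)
	fmt.Printf("%.0f miles\n", d)
}
```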
F-11: CSV is read 2–3 times during import
| Severity | 🟡 Medium |
| Impact | Low–Medium |
| Confidence | High |
| Files | go-backend/internal/service/import_job_service.go |
What is happening:
- `readCSVMeta()` — reads the full file for row count + headers
- `Validate()` — reads the full file again, validates each row
- `Process()` — reads the full file a third time, inserts valid rows
Header-mapping logic is duplicated across all three functions.
Why it may be slow: Triple file I/O. For a 100MB CSV, that's 300MB of disk reads. Also duplicated code increases maintenance burden.
Suggested fix:
- Read once, cache metadata from first pass
- Combine validation and processing into a single streaming pass (validate row → insert if valid → accumulate errors)
- Extract shared CSV mapping setup to a helper
Risk assessment: Moderate — requires refactoring the import pipeline. Current two-phase approach (validate then process) may be intentional UX.
F-12: Building list cache key is hyper-specific (likely low hit rate)
| Severity | 🟢 Low–Medium |
| Impact | Low–Medium |
| Confidence | Medium (needs Redis metrics) |
| Files | go-backend/internal/service/building_service.go |
What is happening: The list cache key includes: org ID, page number, page size, sort column, sort order, and all filter values (submarket IDs, corridor IDs, cities, states, property types, building statuses, classes, RBA ranges, score ranges, vacancy filters, etc.).
Any change in any filter = complete cache miss.
Why it may be slow: Users apply many different filter combinations. The cache is still written to on every miss (Redis SET), burning Redis bandwidth without corresponding hits. With 5-minute TTL and diverse queries, most keys expire unused.
Suggested fix:
- Check Redis hit/miss metrics to validate this hypothesis
- If confirmed: either cache at coarser granularity (unfiltered first page per org) or skip list caching entirely and rely on DB performance + singleflight
- The singleflight already handles thundering herd — the cache may be redundant for lists
Risk assessment: Needs measurement before changing.
F-13: Scorecard profile fetched from DB on every single calculation
| Severity | 🟢 Low |
| Impact | Low |
| Confidence | High |
| Files | go-backend/internal/service/building_scorecard_service.go |
What is happening:
CalculateForBuilding() calls scoreRepo.GetActiveProfile() on every invocation. The active profile is application-wide and rarely changes. In BulkCalculate for 500 buildings, that's 500 identical queries.
Suggested fix:
Cache the active profile in-memory with 1–5 min TTL, or fetch once in BulkCalculate and pass as parameter.
Risk assessment: Safe.
F-14: HTTP WriteTimeout of 11 minutes applies to ALL endpoints
| Severity | 🟢 Low |
| Impact | Low |
| Confidence | Medium |
| Files | go-backend/cmd/api/main.go |
What is happening:
```go
&http.Server{
	WriteTimeout: 11 * time.Minute, // Must exceed export timeouts (10m)
}
```
This is set to accommodate long-running export streams. However, it applies to all endpoints. A slow client on a non-export endpoint holds a goroutine + connection for up to 11 minutes.
Suggested fix:
Use per-route timeouts via http.TimeoutHandler wrapper:
- CRUD endpoints: 30 seconds
- Dashboard/search: 60 seconds
- Export endpoints: 11 minutes (current behavior)
Risk assessment: Safe, but requires route-level middleware changes.
F-15: Sentry sample rate at 20% with no per-endpoint control
| Severity | 🟢 Low |
| Impact | Low |
| Confidence | Low |
| Files | go-backend/cmd/api/main.go |
What is happening:
TracesSampleRate: 0.2 applies uniformly. High-traffic endpoints (building list, dashboard) are sampled at the same rate as rare admin endpoints.
Suggested fix:
Use TracesSampler function to apply dynamic rates:
- Health checks, metrics: 0% (skip)
- High-traffic reads: 5%
- Mutations: 20%
- Errors/slow requests: 100%
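The rate tiers above reduce to a pure function over the request; a sketch, with routes and tiers as assumptions (the real wiring would hand this to sentry-go's `TracesSampler` option, which is elided here):

```go
package main

import (
	"fmt"
	"strings"
)

// sampleRate maps a request to a trace sample rate by tier.
func sampleRate(method, path string) float64 {
	switch {
	case path == "/healthz" || strings.HasPrefix(path, "/metrics"):
		return 0 // never trace health checks or metrics scrapes
	case method == "GET":
		return 0.05 // high-traffic reads: 5%
	default:
		return 0.2 // mutations: keep the current 20%
	}
}

func main() {
	fmt.Println(sampleRate("GET", "/healthz"))
	fmt.Println(sampleRate("GET", "/api/buildings"))
	fmt.Println(sampleRate("POST", "/api/buildings"))
}
```

Sampling errors and slow requests at 100% would need request-outcome context and is better handled in the Sentry middleware itself.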
Risk assessment: Safe.
3. Concrete Potential Changes
Ordered by impact-to-effort ratio (best ROI first):
| Priority | Change | Files | Effort | Impact | Risk | Status |
|---|---|---|---|---|---|---|
| 1 | Add pg_trgm GIN indexes on search columns | migrations/add_performance_indexes.sql | Low | High | Safe | Done |
| 2 | Add composite indexes (org+date, org+status+date) | migrations/add_performance_indexes.sql | Low | Medium | Safe | Done |
| 3 | Add LIMIT to ListByBuildingID, ListByTenantID | comp_repo.go | Low | High | Safe | Done (1000 cap; ListByStatus left unbounded — single-table, no JOINs) |
| 4 | Cache scorecard profile in BulkCalculate | building_scorecard_service.go | Low | Low | Safe | Done |
| 5 | Batch DeletePattern during SCAN iteration | cache.go | Low | Medium | Safe | Done |
| 6 | Minimal COUNT queries (join only referenced tables) | comp_repo.go, building_repo.go | Medium | High | Safe | Done |
| 7 | Cache API key validation in Redis (30s TTL) | auth.go | Medium | Medium | Low risk | Deferred — crosses middleware→cache boundary |
| 8 | Parallelize BulkCalculate with bounded worker pool | building_scorecard_service.go | Medium | High | Needs benchmarking | Deferred — needs pool saturation data |
| 9 | Pre-filter key points by proximity | distance_calculation_service.go | Medium | Medium | Safe | Done (300mi haversine pre-filter) |
| 10 | Per-route write timeouts | main.go | Medium | Low | Safe | Deferred |
| 11 | Batch building imports (pre-resolve + batch insert) | import_job_service.go | High | High | Needs testing | Deferred — import loop refactor |
| 12 | Single-pass CSV import | import_job_service.go | High | Low–Med | Moderate | Deferred — intentional UX split |
| 13 | Fix cache key stability (sort slices, compact format) | building_service.go, comp_service.go, tim_service.go | Low | Low–Med | Safe | Done (F-17) |
| 14 | Dynamic Sentry trace sampling | main.go | Low | Low | Safe | Done (F-15: 0% health, 5% GET, 20% mutations) |
| 15 | Batch spatial queries in BulkCreate | building_service.go, geography_repo.go | Medium | High | Safe | Done (F-16: 2 queries instead of 2N) |
4. Implementation Status & Next Steps
Implemented (2026-03-31)
All Phase 1 quick wins plus several Phase 3 items shipped in a single pass:
- F-01: Minimal COUNT queries — comp and building list counts only JOIN tables referenced by active filters
- F-02: LIMIT 1000 on `ListByBuildingID` and `ListByTenantID` (comp queries with 6 JOINs). `ListByStatus` left unbounded (single-table, lightweight rows, caller expects all results)
- F-05: `pg_trgm` GIN indexes on `building_name`, `street_name`, `city`, `tenant_name`, `comments`
- F-06: Composite indexes on `(org, date)`, `(org, status, date)`, `(org, building)`, `(org, tenant)`, operations identity
- F-07: `DeletePattern` now deletes in batches of 500 via Redis pipelines
- F-10: Haversine pre-filter (300mi radius) before routing API calls — skips distant key points
- F-13: `BulkCalculate` pre-fetches the active scorecard profile once instead of N times
- F-15: Dynamic Sentry trace sampling — 0% health/metrics, 5% GET reads, 20% mutations
- F-16: Batch spatial queries in `BulkCreate` — 2 queries total instead of 2 per building
- F-17: Cache key stability — slices sorted and comma-delimited for consistent keys
Remaining: Measure First
Before tackling these, gather production data:
- Enable `pg_stat_statements` — top 20 slowest queries by total time
- Check Redis `INFO stats` — `keyspace_hits` vs `keyspace_misses` to validate F-12 (list cache hit rate)
- Check Prometheus — `http_request_duration_seconds` p95/p99 by endpoint
- Check `db_pool_active_connections` gauge during peak bulk operations
Remaining: Data-Dependent
| Item | Trigger | Effort |
|---|---|---|
| F-03: Parallelize BulkCalculate with errgroup | Pool not saturating during bulk ops | Medium |
| F-04: Batch building imports (pre-resolve parks/owners) | Import speed is a measured problem | High |
| F-08: Rate limiter lock contention (sharded mutexes) | High concurrent API key traffic | Medium |
| F-09: Cache API key validation in Redis | API key auth latency is measurable | Medium |
| F-12: Evaluate list cache utility | Redis metrics show low hit rate | Low |
Remaining: Low Priority
- F-11: Single-pass CSV import — validate-then-process is intentional UX
- F-14: Per-route write timeouts — low impact
5. What Could Not Be Verified
| Area | Why | How to Verify |
|---|---|---|
| Actual query plans | No access to EXPLAIN ANALYZE on production data. Index recommendations are based on schema + query structure, not measured plans. | Run EXPLAIN (ANALYZE, BUFFERS) on the queries from F-01, F-05, F-06 against production-scale data. |
| Cache hit rates | No access to Redis metrics. F-12 (list cache utility) is an educated guess. | Check redis-cli INFO stats — compare keyspace_hits to keyspace_misses. Also check per-key-pattern hit rates if available. |
| Real traffic patterns | No access to Prometheus or Sentry dashboards. Endpoint frequency and p95 latency unknown. | Query Prometheus: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) grouped by path. |
| Export stored procedures | export.refresh_* functions are defined in the DB, not in repo code. Their internal performance is a black box. | Run EXPLAIN ANALYZE on the refresh functions. Check if export tables have proper indexes for the refresh queries. |
| `buildings_with_metrics_v` view | Dashboard queries reference this view. Whether it's materialized or a regular view affects performance significantly. | Check: SELECT relkind FROM pg_class WHERE relname = 'buildings_with_metrics_v' — v = view, m = materialized view. |
| Connection pool saturation | Pool is 25 max conns. Under concurrent BulkCalculate or import, this could saturate. | Check Prometheus: db_pool_active_connections gauge during peak bulk operations. |
| Import worker concurrency | The standalone import worker's actual concurrency setting in production is unknown. | Check deployment config or env var IMPORT_WORKER_CONCURRENCY. |
| Rate limiter bucket count | Can't assess real lock contention without knowing how many distinct orgs/API keys exist. | Add a Prometheus gauge for rate_limit_bucket_count or log it periodically. |
6. Code-Verified Review Notes
Reviewer: Claude (code-checked review, 2026-03-31)
Method: Spot-checked findings F-01, F-02, F-03, F-05, F-07, F-13 against source code. All descriptions matched.
Findings confirmed as-described
- F-01: COUNT query at `comp_repo.go:273` unconditionally uses `compJoinClause()`. Accurate.
- F-02: `ListByBuildingID`/`ListByTenantID` at `comp_repo.go:334–387` have no LIMIT. Accurate.
- F-03: `BulkCalculate` at `building_scorecard_service.go:129` is a pure sequential loop. Accurate.
- F-05: Leading-wildcard ILIKE at `comp_repo.go:218–220` with no trigram index. Accurate.
- F-07: `DeletePattern` at `cache.go:84–103` matches the audit description verbatim. Accurate.
- F-13: `GetActiveProfile()` called inside `CalculateForBuilding` at `building_scorecard_service.go:86`, re-executed per building in `BulkCalculate`. Accurate.
Pushback / nuance on specific findings
F-02 — Risk is understated. Adding LIMIT to ListByBuildingID/ListByTenantID is a behavioral change, not just a performance fix. If the building detail page or any aggregation logic relies on receiving all comps, a hard LIMIT silently truncates data. Callers must be audited before adding a limit. A soft limit with a warning log may be safer initially.
F-04 — Effort is understated. The per-building transaction spans multiple tables (buildings + addresses + electrical). The existing comp BulkCreate using pgx.Batch is single-table, so the same pattern doesn't directly apply. This is closer to "High" effort than the table suggests.
F-08 — sync.Map is the wrong suggestion. sync.Map is optimized for read-heavy, append-only workloads. The rate limiter mutates on every Allow() call (decrementing tokens), which is the worst case for sync.Map. Sharded sync.Mutex maps (partition by hash of key) would be a better fit for reducing contention without changing the locking model.
F-11 — Intentional UX deserves more weight. The validate-then-process split lets users see all validation errors before any data is written. Merging into a single streaming pass changes that contract. This should not be prioritized unless import speed is a measured problem.
F-12 — Measure before acting. Singleflight already handles thundering herd. The Redis writes are cheap even if most keys expire unused. Measure hit/miss rates before changing anything here.
Additional findings missed by the original audit
F-16: BulkCreate runs sequential spatial queries for geography derivation
| Severity | 🔴 High |
| Impact | High |
| Confidence | High |
| Files | go-backend/internal/service/building_service.go (lines 264–268, 354–375) |
BulkCreate calls deriveGeography() per building in a loop (lines 265–268). Each call makes 2 spatial DB queries (GetSubmarketByPoint, GetCorridorByPoint). For 1,000 buildings that's 2,000 sequential round-trips — the same class of problem as F-03 and F-04. A batch spatial lookup (e.g., ST_Within against all points in one query) would reduce this to 2 queries total. This should be added to Phase 3 alongside the other bulk-operation improvements.
F-17: Cache key generation produces unstable, verbose keys
| Severity | 🟢 Low |
| Impact | Low |
| Confidence | High |
| Files | go-backend/internal/service/building_service.go (lines 450–475) |
generateListCacheKey uses fmt.Sprintf(":sm%v", f.SubmarketIDs) which produces keys like :sm[1 2 3]. These are order-dependent (same filter set in different order = cache miss) and verbose. Reinforces the F-12 hypothesis that list cache hit rates are low. If list caching is kept, keys should sort slice values and use a compact delimiter (e.g., :sm1,2,3) or hash the filter struct.
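The stable-key fix (sorted, comma-delimited, per §3) reduces to a small helper. A sketch — `stableIDKey` is a hypothetical name, not the service's actual function:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// stableIDKey renders a slice filter as a sorted, comma-delimited key
// fragment, so [3 1 2] and [1 2 3] hit the same cache entry.
func stableIDKey(prefix string, ids []int64) string {
	if len(ids) == 0 {
		return "" // absent filters contribute nothing to the key
	}
	sorted := append([]int64(nil), ids...) // copy: don't mutate the caller's slice
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	parts := make([]string, len(sorted))
	for i, id := range sorted {
		parts[i] = fmt.Sprintf("%d", id)
	}
	return prefix + strings.Join(parts, ",")
}

func main() {
	fmt.Println(stableIDKey(":sm", []int64{3, 1, 2})) // :sm1,2,3
}
```

Hashing the whole filter struct (e.g., SHA-256 of a canonical encoding) is an alternative when keys grow long.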
Adjusted priority recommendation
All items below priority 7 have been implemented. See §3 and §4 for full status.
| Priority | Change | Status |
|---|---|---|
| 1 | pg_trgm GIN indexes (F-05) | Done |
| 2 | Composite indexes (F-06) | Done |
| 3 | Minimal COUNT queries (F-01) | Done |
| 4 | Cache scorecard profile (F-13) | Done |
| 5 | Batch DeletePattern (F-07) | Done |
| 6 | LIMIT on comp queries (F-02) | Done (1000 cap on comp queries; ListByStatus left unbounded) |
| 7 | Pre-filter key points (F-10) | Done (300mi haversine) |
| 8 | Batch spatial queries (F-16) | Done (LATERAL UNNEST, 2 queries total) |
| 9 | Cache key stability (F-17) | Done (sorted, comma-delimited) |
| 10 | Dynamic Sentry sampling (F-15) | Done |
| 11+ | F-03, F-04, F-08, F-09, F-11, F-12, F-14 | Deferred — see §4 |
Appendix: Architecture Overview
For reference, the backend follows this structure:
```
HTTP Request
  → RealIP → Sentry → Recoverer → Compress → CORS → Logging
  → Prometheus → AuditLogger → MaxBody → SecurityHeaders
  → RateLimiter → Auth (JWT or API Key)
  → Handler (parse HTTP, validate input)
  → Service (business logic, caching, singleflight)
  → Repository (SQL, parameterized queries, org-scoped)
  → PostgreSQL (pgx/v5, pool of 25 conns)
  → Redis Cache (JSON, TTL 1min–1hr)
  → Response (JSON)
```
Background processes:
- Import worker (standalone binary, polls `import_jobs` table)
- JWKS refresh (hourly, in-process)
- DB pool metrics collector (every 15s, in-process)
- Async bounded queues: audit logging, error capture, API key `last_used_at` updates