Infrastructure & Operations
Hosting Architecture
┌─────────────────────────────────────────────────────────────┐
│                       Cloudflare Edge                       │
│           TLS termination, WAF, CDN caching, DNS            │
├──────────────────────┬──────────────────────────────────────┤
│  Cloudflare Pages    │        Cloudflare Tunnel/Proxy       │
│  (Frontend SPA)      │          → Render (Backend)          │
└──────────────────────┴────────┬─────────────────────────────┘
                                │
           ┌────────────────────┼────────────────────┐
           │                    │                    │
      ┌────▼────┐          ┌────▼────┐          ┌────▼────┐
      │ Supabase│          │ Upstash │          │  Clerk  │
      │ Postgres│          │  Redis  │          │  Auth   │
      └─────────┘          └─────────┘          └─────────┘
| Component | Platform | Notes |
|---|---|---|
| Frontend | Cloudflare Pages | Static SPA, auto-deploy from git |
| Backend | Render | Docker container, auto-deploy from git |
| Database | Supabase | Managed Postgres 15 + PostGIS |
| Cache | Upstash | Managed Redis |
| Auth | Clerk | SSO, MFA, JWT issuance |
| Secrets | Infisical | Auto-syncs to Render + Cloudflare Pages |
| Error Tracking | Sentry | Frontend + backend error/performance |
| DNS/CDN/TLS | Cloudflare | Edge termination, HSTS |
CI/CD Pipeline
GitHub Actions Workflow
File: .github/workflows/go-backend-ci.yml
Triggers:
- Pull requests touching `go-backend/**`
- Pushes to `main` and `qa` branches touching `go-backend/**`
CI Services:
- PostgreSQL 16 + PostGIS (via the `postgis/postgis:16-3.4-alpine` service container)
Steps:
- Checkout code
- Apply database schema — runs `PRISM_vNext_FINAL.sql` plus all migrations against the test DB
- Create test roles — `authenticated` and `prism_app` roles with RLS stubs
- Set up Go toolchain
- Go tests — `go test ./...` (includes integration tests against real Postgres)
- Race detection — `go test -race ./internal/...`
- Go vet — `go vet ./...`
- Swagger drift check — `make api-docs-check` (ensures generated docs match handler annotations)
- Docker Compose validation — `docker compose config` for ops configs
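The job shape described above can be sketched as a workflow fragment. This is illustrative only — the real definition lives in `.github/workflows/go-backend-ci.yml`, and the job name, health-check options, and action versions here are assumptions:

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgis/postgis:16-3.4-alpine   # service container from the doc
        env:
          POSTGRES_PASSWORD: postgres
        ports:
          - 5432:5432
        options: >-
          --health-cmd "pg_isready -U postgres"
          --health-interval 5s
          --health-retries 10
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
      # Schema + roles would be applied here before the test steps below.
      - run: go test ./...
        working-directory: go-backend
      - run: go test -race ./internal/...
        working-directory: go-backend
      - run: go vet ./...
        working-directory: go-backend
```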
Branch Strategy
feature → qa → main (production)
| Branch | Purpose | Deploys To |
|---|---|---|
| Feature branches | Development work | — |
| `qa` | Integration testing, QA validation | Staging |
| `main` | Production releases | Production |
Release Flow
- Feature branches merge into `qa` via PR
- CI runs automatically on PR + push
- QA validation on the staging environment
- PR from `qa` → `main` requires 1 approving review + all CI green
- Merge to `main` triggers a production deploy
- Post-deploy verification via `/health` and `/ready`
Docker Compose Stack
Full Development Stack
File: go-backend/ops/docker-compose.yml
Services:
- postgres — PostgreSQL 15 + PostGIS
- redis — Redis 7
- api — Prism Go backend (builds from Dockerfile)
- prometheus — Metrics collection and alerting rules
- alertmanager — Alert routing to Slack
- grafana — Dashboards and visualization
```shell
cd go-backend/ops
cp .env.example .env
docker compose up --build -d
```
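For orientation, the service wiring might look roughly like the fragment below. This is a sketch of the shape only — image tags, build context, and port mappings are assumptions; the real file is `go-backend/ops/docker-compose.yml`:

```yaml
services:
  postgres:
    image: postgis/postgis:15-3.4-alpine    # PostgreSQL 15 + PostGIS, per the doc
    environment:
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
  redis:
    image: redis:7-alpine                   # Redis 7, per the doc
  api:
    build: ..                               # Prism Go backend Dockerfile
    depends_on: [postgres, redis]
    ports: ["8080:8080"]
  # prometheus, alertmanager, and grafana are also part of the full stack.
```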
VM/EC2 Deploy Stack
File: go-backend/ops/deploy/docker-compose.yml
Minimal stack — API container only (point at managed Postgres/Redis):
```shell
cd go-backend/ops/deploy
cp .env.example .env
docker compose up -d --build
```
Monitoring
Production Monitoring
| Tool | Purpose |
|---|---|
| Sentry | Error tracking, performance monitoring, release tracking |
| Render Metrics | CPU, memory, request stats, restarts |
| Cloudflare Analytics | Edge traffic, WAF events, cache hit rates |
Local Development Monitoring
| Tool | Port | Purpose |
|---|---|---|
| Prometheus | :9090 | Metrics scraping and alerting rules |
| Alertmanager | :9093 | Alert routing to Slack |
| Grafana | :3000 | Dashboards and visualization |
Prometheus Alert Rules
File: go-backend/ops/prometheus/prism-api-alerts.yml (full profile)
File: go-backend/ops/prometheus/prism-api-alerts.dev.yml (low-noise dev profile)
Alert categories:
- Request latency (warning: P95 > 500ms, critical: P95 > 2s)
- Error rate (warning: > 1% 5xx, critical: > 5%)
- Request rate anomalies
- Go runtime (goroutine count, memory)
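A latency rule of the kind listed above could look like the following. The thresholds (P95 > 500ms warning) come from this doc, but the group name, `for:` duration, and label scheme are assumptions — the real rules live in `go-backend/ops/prometheus/prism-api-alerts.yml`:

```yaml
groups:
  - name: prism-api-latency
    rules:
      - alert: HighRequestLatencyP95
        # P95 over the last 5m, computed from the backend's latency histogram.
        expr: |
          histogram_quantile(0.95,
            sum(rate(http_request_duration_seconds_bucket[5m])) by (le)) > 0.5
        for: 10m
        labels:
          severity: warning   # routed to SLACK_WARNING_CHANNEL
        annotations:
          summary: "P95 request latency above 500ms"
```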
Slack Alert Routing
- Warning alerts → `SLACK_WARNING_CHANNEL`
- Critical alerts → `SLACK_CRITICAL_CHANNEL`
- Webhook URLs configured in `go-backend/ops/.env`
Grafana Dashboard
Pre-configured dashboard: go-backend/ops/grafana/dashboards/prism-api-overview.json
Panels: request rate, latency percentiles, error rate, active goroutines, memory usage.
Grafana Alloy (OpenTelemetry)
File: go-backend/ops/alloy/config.alloy
Grafana Alloy is configured as an OpenTelemetry collector for traces and metrics forwarding.
Backend Metrics Endpoint
The backend exposes Prometheus metrics at GET /metrics.
Production protection: When ENV=production, the /metrics endpoint requires either:
- Bearer token matching `METRICS_SCRAPE_TOKEN`
- Auth with the `audit:read` scope
Metrics exposed:
- `http_requests_total` — request counter by method, path, status
- `http_request_duration_seconds` — request latency histogram
- `http_request_size_bytes` — request body size
- `http_response_size_bytes` — response body size
- Go runtime metrics (goroutines, memory, GC)
Performance Testing
k6 Smoke Test
Script: go-backend/perf/k6-smoke.js
Run: cd go-backend && make load-smoke
SLO thresholds (documented in docs/ops/k6-smoke-test.md):
- P95 latency < target
- Error rate < threshold
- Throughput > minimum
Use as a release gate before promoting qa → main.
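A k6 options block expressing thresholds of this kind might look like the sketch below (k6 runtime required, not Node). The numeric targets here are placeholders, not the documented SLOs — those live in `docs/ops/k6-smoke-test.md` — and `BASE_URL` is an assumed environment variable:

```javascript
import http from "k6/http";

export const options = {
  vus: 5,
  duration: "1m",
  thresholds: {
    http_req_duration: ["p(95)<500"], // placeholder P95 latency target (ms)
    http_req_failed: ["rate<0.01"],   // placeholder error-rate ceiling
  },
};

export default function () {
  http.get(`${__ENV.BASE_URL}/health`);
}
```

With thresholds defined, k6 exits non-zero when any threshold fails, which is what makes it usable as a release gate in CI.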
Secrets Management
All secrets are managed in Infisical as the single source of truth.
Infisical Folder Structure
/backend/ → Go backend secrets (synced to Render)
/frontend/ → Frontend secrets + config (synced to Cloudflare Pages)
/ops/ → Monitoring / alerting secrets (deferred)
Sync Targets
| Folder | Syncs To |
|---|---|
| `/backend/` | Render (backend service) |
| `/frontend/` | Cloudflare Pages (build environment) |
Key Secrets
| Secret | Platform | Sensitivity |
|---|---|---|
| `DATABASE_URL` | Render | High (embedded password) |
| `REDIS_URL` | Render | High (embedded password) |
| `CLERK_SECRET_KEY` | Render | High |
| `CLERK_SECRET_KEY` | Cloudflare | High (Clerk proxy function) |
| `SENTRY_AUTH_TOKEN` | Cloudflare | High |
| `SENTRY_DSN` | Render | Low |
| `ALLOWED_ORIGINS` | Render | Low |
Rotation Policy
- Database credentials: 90 days or on compromise
- Clerk keys: on staff change or compromise
- Sentry tokens: on compromise
- Slack webhooks: rotate immediately if exposed
Deployment Procedures
Pre-Deploy Checklist
- All CI checks pass on target commit
- No open high/critical Dependabot alerts
- `ENV=production` set in Infisical `/backend/`
- `STRICT_CLERK_SCOPES=true` set
- Database backup exists before migrations
- Any pending migrations reviewed and ready
- Edge/TLS/HSTS healthy
Database Migration Policy
- Fresh environments: apply `database/PRISM_vNext_FINAL.sql`
- Existing databases: apply targeted files from `database/migrations/` (non-destructive)
- Always back up before applying migrations
- Validate key objects after schema changes
Rollback Procedure
- Identify the last known-good commit
- Revert in Render to the previous deploy
- If a migration caused the issue: apply a rollback migration or restore from backup
- Verify the `/health` and `/ready` endpoints
- Spot-check key business endpoints
Post-Deploy Verification
- Check `GET /health` → 200
- Check `GET /ready` → 200
- Check Sentry for new error groups
- Spot-check: building list, comp list, auth flow
- Monitor for 15 minutes
Frontend Deployment
Platform: Cloudflare Pages
Build: npm run build (Vite production build)
Output: frontend/dist/ (static assets)
Environment variables injected at build time via Cloudflare Pages settings (synced from Infisical /frontend/).
See docs/frontend/deploy-cloudflare-pages.md for detailed setup.
TLS / HTTPS
TLS is terminated at the Cloudflare edge. The backend service listens on :8080 (HTTP) behind the edge proxy.
- HSTS enabled at edge
- Backend is not directly reachable from public internet
- `TRUST_CLOUDFLARE_HEADERS` must be set correctly for real IP extraction
See docs/ops/https-tls.md for detailed configuration.