
Risks & Challenges

Top 10 Engineering Challenges

These are the hardest problems to get right in SalesArck. They should inform architecture decisions, not be discovered in production.

1. Customer Identity Matching Quality

POS systems often have incomplete consumer data. Square may have a phone number; Clover might have an email; some transactions have no identity signal at all.

Impact: Points cannot be awarded if we can't link a consumer to a transaction.
Mitigation:

  • Store anonymous transactions and attempt retroactive linking
  • Support multiple identity signals (phone, email) with priority ordering
  • Never invent identity — prefer missing points over incorrect attribution
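The priority-ordered matching described above could be sketched as follows. This is a minimal illustration, not the SalesArck implementation: the `Consumer` type, the `match_consumer` function, the phone-before-email order, and the simplistic normalization are all assumptions made for the example. When no signal matches exactly one consumer, the function returns `None`, so the transaction stays anonymous and can be linked later.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed priority order; the source only says signals are priority-ordered.
SIGNAL_PRIORITY = ("phone", "email")


@dataclass(frozen=True)
class Consumer:
    consumer_id: str
    phone: Optional[str] = None
    email: Optional[str] = None


def normalize(kind: str, value: str) -> str:
    """Simplistic normalization: digits only for phones, lowercase for emails."""
    if kind == "phone":
        return "".join(ch for ch in value if ch.isdigit())
    return value.strip().lower()


def match_consumer(txn_signals: dict, consumers: list) -> Optional[Consumer]:
    """Return the single consumer matching the highest-priority signal, else None."""
    for kind in SIGNAL_PRIORITY:
        raw = txn_signals.get(kind)
        if not raw:
            continue
        wanted = normalize(kind, raw)
        hits = [
            c for c in consumers
            if getattr(c, kind) and normalize(kind, getattr(c, kind)) == wanted
        ]
        if len(hits) == 1:
            return hits[0]
        # Zero or ambiguous matches: try the next signal rather than guess.
    return None  # stays anonymous; never invent an identity
```

Returning `None` for an ambiguous match (two consumers sharing a phone number, say) follows the "missing points over incorrect attribution" rule.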

2. Webhook Reliability and Replay Behavior

Square and Clover both retry failed webhooks with their own backoff logic. The same event may arrive multiple times — seconds apart or hours apart.

Impact: Duplicate point grants if not handled correctly.
Mitigation:

  • Idempotency key stored per (provider, tenant, provider_txn_id) with 30-day TTL
  • Fast-ack webhook endpoint (store the raw payload and return 200 OK in < 200 ms)
  • Async processing pipeline for actual reward computation
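As a rough sketch of the fast-ack pattern above: the handler keeps the raw payload, checks the `(provider, tenant, provider_txn_id)` key, enqueues only unseen events, and always answers 200. The `WebhookInbox` class and its in-memory stores are hypothetical stand-ins for durable storage and a real queue.

```python
import time
from collections import deque

TTL_SECONDS = 30 * 24 * 60 * 60  # 30-day TTL from the mitigation list


class WebhookInbox:
    """In-memory stand-in for raw storage, the key store, and the async queue."""

    def __init__(self, now=time.time):
        self._now = now        # injectable clock, useful in tests
        self._seen = {}        # idempotency key -> expiry timestamp
        self.raw_events = []   # every delivery is kept, duplicates included
        self.queue = deque()   # work for the async reward pipeline

    def receive(self, provider, tenant_id, provider_txn_id, raw_body):
        key = (provider, tenant_id, provider_txn_id)
        now = self._now()
        self.raw_events.append((key, raw_body))
        expiry = self._seen.get(key)
        if expiry is None or expiry <= now:
            # First delivery (or the previous key has expired): enqueue once.
            self._seen[key] = now + TTL_SECONDS
            self.queue.append(key)
        # Always acknowledge so the provider stops retrying.
        return 200
```

No reward computation happens inside `receive`, which is what keeps the acknowledgement fast. A redelivered event costs one stored payload and nothing more.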

3. Clover App Approval Lead Time

Publishing a Clover app requires Clover App Market review. This approval process can take weeks and blocks Clover merchant onboarding entirely.

Impact: Could delay go-live for Clover merchants.
Mitigation:

  • Submit Clover app as early as possible, before Phase 1 is complete
  • Design Square integration to be 100% functional independently
  • Have a documented fallback plan for polling-only access during approval window

4. Token Lifecycle Management

OAuth access tokens expire, and refresh tokens can be revoked by the merchant. If these failures are not detected and surfaced, ingestion halts silently.

Impact: Merchant POS connection silently stops working.
Mitigation:

  • Token refresh daemon runs hourly and pre-emptively refreshes tokens expiring in < 1 hour
  • Connection status field: connected → degraded → disconnected
  • Merchant portal shows connection health prominently
  • Email notification when refresh fails after N retries
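One way the refresh cycle and the status transitions could fit together is sketched below. The value of `MAX_RETRIES` (the source leaves N unspecified), the mapping from failure count to status, the connection dict layout, and the `refresh` callback are all assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

REFRESH_WINDOW = timedelta(hours=1)  # refresh tokens expiring in < 1 hour
MAX_RETRIES = 3                      # assumed value for N


class RefreshError(Exception):
    """Raised by the (hypothetical) refresh callback when the provider refuses."""


def needs_refresh(expires_at: datetime, now: datetime) -> bool:
    return expires_at - now < REFRESH_WINDOW


def next_status(failed_attempts: int) -> str:
    # Assumed mapping: any failure degrades; N consecutive failures disconnect.
    if failed_attempts == 0:
        return "connected"
    if failed_attempts < MAX_RETRIES:
        return "degraded"
    return "disconnected"


def refresh_cycle(connections: list, refresh, now: datetime) -> list:
    """Run one daemon pass; return merchant IDs that should be emailed."""
    to_notify = []
    for conn in connections:
        if not needs_refresh(conn["expires_at"], now):
            continue
        try:
            conn["expires_at"] = refresh(conn)  # returns the new expiry
            conn["failed_attempts"] = 0
        except RefreshError:
            conn["failed_attempts"] += 1
            if conn["failed_attempts"] == MAX_RETRIES:
                to_notify.append(conn["merchant_id"])  # notify exactly once
        conn["status"] = next_status(conn["failed_attempts"])
    return to_notify
```

Because the status is persisted on the connection record, the merchant portal can read the same field the daemon writes. That removes the "silent" part of the failure.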

5. Multi-Tenant Security Bugs (Highest Severity Risk)

If a bug allows one tenant's data to be read or modified by another, this is a critical breach. It's the #1 risk to platform trust.

Impact: Critical — existential reputational and legal risk.
Mitigation:

  • Tenant middleware applied to every authenticated route, no exceptions
  • Every DB query includes explicit tenant_id filter
  • Automated test suite specifically tests cross-tenant access attempts
  • Security review of every PR touching data access patterns
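The "explicit tenant_id filter" rule is easier to enforce when data access goes through a wrapper that cannot be built without a tenant. The sketch below uses SQLite purely for illustration. The `wallets` table layout and the `TenantScopedWallets` class are hypothetical, and in practice the tenant ID would come from the validated JWT, not from a constructor argument supplied by a caller.

```python
import sqlite3
from typing import Optional


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE wallets ("
        " tenant_id TEXT NOT NULL,"
        " consumer_id TEXT NOT NULL,"
        " balance INTEGER NOT NULL,"
        " PRIMARY KEY (tenant_id, consumer_id))"
    )


class TenantScopedWallets:
    """Every query issued through this object carries the tenant filter."""

    def __init__(self, conn: sqlite3.Connection, tenant_id: str):
        if not tenant_id:
            raise ValueError("tenant_id is required")
        self._conn = conn
        self._tenant_id = tenant_id

    def get(self, consumer_id: str) -> Optional[tuple]:
        return self._conn.execute(
            "SELECT consumer_id, balance FROM wallets"
            " WHERE tenant_id = ? AND consumer_id = ?",
            (self._tenant_id, consumer_id),
        ).fetchone()

    def credit(self, consumer_id: str, points: int) -> bool:
        with self._conn:  # commit on success, roll back on error
            cur = self._conn.execute(
                "UPDATE wallets SET balance = balance + ?"
                " WHERE tenant_id = ? AND consumer_id = ?",
                (points, self._tenant_id, consumer_id),
            )
        return cur.rowcount == 1  # False when the row belongs to another tenant
```

A cross-tenant test then reduces to a few assertions: create rows for two tenants, open a scope for one, and check that reads of the other tenant's rows return `None` and writes affect zero rows.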

6. Reward Rule Complexity Creep

Rules start simple (1 point per $1) but merchants will want variations. Without careful abstraction, the rule engine becomes a spaghetti of special cases.

Impact: Bugs in edge cases → incorrect point awards → consumer complaints.
Mitigation:

  • Define a strict, versioned rule schema from day 1
  • Implement rules as pure functions with deterministic output
  • Extensive test coverage of rule evaluation with fixture-based tests
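A versioned rule evaluated by a pure function might look like the sketch below. The `RuleV1` fields, the `min_spend` variation, and rounding down to whole points are assumptions chosen for the example, not the actual SalesArck rule schema.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_DOWN


@dataclass(frozen=True)
class RuleV1:
    """Hypothetical version 1 of the rule schema."""
    points_per_dollar: Decimal
    min_spend: Decimal = Decimal("0")
    schema_version: int = 1


def evaluate(rule: RuleV1, amount: Decimal) -> int:
    """Pure function: the same rule and amount always give the same points."""
    if rule.schema_version != 1:
        raise ValueError(f"unsupported rule schema version: {rule.schema_version}")
    if amount < rule.min_spend:
        return 0
    points = amount * rule.points_per_dollar
    # Assumed policy: fractional points are dropped, never rounded up.
    return int(points.to_integral_value(rounding=ROUND_DOWN))


# Fixture-style cases: (rule, amount, expected points)
FIXTURES = [
    (RuleV1(Decimal("1")), Decimal("12.99"), 12),
    (RuleV1(Decimal("2")), Decimal("4.50"), 9),
    (RuleV1(Decimal("1"), min_spend=Decimal("10")), Decimal("9.99"), 0),
]

for rule, amount, expected in FIXTURES:
    assert evaluate(rule, amount) == expected
```

Using `Decimal` instead of floats avoids rounding surprises on currency amounts. Rejecting unknown schema versions outright keeps an old evaluator from silently misreading a newer rule.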

7. Idempotency Errors (Duplicate Points)

Under high load, retries, or message queue redelivery, the same transaction can be processed twice. Without a robust idempotency layer, consumers get double points.

Impact: Financial liability — points are a real liability on the books.
Mitigation:

  • idempotency_keys table with unique constraint on the key
  • All wallet writes happen inside a DB transaction with idempotency key insert
  • Idempotency keys have a 30-day TTL (sufficient for all realistic replay windows)
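The pairing of a unique key insert with the wallet write inside one transaction can be illustrated with SQLite. This is a sketch under assumptions: the column layouts of `idempotency_keys` and `wallet_ledger`, the `grant_points` and `purge_expired` helpers, and the key format are invented for the example.

```python
import sqlite3

TTL_SECONDS = 30 * 24 * 60 * 60  # 30-day TTL

SCHEMA = """
CREATE TABLE idempotency_keys (
    key        TEXT PRIMARY KEY,      -- unique constraint on the key
    created_at INTEGER NOT NULL
);
CREATE TABLE wallet_ledger (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    tenant_id       TEXT NOT NULL,
    consumer_id     TEXT NOT NULL,
    points          INTEGER NOT NULL,
    idempotency_key TEXT NOT NULL
);
"""


def grant_points(conn, key, tenant_id, consumer_id, points, now) -> bool:
    """Insert key and ledger entry atomically; return False if nothing was written."""
    try:
        with conn:  # one transaction: both inserts commit or neither does
            conn.execute(
                "INSERT INTO idempotency_keys (key, created_at) VALUES (?, ?)",
                (key, now),
            )
            conn.execute(
                "INSERT INTO wallet_ledger"
                " (tenant_id, consumer_id, points, idempotency_key)"
                " VALUES (?, ?, ?, ?)",
                (tenant_id, consumer_id, points, key),
            )
        return True
    except sqlite3.IntegrityError:
        # Typically a duplicate key: this transaction was already processed.
        return False


def purge_expired(conn, now) -> int:
    """Delete keys older than the TTL; returns the number of rows removed."""
    with conn:
        cur = conn.execute(
            "DELETE FROM idempotency_keys WHERE created_at < ?",
            (now - TTL_SECONDS,),
        )
    return cur.rowcount
```

Since the key insert and the ledger insert share a transaction, a crash between them cannot leave a key without its ledger entry, or the reverse. A redelivered message hits the unique constraint and writes nothing.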

8. SMS OTP Cost and Deliverability

SMS is not free at production scale. Deliverability varies by carrier and country, and Supabase Auth still depends on reliable OTP delivery infrastructure.

Impact: Auth failures for users in low-coverage regions; unexpected cost spikes.
Mitigation:

  • Keep email OTP enabled where product requirements allow
  • Monitor OTP delivery success rates and sign-in failure rates by region
  • Budget for OTP delivery cost from day 1 (managed auth does not mean free delivery)
  • Provide a support path for users who cannot receive SMS reliably

9. Data Reconciliation Drift

The database can drift from the true POS state if webhooks are missed or poll windows fail. Over time, wallet balances may not reflect reality.

Impact: Consumer complaints about missing points; merchant trust issues.
Mitigation:

  • Nightly reconciliation job compares POS transaction windows vs ingested records
  • Discrepancy reports surfaced in admin console
  • Auto-heal wallet snapshots from ledger (snapshot is derived, not source of truth)
  • Never mutate historical ledger entries — append correction entries instead
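The reconciliation ideas above reduce to three small operations: diff the POS window against ingested records, rebuild the snapshot from the ledger, and fix mistakes by appending. The function names and the ledger entry shape below are hypothetical. A real job would page through the provider APIs, not take lists of IDs.

```python
def find_missing(pos_txn_ids, ingested_txn_ids) -> list:
    """Transactions the POS reports for a window that were never ingested."""
    return sorted(set(pos_txn_ids) - set(ingested_txn_ids))


def drift_ratio(pos_txn_ids, ingested_txn_ids) -> float:
    """Share of POS transactions in the window that are missing locally."""
    total = len(set(pos_txn_ids))
    if total == 0:
        return 0.0
    return len(find_missing(pos_txn_ids, ingested_txn_ids)) / total


def rebuild_balance(ledger: list, consumer_id: str) -> int:
    """The snapshot is derived: recompute it by summing the ledger."""
    return sum(e["points"] for e in ledger if e["consumer_id"] == consumer_id)


def append_correction(ledger: list, consumer_id: str, delta: int, reason: str) -> None:
    """Corrections are new entries; historical entries are never edited."""
    ledger.append(
        {"consumer_id": consumer_id, "points": delta, "kind": "correction", "reason": reason}
    )
```

For example, if a consumer was granted 10 points in error, appending a correction of -10 brings `rebuild_balance` back to the right total and leaves the original entry available for audit.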

10. Operational Support Burden

Merchant onboarding involves OAuth flows, webhook configuration, and rule setup. When things go wrong, merchants escalate to support. Without tooling, this is expensive.

Impact: Support cost grows super-linearly with merchant count without tooling.
Mitigation:

  • Admin console with transaction replay, ledger inspection, and connection status
  • Self-service reconnect flow for common OAuth failures
  • Runbooks for top 10 support scenarios
  • Health status indicator on merchant portal for POS connection

Risk Register (MVP)

| Risk | Impact | Likelihood | Mitigation |
|---|---|---|---|
| Cross-tenant data exposure | 🔴 Critical | Medium | Tenant middleware + per-query guard + automated tests |
| Duplicate reward grants | 🟠 High | Medium | Idempotency keys + unique DB constraints + replay-safe workers |
| OAuth token expiration breakage | 🟠 High | Medium | Refresh daemon + retry + merchant alert email |
| OTP failure / delivery delay | 🟡 Medium | High | Multi-provider OTP abstraction + fallback channel |
| POS downtime / webhook delay | 🟡 Medium | Medium | Poll fallback + dead-letter queue + consumer notification |
| Clover approval delay | 🟡 Medium | High | Early submission + Square-first launch plan |
| Rule engine edge-case bugs | 🟡 Medium | Medium | Fixture-based unit tests + rule version audit |
| Queue consumer crash loop | 🟡 Medium | Low | Dead-letter queue + circuit breaker + alerting |
| DB connection pool saturation | 🟡 Medium | Low | Connection pooler (PgBouncer / Neon built-in) + alerts |
| Reconciliation drift > 0.01% | 🟢 Low | Low | Nightly reconciliation job + discrepancy dashboard |

Non-Negotiable Guardrails

These are architectural constraints that must never be compromised for velocity:

These are not optional
  1. Never trust tenant from client UI — derive tenantId exclusively from the validated JWT on the server
  2. Every write path must be idempotent where possible — use Idempotency-Key headers and DB-level constraints
  3. Reward events are append-only — no UPDATE or DELETE on wallet_ledger or reward_events records
  4. All admin actions must generate immutable audit logs — including before state, after state, actor ID, and explicit reason
  5. No PII in application logs — implement PII redaction middleware before the first production deploy
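As one illustration of guardrail 5, a logging filter can redact obvious PII before a record reaches any handler output. This is a sketch only: the regular expressions are deliberately simple assumptions that catch typical emails and phone-like digit runs, and `redact` and `PiiRedactionFilter` are hypothetical names. Production middleware would need a reviewed pattern set.

```python
import logging
import re

# Assumed patterns: good enough for a sketch, not an exhaustive PII detector.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    text = EMAIL_RE.sub("[REDACTED_EMAIL]", text)
    return PHONE_RE.sub("[REDACTED_PHONE]", text)


class PiiRedactionFilter(logging.Filter):
    """Rewrites each record's message with PII replaced by placeholders."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format first so PII passed via %-style args is redacted too.
        record.msg = redact(record.getMessage())
        record.args = None
        return True  # never drop the record, only rewrite it


def build_logger(name: str, stream) -> logging.Logger:
    """Logger whose only handler redacts before writing to `stream`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.addFilter(PiiRedactionFilter())
    logger.addHandler(handler)
    return logger
```

Attaching the filter to the handler, not to individual call sites, means new log statements are covered by default. The phone pattern will also redact other long digit runs, which errs on the side of over-redaction.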
Written by Dhruv Doshi