Reference Architecture for B2B SaaS Platforms: Boundaries, APIs, and Data Flow
A B2B SaaS reference architecture defines how clients, APIs, identity, domain services, data stores, and async messaging connect—with explicit tenancy isolation and failure modes—so new features reuse stable boundaries instead of reinventing the stack each time.
A B2B SaaS reference architecture is an opinionated template for how web and mobile clients, APIs, identity, workflows, and data stores fit together so teams can ship predictably without redrawing the whole system on every feature. This guide gives a production-minded baseline: where to draw boundaries, when to call synchronously versus publish events, how to isolate tenants, and which failure modes to design for first. It distills patterns we apply at Baaz on product builds from MVP through scale-up—aligned with widely cited operational guidance from Google's Site Reliability Engineering practice on managing reliability via clear service boundaries and measured risk (Google SRE books, O'Reilly).
Key takeaways
Start with a modular monolith or small set of services until team size and deployment pain justify finer splits; premature microservices add latency, data consistency work, and operational cost without proportional benefit.
Treat identity (IdP), billing, notifications, and reporting as explicit bounded contexts—either modules with clear interfaces or separate deployables—so policy and compliance changes do not ripple unpredictably.
Prefer async integration (events/outbox) for cross-context side effects; reserve synchronous HTTP/gRPC for user-facing read paths and operations that must complete in a single request for correctness.
Define per-tenant isolation early (row-level security, schema-per-tenant, or cluster-per-tenant) and document recovery: backup scope, RPO/RTO targets, and how you detect cross-tenant leakage.
What belongs in the core platform layer
The platform layer typically owns authentication federation (OIDC/OAuth2 against your IdP), coarse authorization (roles, org membership), API gateway concerns (rate limits, WAF, request validation), audit logging of security-relevant actions, feature flags, and shared infrastructure such as observability agents. Google's SRE material emphasizes instrumenting golden signals (latency, traffic, errors, saturation) per service so operators can reason about user impact—plan for these hooks in the platform from week one.
Keep domain business rules inside domain services (orders, projects, workflows specific to your product) and out of the edge gateway. The gateway should terminate TLS, enforce authn, and route—not embed business branching.
Platform teams should publish paved-road templates: service skeletons with tracing enabled, standard middleware for auth context propagation, and default dashboards. This reduces drift as squads multiply.
Centralise cross-cutting policy (mTLS between services, certificate rotation, baseline pod security) while pushing business logic outward—this split keeps compliance auditable without bottlenecking product features.
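The auth-context propagation mentioned above can be sketched as middleware that resolves identity once at the edge and hands a typed context to every handler. This is an illustrative Python sketch under assumed conventions: the gateway has already validated the OIDC token and attached `verified_claims`; `AuthContext` and the claim names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AuthContext:
    tenant_id: str
    user_id: str
    roles: tuple[str, ...]

def with_auth_context(handler):
    """Paved-road middleware: resolve the caller's identity once at the
    edge and pass it to every handler, so domain services never
    re-parse tokens or embed auth branching."""
    def wrapped(request: dict):
        # Assumption: the gateway validated the token and set these claims.
        claims = request.get("verified_claims")
        if not claims or "tenant_id" not in claims:
            return {"status": 401, "body": "missing or invalid credentials"}
        ctx = AuthContext(
            tenant_id=claims["tenant_id"],
            user_id=claims["sub"],
            roles=tuple(claims.get("roles", ())),
        )
        return handler(request, ctx)
    return wrapped
```

Handlers receive the context as an argument, which keeps business code free of token parsing and makes tenant scoping testable in isolation.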
Comparison: modular monolith vs microservices (when each wins)
Modular monolith: single deploy, in-process calls, one operational surface—best for <~15 engineers, rapid iteration, and when domains are still moving. Draw module boundaries with package rules and enforce them with code ownership.
Microservices: independent deploy and scaling—best when different components have 10x different load profiles, compliance needs physical isolation, or teams are large enough to own full services end-to-end. Expect 25–150 ms extra latency per hop (order-of-magnitude; workload-dependent) and invest in tracing (OpenTelemetry), contract tests, and idempotent consumers.
Table-style summary
Team size: modular monolith favors <15 people; microservices often appear above ~20–30 with mature platform teams.
Deploy risk: monolith couples releases; services isolate blast radius but multiply integration failures.
Data: monolith uses one primary database with transactions; services push you toward sagas, outbox, and eventual consistency.
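The idempotent-consumer investment mentioned above can be sketched minimally: record processed event IDs and skip repeats. This illustration keeps the seen-set in memory; in production you would back it with a unique-keyed table so deduplication survives restarts. Names here are hypothetical.

```python
class IdempotentConsumer:
    """At-least-once delivery means duplicates will arrive; record
    processed event IDs and make redelivery a no-op."""

    def __init__(self, handle):
        self._handle = handle
        self._seen: set[str] = set()  # stand-in for a UNIQUE-constrained column

    def consume(self, event: dict) -> bool:
        # Natural key or producer-supplied idempotency key.
        event_id = event["event_id"]
        if event_id in self._seen:
            return False  # duplicate delivery: no side effects
        self._handle(event)
        self._seen.add(event_id)
        return True
```

The same pattern applies whether the broker is Kafka, SNS+SQS, or Pub/Sub, since all of them deliver at-least-once.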
Data stores: OLTP, search, cache, analytics
Use a relational OLTP database (PostgreSQL is common) as the system of record for transactional state. Add Redis (or similar) for session, rate limiting, and hot read caching with explicit TTLs and cache-aside patterns—never as the only copy of financial or contractual truth.
Introduce Elasticsearch or OpenSearch when full-text search or complex filters exceed what indexes on Postgres can support at your scale; synchronize via CDC or application-level indexing workers.
Route heavy reporting and BI to a warehouse (Snowflake, BigQuery, Redshift) fed by CDC or batch ELT so analytical queries do not destabilize OLTP. The goal is failure isolation: a bad analyst query should not raise API p99 latency.
API and event boundaries
Expose versioned HTTP APIs (or gRPC internally) at context boundaries. Version in the URL or an Accept header; never break external integrators silently.
For cross-context effects (“order placed” → notify billing, search index, CRM), publish domain events from an outbox table in the same transaction as the state change so you avoid dual-write bugs. Consumers must be idempotent (natural keys or idempotency keys) because events are delivered at-least-once in every major broker (Kafka, SNS+SQS, Pub/Sub).
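The outbox pattern above can be sketched as two pieces: a write path that commits the state change and the event row in one transaction, and a relay that publishes unpublished rows afterwards. This sketch uses SQLite as a stand-in for Postgres; the `orders` and `outbox` schemas and the topic name are hypothetical.

```python
import json
import sqlite3

def place_order(conn: sqlite3.Connection, order_id: str, tenant_id: str) -> None:
    """Write the state change and its domain event in ONE transaction,
    so a crash can never leave the database updated but the event lost."""
    with conn:  # single transaction: both inserts commit or neither does
        conn.execute(
            "INSERT INTO orders (id, tenant_id, status) VALUES (?, ?, 'placed')",
            (order_id, tenant_id),
        )
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("order.placed", json.dumps({"order_id": order_id, "tenant_id": tenant_id})),
        )

def drain_outbox(conn: sqlite3.Connection, publish) -> int:
    """Relay loop: publish pending rows, then mark them. A crash between
    publish and mark causes redelivery, which is why consumers must be
    idempotent (at-least-once delivery)."""
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0"
    ).fetchall()
    for row_id, topic, payload in rows:
        publish(topic, json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

In production the relay is typically a polling worker or CDC tail (e.g. Debezium) rather than an in-process loop, but the transactional shape is the same.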
Document ordering guarantees per topic: global order is expensive; partition by aggregate or tenant when strict ordering is required.
Publish OpenAPI/Proto contracts in CI; breaking changes should fail builds or require explicit consumer approvals. Ad-hoc JSON drift is how integrations rot.
Observability, tracing, and SLO hooks
Standardise trace context propagation (W3C traceparent or vendor equivalent) from edge to database and message publishers. Without it, p99 hunts become guesswork.
Log structured fields: tenant_id, user_id (hashed when needed), request_id, and feature flags active. Free-text-only logs do not scale with cardinality.
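A minimal structured-log emitter for those fields might look like this. The field names follow the list above; the truncated SHA-256 pseudonymisation of user_id is one illustrative choice, not a prescription.

```python
import hashlib
import json
import sys

def log_event(message: str, *, tenant_id: str, user_id: str,
              request_id: str, flags: list[str], stream=sys.stdout) -> dict:
    """Emit one JSON object per event so tenant_id/request_id are
    queryable fields in the log pipeline, not regex targets; hash
    user_id where raw identifiers are sensitive."""
    record = {
        "msg": message,
        "tenant_id": tenant_id,
        "user_id": hashlib.sha256(user_id.encode()).hexdigest()[:16],  # pseudonymised
        "request_id": request_id,
        "flags": flags,  # feature flags active for this request
    }
    stream.write(json.dumps(record) + "\n")
    return record
```

Keeping one JSON object per line lets any log backend index tenant_id and request_id directly, which is what makes per-tenant p99 hunts tractable.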
Define SLOs per critical API journey early—even if initially loose—so alerting targets user pain, not CPU graphs alone. See the companion article on SLOs and error budgets in this blog for a practical pattern.
CI/CD, environments, and release safety
Minimum environments: development, staging, production; add preview environments per PR when feasible. Parity gaps between staging and prod recreate incidents.
Gate releases with automated tests, canary or blue/green where rollback cost is high, and database migration strategies that are backward compatible across at least one deploy window.
Infrastructure as code (Terraform/Pulumi/CDK) should live beside application repos or a governed platform repo—click-ops does not survive audits.
Cost controls and FinOps discipline
Tag every resource with tenant, environment, and cost centre. Untagged spend becomes mystery spend within two quarters.
Right-size databases and caches from measured utilisation, not guesses; schedule non-prod shutdowns where possible.
Watch egress and managed service per-request fees—they dominate surprise bills for API-heavy SaaS.
Multi-tenant performance isolation
Noisy-neighbour risk rises with shared infrastructure: one tenant's heavy job can starve others without quotas, rate limits, and background queue segregation.
Expose per-tenant usage metrics to support and success teams early—helps diagnose "slow for me" tickets without guessing.
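The per-tenant quota idea can be sketched as a token bucket keyed by tenant, so one tenant exhausting its bucket cannot consume another tenant's capacity. Rates here are illustrative, state is in memory, and the injectable clock exists only for testability; a production limiter would typically live in Redis or at the gateway.

```python
import time

class TenantRateLimiter:
    """Token bucket per tenant: each tenant refills at rate_per_sec up
    to a burst ceiling, and buckets are fully independent."""

    def __init__(self, rate_per_sec: float, burst: int, clock=time.monotonic):
        self._rate, self._burst, self._clock = rate_per_sec, burst, clock
        self._buckets: dict[str, tuple[float, float]] = {}  # tenant -> (tokens, last_refill)

    def allow(self, tenant_id: str) -> bool:
        now = self._clock()
        tokens, last = self._buckets.get(tenant_id, (float(self._burst), now))
        tokens = min(self._burst, tokens + (now - last) * self._rate)  # refill since last call
        if tokens >= 1.0:
            self._buckets[tenant_id] = (tokens - 1.0, now)
            return True
        self._buckets[tenant_id] = (tokens, now)
        return False  # this tenant is out of tokens; others are unaffected
```

The same keying discipline applies to background queues: segregate per-tenant (or per-tier) so one tenant's heavy job cannot starve the rest.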
Extensibility: webhooks, public API, and partner sandboxes
If partners integrate, treat external API stability as a product: versioning, deprecation windows, sandbox environments, and status pages.
Webhooks need signing secrets, retry policies, and DLQs—partners will fail callbacks; your system must tolerate it.
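Webhook signing can be sketched with HMAC-SHA256 over the raw body. The header name and secret handling vary by provider; what matters is that the receiver recomputes the signature and compares it in constant time.

```python
import hashlib
import hmac

def sign_webhook(secret: bytes, body: bytes) -> str:
    """Sender side: compute an HMAC-SHA256 signature over the raw body
    so receivers can prove the callback is authentic and untampered."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()

def verify_webhook(secret: bytes, body: bytes, signature: str) -> bool:
    """Receiver side: recompute and compare in constant time to avoid
    timing side channels."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)
```

Sign the raw bytes before any JSON re-serialisation on the receiver, since key reordering or whitespace changes would otherwise break verification.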
Failure modes to design for first
Dependency timeouts: every outbound call gets deadlines and bounded retries with jitter; failing closed or open should be an explicit product decision.
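Bounded retries with jitter under an overall deadline can be sketched like this. The attempt counts and delays are illustrative, and in practice you would retry only on errors classified as transient; the final failure still surfaces to the caller, who makes the fail-open/fail-closed decision.

```python
import random
import time

def call_with_retries(op, *, attempts: int = 3, base_delay: float = 0.05,
                      deadline: float = 1.0, sleep=time.sleep,
                      clock=time.monotonic, rng=random.random):
    """Retry op with full-jitter exponential backoff, bounded by both an
    attempt count and an overall deadline."""
    start = clock()
    last_exc = None
    for attempt in range(attempts):
        if clock() - start > deadline:
            break  # overall deadline exceeded: stop retrying
        try:
            return op()
        except Exception as exc:  # in production: retriable errors only
            last_exc = exc
            sleep(rng() * base_delay * (2 ** attempt))  # full jitter backoff
    raise TimeoutError("operation failed after bounded retries") from last_exc
```

Full jitter (a random fraction of the backoff window) spreads retry storms across time instead of synchronising every client onto the same retry schedule.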
Partial outages: where possible, degrade features that depend on optional services rather than returning a 500 for the whole page.
Data corruption & rollback: test restores from backups quarterly; if you cannot restore, you do not have a backup.
Tenant isolation bugs: add integration tests that prove tenant A cannot read tenant B's IDs; log tenant_id on every request in structured logs.
Limitations of this reference model
This architecture assumes a cloud or colocated deployment with staffed operations or a strong managed-service strategy; regulated on-prem or air-gapped environments need different network and key-management patterns.
Ultra-high-scale consumer products may shard earlier and adopt cell-based architectures—this B2B-oriented baseline intentionally trades some scalability headroom for simpler operations.
Numbers (latency per hop, team thresholds) are rules of thumb, not guarantees; measure on your stack.
Highly specialised domains—real-time trading, embedded firmware paired with cloud control planes—require domain-specific safety and timing patterns not covered here.
Explore Product Strategy, Custom Software, and AI Development. If a build has stalled, see software project rescue. When you are ready to talk, get in touch.