The most important decision affecting your B2B service database model
In this post, we delve into one of the most crucial decisions that B2B SaaS builders face when designing a database data model: choosing between a namespace-per-tenant model and a shared namespace model. We examine the advantages and disadvantages of each approach and share tips that will make it easier to adapt to future demands.
Multi-tenancy is a critical concept in the B2B SaaS world, where a single application or service serves multiple customers or tenants. The core of tenancy is implemented at the database level, and choosing the right database tenancy model is essential for the long term success of your service, especially as it starts to grow and scale. Choose wrong here and you’ll find yourself chasing your tail putting out fires, undergo painful migrations or complicate and slow down your deployment process.
At SuperTenant, we build cool tech that empowers our users with single-tenant-like advantages in their existing multi-tenant architecture, while mitigating the common drawbacks associated with multi-tenancy. We love to share our experience and insights from working closely with our users. In this post we will delve into the major tradeoffs between the two models and how to hedge your bets so future changes won’t require major rewrites.
Choosing the right tenancy model
When designing a new (micro)service, you should decide on the data tenancy model first, before diving into any other aspect of your data model, and even before choosing the database technology to use. Essentially, you need to decide between two models: namespace-per-tenant or shared namespace. In this context, a namespace can refer to a schema, database, table space, database instance and so on.
Avoid the false sense of isolation
When implementing a namespace-per-tenant model, it’s easy to get tempted to the dark side and do shortcuts like using cross-namespace JOINs. Our objective is to ensure complete isolation between namespaces, so even if cross-namespace JOINs are technically feasible, to quote Yoda: once you start down the dark path, forever will it dominate your destiny.
There is no free lunch of course: you now have to decide what to do with data shared across tenants. There are several options here:
- Query it separately from the client. Beware it can lead to a lot of duplicated code.
- Replicate the shared data to all namespaces. Complicated and can only work for small amounts of data, with update lag increasing as the number of namespaces increase, not really a recommended option.
- Use a query frontend. Products like Presto, Spark SQL, or Trino in the SQL universe or Apollo for GraphQL can join different database backends together - with the downside of adding additional components to your system and what that entails.
- Use foreign (also called federated) tables. A lot of database vendors support foreign/federated tables these days, so check if this approach can work for you and take into consideration when choosing the database vendor.
7 challenges with namespace-per-tenant approach
A namespace-per-tenant data model is more complicated to design and implement compared to a fully shared architecture since you need to address these issues:
- New customer onboarding and orchestration: with a shared data model, onboarding a new tenant requires no special DDL execution. With a per-tenant data model, however, a specific DDL must be executed for each new tenant to provision a new namespace.
- Database migrations: migrations need to run across all namespaces, making them non-atomic from a client’s perspective. You'll need to iterate through all namespaces and update them to a new schema, sometimes with considerable lag between namespace updates. A "tenant-to-namespace mapping" is often needed as well. Due to the non-atomic and migration lags, the code you develop needs to be compatible with both old and new schemas (a subject for a separate blog post).
- Database access: Your code needs to access the relevant namespace, so you’ll need a tenant-to-namespace mapping. When selecting the database technology, pay attention to whether each namespace requires its own network connection or network roundtrips to switch between namespaces, because that will make connection pooling ineffective and other solutions like connection pooling database proxy (e.g. pgbouncer, proxysql). You'll need to check that your connection-pooling library and ORM (if you use one) can be tenant-aware.
- Batch processing: per-tenant isolation apply to batch processing, so you can’t just do a single UPDATE on multiple tenants. Like database migrations, batch processing jobs must know which namespaces they need to work with and as your number of tenants grow you need to account for processing lags or concurrently run several jobs to cover all namespaces.
- Backups: If you choose a database-per-tenant model for PostgreSQL, for example, you'll need to run a separate backup for each tenant.
- Operational Analytics: While per-tenant analytics are feasible, answering questions about all customers becomes more difficult.
- Database limits: Each database has a cap on the number of namespaces it can effectively support.
With all these engineering challenges, why go for a dedicated namespace-per-tenant data model? Because on the other side of the ring there are heavyweight reasons as well.
6 reasons to choose namespace-per-tenant data model
There are several areas in which namespace-per-tenant model outshines the shared namespace data model approach:
- Blast radius: keeping all customers in a single database can have a potentially huge blast radius, either because of a noisy neighbor (especially since databases generally can’t prioritize workload) or a database outage.
- Scale: not all databases scale well horizontally and even the ones that scale horizontally better are not really scaling linearly, so scaling a single large database cluster horizontally may sometimes cost a lot more than having a few medium database clusters (see next bullet).
- Tenant migration: if for whatever reason you want to migrate a tenant from one database cluster to another, having the tenant in its own namespace makes a world of a difference due to off-the-shelf tools, time to implement, test and run such migration - with or without a downtime.
- Per-tenant data retention: when dealing with data-intensive applications for example, databases can very efficiently drop partitions, but much less efficient to find and delete specific rows inside partitions.
- Security: mitigating security risks or meeting customer security requirements sometimes require a higher level of isolation. While there are shared-namespace features sometimes like row-level security to some databases, when it comes to customer requirements it’s sometimes a hard requirement.
- Hybrid or different SLOs: you may want to separate your free customers from your paying customers, so that all the free customers will be served from a single database cluster with different performance characteristics than the paying customers that get served from multiple database clusters.
It’s a numbers game (and KYC)
Reviewing the decisions made by hundreds of B2B SaaS companies, the two most critical factors to consider first are:
- Number of potential tenants the service is expected to have in the realistic future. Serving hundreds to a few thousand medium-to-large customers is different to serving one hundred thousand customers.
- Tenant security requirements. Services made for healthcare or finance for example can expect to receive customer-managed-encryption-keys (CMEK) requirements from large customers and should be extra sensitive to the potential calamity of a cross-tenant data leakage (yes, bugs do happen unfortunately).
Guideline questions for selecting per-tenant vs shared namespace design:
Operating namespace-per-tenant costs more, so ask yourself:
- Q: Do you offer a freemium service? If so, you probably need a shared namespace.
- Q: Does your service sell for <100$ per month? If so, you probably need a shared namespace.
- Q: Are you selling mostly to Enterprise? If so, you probably need a per-tenant namespace.
- Q: Are you selling to SMBs but plan to sell to Enterprise as well? Check the iterative hybrid model.
Covering all bases: the iterative hybrid model
The iterative hybrid model is a fancy way of saying: restrict yourself to what will work in a shared or per-tenant model. Exploring this more requires a separate blog post, but here are a few ideas to get you started.
Starting with a shared model, model your schema as a shared schema and keep a single database cluster, but treat your shared compute code as if you’re using a per-tenant namespace:
- Pass the tenant identifier when requesting a connection from your connection factory, even if in the beginning it just returns a connection from a tenant-agnostic pool. Later on, you can extend it to query/cache a “per-tenant namespace registry” without impacting the rest of the code.
- Choose a multi-tenant-aware ORM if you plan on using an ORM.
- Invest in an onboarding and off-boarding automation jobs, they will repay themselves and are also great when writing tests.
In each model decision (per-tenant or shared), we recommend to put the tenant identifier in each table. Yes, it adds more capacity but it will pay its dividends. It makes it feasible to implement robust security mechanisms. If you ever need to migrate a tenant, cleanse its data, or partially clone it for testing it simplifies it considerably. Batch jobs can efficiently report on which tenants they performed a batch operation on and understanding how much each tenant consumes of your database is easier too.
As a side note, choosing an internal tenant identifier has large implications on database performance so consider using a database index "friendlier" method than UUID such as ULID for example (if storage capacity or database memory are a concern, you can trim down ULID to 64-bit in some use cases, if the number of new customers joining per second is low).
Using these simple rules, you’ll have much less of a headache as you start scaling if you ever want to pick-and-choose different models as your service evolves (e.g. a single shared cluster to everyone but the Enterprise customers who each will have their own database cluster).
In conclusion
Selecting the right database tenancy model for your use case is a critical decision when architecting your B2B service. Whether you choose a namespace-per-tenant or a shared namespace model, take the time to carefully consider the trade-offs and any implications for your service's long-term success.