The Supergraph Manifesto #

Supergraph is an architecture framework that offers reference architectures, design guidelines and principles, and an operating model to help multiple teams collaborate on a self-serve platform for federated data access, API integration/composition or GraphQL APIs. An implementation artifact of the Supergraph architecture is called a supergraph (lowercase s).

Before / After Supergraph

When a supergraph is built with a GraphQL federation stack, the engine is often called a gateway or a router and the subgraph connectors are often GraphQL services.

A supergraph is typically used for the following 2 use-cases:

Self-serve API composition platform: A self-serve operating model for API integration, orchestration & aggregation
Federated data access layer: A federated data layer that allows realtime access to data sources with cross-domain composability (joins, filtering, etc.). Related: data mesh, data products.

Strategy and Core concepts #

A supergraph approach aims to build a flywheel of data access and supply to incrementally improve self-service access to data and APIs.

I. CONNECT domains #

Domain owners (or data owners, or API producers) should be able to seamlessly connect their domains to the platform. A major challenge in building a supergraph is resistance to change from domain owners. They often oppose having to build, operate and maintain another API layer, such as a GraphQL server that creates another wrapper on their domain. This reluctance is understandable and completely valid, and it must be systematically addressed by the supergraph platform strategy and the supergraph reference architecture.

This has two main implications for the subgraph connector’s lifecycle and runtime:

Subgraph connector CI/CD: As domain owners change their domains, the API contract published via the supergraph engine must stay in sync with the least amount of overhead for the domain owner. The SDLC, change-management or CI/CD process of the domain owners must involve updating their API contract (e.g. versioning), preventing breaking changes and keeping documentation up to date.
Subgraph connector performance: The subgraph connector must not reduce performance compared to accessing the underlying domain directly. API performance is characterized by latency, payload size and concurrency.
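As a rough illustration of the "prevent breaking changes" step in a connector's CI/CD process, the sketch below compares a published contract with a proposed one. The contract format (a dict of type name to field name to field type), the function name `find_breaking_changes` and the sample schemas are all assumptions made for this example; they are not part of any supergraph specification or tool.

```python
from typing import Dict, List

# Hypothetical contract format: type name -> {field name -> field type}.
Schema = Dict[str, Dict[str, str]]


def find_breaking_changes(old: Schema, new: Schema) -> List[str]:
    """Return descriptions of changes that could break existing consumers."""
    problems: List[str] = []
    for type_name, old_fields in old.items():
        if type_name not in new:
            problems.append(f"type removed: {type_name}")
            continue
        new_fields = new[type_name]
        for field_name, old_type in old_fields.items():
            if field_name not in new_fields:
                problems.append(f"field removed: {type_name}.{field_name}")
            elif new_fields[field_name] != old_type:
                problems.append(
                    f"field type changed: {type_name}.{field_name} "
                    f"({old_type} -> {new_fields[field_name]})"
                )
    # Added types and fields are treated as non-breaking in this sketch.
    return problems


# Illustrative contracts, not taken from a real domain.
published = {"Author": {"id": "Int!", "name": "String!", "age": "Int"}}
proposed = {"Author": {"id": "Int!", "name": "String", "email": "String"}}

changes = find_breaking_changes(published, proposed)
for change in changes:
    print(change)
```

A CI job could fail the build whenever the returned list is non-empty, which keeps the check cheap for the domain owner. A real check would likely need finer rules, for example about nullability and arguments.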

Guaranteeing a smooth CI/CD process and high-performance connectivity gives domain owners confidence that they can connect their domains to the supergraph platform and iterate on changes to their domains fearlessly.

This unlocks self-serve connectivity for domain owners.

II. CONSUME APIs #

API consumers should be able to discover and consume APIs in a way that doesn’t require manual API integration, aggregation or composition effort as far as possible. API consumers have several common needs when they’re dealing with fixed API endpoints or specific data queries:

fetch different projections of data to prevent over-fetching
join data from multiple places to prevent under-fetching
filter, paginate, sort and aggregate data from multiple places

To provide an API experience that makes the consumption experience truly self-serve, there are two key requirements:

Composable API design: The API presented by the supergraph engine must allow for on-demand composability. GraphQL is a great API to express composability semantics, but regardless of the API format used, a standardized, composable API design is a critical requirement.
API portal: High-quality search, discovery and documentation of both the API and the underlying API models is critical to enable self-serve consumption. The more information that can be made available to API consumers, the better (e.g. data lineage and authorization policies, as appropriate).

This unlocks self-serve consumption for API consumers.

III. DISCOVER demand #

Understanding how API consumers use their domain and identifying their unmet needs is crucial for API producers. This insight allows API producers to enhance their domain. It also helps discover new domain owners who could connect their domains to the supergraph.

This necessitates 2 key capabilities of the supergraph platform to create a consumer-first, agile culture:

API consumption, API schema & portal analytics: A supergraph is analogous to a marketplace and needs to provide the marketplace owners and producers with insights to help improve the marketplace for the consumers.
Ecosystem integrations: The supergraph platform should be able to integrate with existing communication and catalog tools, in particular to help understand unmet demand of API consumers.
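To make the analytics capability concrete, here is a minimal sketch that derives usage counts, unused fields and unmet demand from a request log. The log format, the field names and the helper `field_usage` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical log format: each record lists the schema fields one request asked for.
request_log: List[Dict] = [
    {"consumer": "web-app", "fields": ["author.id", "author.name"]},
    {"consumer": "mobile-app", "fields": ["author.name", "article.title"]},
    {"consumer": "web-app", "fields": ["article.title"]},
    {"consumer": "reports", "fields": ["author.email"]},  # not published yet
]

# Fields currently exposed through the supergraph (assumed for this example).
published_fields: Set[str] = {"author.id", "author.name", "author.age", "article.title"}


def field_usage(log: List[Dict]) -> Counter:
    """Count how often each field appears across all logged requests."""
    counts: Counter = Counter()
    for record in log:
        counts.update(record["fields"])
    return counts


usage = field_usage(request_log)
unused = sorted(published_fields - set(usage))  # supply nobody consumes
unmet = sorted(set(usage) - published_fields)   # demand nobody supplies

print(usage.most_common(2))
print("unused:", unused)
print("unmet:", unmet)
```

Unused fields point at supply without demand, and requested but unpublished fields are one possible signal of unmet demand that producers could act on.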

This closes the loop and allows the supergraph platform to create a virtuous cycle of success for producers and consumers.

Architecture guide #

CI/CD and build system (control plane) #

The control plane of the supergraph is critical to help domain owners connect their domains to the supergraph.

There are 3 components in the control plane of the supergraph:

The domain itself
The subgraph
The supergraph

The control plane should define the following SDLC to help keep the supergraph in sync with the domain as the underlying domain changes.

Distributed data plane #

The supergraph data plane is critical to enable high performance access to upstream domains so that API producers can maintain their domain without hidden future maintenance costs:

API schema design guide #

Standardization #

A supergraph API schema should create standardized conventions on the following:

Standardization Attribute	Capability	Description and example
S1	Separating models (resources) & commands (methods)	Models are collections of data that can be queried in standardized, source-agnostic ways (e.g. resources). Commands are methods that map to particular pieces of business logic and might return references to other commands or models (e.g. methods). Example: a standardized way to fetch a list of authors is `query GetAuthors { authors { id name } }`, while a specific method to search for authors is `query findAuthors { search_authors(args: {search: "Einstein"}) { id name } }`
S2	Model filtering	Example: get a list of articles published this year: `query articlesThisYear { articles(where: {publishDate: {_gt: "2024-01-01"}}) { id name } }`
S3	Model sorting	Example: get a list of articles sorted in reverse by the date of publishing: `query sortedArticles { article(order_by: {publishDate: desc}) { id title author_id } }`
S4	Model pagination	Example: paginate the above list with 20 objects per page and fetch the 3rd page: `query sortedArticlesThirdPage { article(order_by: {publishDate: desc}, offset: 40, limit: 20) { id title author_id } }`
S5	Model aggregations over fields	Example: get a count of authors and their average age: `query authorStatistics { author_aggregate { aggregate { count avg { age } } } }`. Here `count` is basic aggregation supported by any model, and `avg` is supported over any numeric field of a type.
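The offset in the S4 example follows from simple arithmetic: with 1-based page numbers, the offset is (page - 1) × page size, so the 3rd page of 20 objects starts at offset 40. A small sketch of that calculation follows; the helper names `page_to_offset` and `pagination_args` are made up for this example.

```python
def page_to_offset(page: int, per_page: int) -> int:
    """Translate a 1-based page number into an offset for offset/limit pagination."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    return (page - 1) * per_page


def pagination_args(page: int, per_page: int) -> str:
    """Render the offset/limit arguments as they would appear in a query."""
    return f"offset: {page_to_offset(page, per_page)}, limit: {per_page}"


print(pagination_args(3, 20))  # offset: 40, limit: 20
```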

Prior art

Google Cloud API design guide
- Resource: A resource-oriented API is generally modeled as a resource hierarchy, where each node is either a simple resource or a collection resource
- Method: Resources are manipulated via a small set of methods

Composability #

The supergraph API is typically a GraphQL / JSON API. There are varying degrees of composability an API can offer, as listed out in the following table:

Composability Attribute	Capability	Description
C1	Joining data	Join related data together in a "foreign key" like join. Example: get a list of authors and their articles: `query authorWithArticles { author { id name articles { id title } } }`
C2	Nested filtering	Filter a parent by a property of its child (i.e. a property of a related entity). Example: get a list of authors who have published an article this year: `query recentlyActiveAuthors { author(where: {articles: {publishDate: {_gt: "2024-01-01"}}}) { id name } }`
C3	Nested sorting	Sort a parent by a property of its child (i.e. a property of a related entity). Example: get a list of articles sorted by the names of their authors: `query sortedArticles { article(order_by: {author: {name: asc}}) { id title } }`
C4	Nested pagination	Fetch a paginated list of parents, along with a paginated & sorted list of children for each parent. Example: get the 2nd page of a list of authors (20 per page) and the first page of their articles, sorted by the article's title field: `query paginatedAuthorsWithSortedPaginatedArticles { author(offset: 20, limit: 20) { id name articles(offset: 0, limit: 25, order_by: {title: asc}) { title publishDate } } }`
C5	Nested aggregation	Aggregate a child/parent in the context of its parent/child. Example: get a list of authors and the number of articles written by each author: `query prolificAuthors { author(limit: 10) { id name articles_aggregate { count } } }`

These composability attributes are what increase the level of self-serve composition and reduce the need for manual API aggregation and composition.
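For readers who prefer to see the semantics outside of GraphQL, the sketch below shows by hand what a consumer would otherwise have to write to get a join (C1), sorted and paginated children (C4) and a per-parent count (C5). The sample rows and the function `authors_with_articles` are hypothetical; a supergraph engine is not expected to work this way internally.

```python
from typing import Dict, List, Optional

# Hypothetical sample rows standing in for two separately owned domains.
authors = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Grace"},
]
articles = [
    {"id": 10, "title": "Notes", "author_id": 1},
    {"id": 11, "title": "Engines", "author_id": 1},
    {"id": 12, "title": "Compilers", "author_id": 2},
]


def authors_with_articles(
    authors: List[Dict], articles: List[Dict], article_limit: Optional[int] = None
) -> List[Dict]:
    """Manually join authors to articles, sort and limit children, and count them."""
    # C1: group child rows by their foreign key.
    by_author: Dict[int, List[Dict]] = {}
    for article in articles:
        by_author.setdefault(article["author_id"], []).append(article)

    result = []
    for author in authors:
        # C4: sort the children by title, then keep the first page.
        children = sorted(by_author.get(author["id"], []), key=lambda a: a["title"])
        result.append({
            "id": author["id"],
            "name": author["name"],
            "articles_count": len(children),  # C5: aggregate in the parent's context
            "articles": [{"title": a["title"]} for a in children[:article_limit]],
        })
    return result


rows = authors_with_articles(authors, articles, article_limit=1)
for row in rows:
    print(row)
```

Each composability attribute the API supports removes one piece of glue code like this from every consumer.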