CDC Polling multiple databases #6036

Open
kberesfo opened this issue Mar 3, 2025 · 8 comments
Labels
enhancement New feature or request feature

Comments

@kberesfo

kberesfo commented Mar 3, 2025

Is your feature request related to a problem? Please describe.
I'm developing a B2B SaaS application where customer data is segregated at the database level, and requests are routed via the session and database driver. However, the current polling mechanism only allows polling from a single database, which creates a limitation for change data capture (CDC).

I need to track events in real time, such as user actions triggering notifications. Since CDC does not work across multiple databases efficiently, I'm forced to use the v5 LTS version of the GraphQL library, which allows me to handle this at the resolver level. While this works for now, it presents scalability concerns as the application grows.

Describe the solution you'd like
I would like a way to support polling across multiple databases natively, ensuring that events from different databases can be captured and processed efficiently. Ideally, this would be done in a way that minimizes performance overhead and ensures real-time event streaming without requiring resolver-level workarounds.

Describe alternatives you've considered

    • Polling per database instance – This is inefficient and scales poorly.
    • Using the v5 LTS GraphQL library – Works at the resolver level but is not sustainable long-term.
    • Event-driven architecture with a separate event bus – Would require significant changes and additional infrastructure, which may not be feasible in the short term.

Additional context
This is a common challenge for multi-tenant applications that segregate data by database. A more efficient CDC mechanism that supports multi-database polling would benefit many SaaS applications following this model.

@kberesfo kberesfo added the enhancement New feature or request label Mar 3, 2025
@angrykoala
Member

To implement multi-database subscriptions, one option we have discussed is to allow passing an array of queryConfig or sessionConfig to Neo4jGraphQLSubscriptionsCDCEngine.

A bit like:

const engine = new Neo4jGraphQLSubscriptionsCDCEngine({
    driver,
    pollTime: 5000,
    // Proposed, not yet supported: one session config per database to poll
    sessionConfig: [{ database: "my-database-1" }, { database: "my-database-2" }],
});

Then, the subscriptionCDCEngine would poll on all the sessions provided. Note that this will have a performance impact, as the polling itself will happen N times instead of once.

Additionally, the database name can be added to the event payload, so it is available to hooks listening to CDC events. In any case the database name will not be exposed to the GraphQL API.
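
To make the trade-off concrete, here is a rough sketch of what "poll on all the sessions and tag each event with its database" could look like. It is not the library's implementation: `MultiDatabasePoller` and `fetchChanges` are hypothetical names, and `fetchChanges` stands in for whatever actually runs the CDC query against one database (for example through a session opened with `driver.session({ database })`).

```javascript
// Sketch only: `fetchChanges` is a hypothetical stand-in for the code that
// queries CDC on one database. It is not an API of the GraphQL library.
class MultiDatabasePoller {
    /**
     * @param {string[]} databases names of the databases to poll
     * @param {(database: string, cursor?: string) => Promise<{cursor?: string, events: object[]}>} fetchChanges
     * @param {(event: object) => void} onEvent hook that receives every tagged event
     */
    constructor(databases, fetchChanges, onEvent) {
        this.databases = databases;
        this.fetchChanges = fetchChanges;
        this.onEvent = onEvent;
        this.cursors = new Map(); // one CDC cursor per database
    }

    // One polling round: N queries for N databases.
    async pollOnce() {
        await Promise.all(
            this.databases.map(async (database) => {
                const previous = this.cursors.get(database);
                const { cursor, events } = await this.fetchChanges(database, previous);
                this.cursors.set(database, cursor);
                for (const event of events) {
                    // Tag the payload with its origin so hooks can tell databases apart.
                    this.onEvent({ ...event, database });
                }
            })
        );
    }

    // Repeats pollOnce every `pollTime` milliseconds. A production version
    // would also skip a round while the previous one is still running.
    start(pollTime) {
        this.timer = setInterval(() => this.pollOnce().catch(console.error), pollTime);
    }

    stop() {
        clearInterval(this.timer);
    }
}
```

Each round issues one query per database, which is where the N-times cost mentioned above comes from. The `database` field is only added to the payload handed to `onEvent`, so it stays internal unless a hook chooses to forward it.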

@kberesfo
Author

kberesfo commented Mar 4, 2025

I'm not necessarily looking for GraphQL subscriptions that span multiple databases. From an end-user perspective, I only want them to subscribe to a single database at a time. However, right now, I don't think the Neo4j GraphQL library supports routing client GraphQL subscriptions based on the database session.

At the same time, I would like to poll across multiple databases to build a pub/sub system, which probably needs to exist at the Neo4j driver level. For example, if a user joins a team ((user)-[:MEMBER]->(team)), I want to detect that a membership relationship was created and publish an event for another service to act on—such as notifying other team members.

Feature Requests

  1. GraphQL Subscription Routing (Neo4j GraphQL Library)

    • The Neo4j GraphQL library should support routing client GraphQL subscriptions based on the active database session.
    • This would allow multi-tenant applications with separate databases per customer to properly scope subscriptions without manual workarounds.
  2. Multi-Database Polling & Event Publishing (Neo4j Driver)

    • The Neo4j driver should support polling across multiple databases.
    • This would enable event detection across databases and allow for building a pub/sub system that reacts to changes across tenants.

These two features would improve event-driven workflows for multi-tenant applications while ensuring GraphQL subscriptions remain scoped to a single database per user.

Would love to hear thoughts on this!
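
As an illustration of the membership example, the sketch below filters a batch of change events for newly created MEMBER relationships and publishes one message per match. The event shape (`eventType`, `operation`, `type`, `start`, `end`) is an assumption loosely modelled on CDC change events, and `EventBus`, `publishMemberships`, and the topic name are hypothetical. A real system would likely publish to a message broker rather than an in-process bus.

```javascript
// Minimal in-process pub/sub, used here only to keep the sketch self-contained.
class EventBus {
    constructor() {
        this.handlers = new Map(); // topic -> array of handler functions
    }

    subscribe(topic, handler) {
        const list = this.handlers.get(topic) ?? [];
        list.push(handler);
        this.handlers.set(topic, list);
    }

    publish(topic, payload) {
        for (const handler of this.handlers.get(topic) ?? []) {
            handler(payload);
        }
    }
}

// Assumed shape: { event: { eventType: "r", operation: "c", type: "MEMBER", start, end } }
function isMembershipCreated(change) {
    const e = change.event;
    return e.eventType === "r" && e.operation === "c" && e.type === "MEMBER";
}

// Publishes one "team.member-joined" message per created MEMBER relationship.
function publishMemberships(bus, database, changes) {
    for (const change of changes) {
        if (isMembershipCreated(change)) {
            bus.publish("team.member-joined", {
                database, // which tenant database the change came from
                userId: change.event.start.elementId,
                teamId: change.event.end.elementId,
            });
        }
    }
}
```

Another service could then call `bus.subscribe("team.member-joined", notifyTeam)` and decide whom to notify, without knowing anything about how the changes were polled.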

@angrykoala
Member

I see @kberesfo, thanks for the clarification

As you mention, CDC polling across all databases should be doable with the Neo4j drivers directly, and it is not really relevant to the GraphQL library.

Regarding the first point, keep in mind that CDC subscriptions work through a single permanent connection to a database that polls for CDC events, which are then routed to the subscriptions. This design allows a single polling loop regardless of the number of subscriptions, but it may make your use case a bit trickier to implement.

If I understand your use case, you want to have different subscriptions pointing to different databases, but going through the same GraphQL server instance?

@kberesfo
Author

kberesfo commented Mar 4, 2025

Yeah, that's correct. Basically I have a GraphQL server that uses a plugin to read the JWT and route the query to the correct database using driver.session. Since the polling only seems to route to a single database, I can't enable subscriptions.
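
For readers following along, the routing described above might look roughly like the sketch below. The claim name `tenant_db` and both function names are hypothetical, and the decoder deliberately does not verify the token signature: it assumes verification already happened earlier in the request pipeline.

```javascript
// Reads the payload of a JWT WITHOUT verifying its signature.
// Assumption: the token was already verified upstream.
function decodeJwtPayload(token) {
    const parts = token.split(".");
    if (parts.length !== 3) {
        throw new Error("Malformed JWT");
    }
    return JSON.parse(Buffer.from(parts[1], "base64url").toString("utf8"));
}

// Maps a token to the session config for that tenant's database.
// `tenant_db` is a hypothetical claim name chosen for this example.
function sessionConfigForToken(token) {
    const claims = decodeJwtPayload(token);
    if (typeof claims.tenant_db !== "string" || claims.tenant_db === "") {
        throw new Error("Token does not name a tenant database");
    }
    return { database: claims.tenant_db };
}
```

The result can be passed straight to the driver, as in `driver.session(sessionConfigForToken(token))`, which is what makes queries tenant-scoped while the single CDC polling connection is not.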

@angrykoala
Member

angrykoala commented Mar 5, 2025

OK, this will require further design then. The subscriptions should still work with multiple databases, as described above, but we also need a mechanism to route each subscription to the correct user.

Just to double-check that I understood the problem:

We have a single GraphQL API. User A subscribes to createdMovies and user B subscribes to createdMovies (both using the same query against the same endpoint), but because their JWTs differ, user A should be subscribed to DB1 while user B should be subscribed to DB2.

When a movie is created in DB1, only user A should be notified.

Is this example scenario accurate to your needs, @kberesfo?
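
The scenario can be sketched as a router that keys subscribers by database as well as by event name. This is only an illustration of the routing rule, not a proposed design: `SubscriptionRouter` and its methods are hypothetical names.

```javascript
// Routes events only to subscribers registered for the same database.
class SubscriptionRouter {
    constructor() {
        this.byDatabase = new Map(); // database -> Map(eventName -> Set of callbacks)
    }

    // Registers a subscriber and returns a function that removes it again.
    subscribe(database, eventName, notify) {
        if (!this.byDatabase.has(database)) {
            this.byDatabase.set(database, new Map());
        }
        const byEvent = this.byDatabase.get(database);
        if (!byEvent.has(eventName)) {
            byEvent.set(eventName, new Set());
        }
        byEvent.get(eventName).add(notify);
        return () => byEvent.get(eventName).delete(notify);
    }

    // Called by the poller with the database the event originated from.
    dispatch(database, eventName, payload) {
        const subscribers = this.byDatabase.get(database)?.get(eventName) ?? [];
        for (const notify of subscribers) {
            notify(payload);
        }
    }
}

// User A's JWT maps to DB1, user B's to DB2; both use the same subscription.
const router = new SubscriptionRouter();
const inboxA = [];
const inboxB = [];
router.subscribe("DB1", "createdMovies", (movie) => inboxA.push(movie));
router.subscribe("DB2", "createdMovies", (movie) => inboxB.push(movie));

// A movie created in DB1 reaches user A only.
router.dispatch("DB1", "createdMovies", { title: "The Matrix" });
```

After the final call, `inboxA` holds the new movie and `inboxB` is still empty, which is the behaviour described in the scenario.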

@kberesfo
Author

kberesfo commented Mar 6, 2025

Yes, that would be correct.

@angrykoala
Member

An alternative solution for this approach would be to support passing the database in the subscription context for a subscription request as a callback, then begin polling.

This would make multi-tenancy more efficient, as only the databases that have active subscriptions would be polled, instead of all the databases. However, it presents two challenges: reusing the polling queries between users connected to the same database, and stopping the polling once all the subscriptions to a database have been closed.

Additionally, the old behavior (polling regardless of subscriptions) should still be available, as it is needed for use cases that hook into the events to send them to an external (non-GraphQL) service.
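
One way to picture both requirements together is a reference-counted registry: polling for a database starts with its first subscription, is shared by later ones, and stops when the last one closes, while databases listed as always-on are never released. The names `LazyPollingRegistry`, `acquire`, and `startPolling` are hypothetical, and `startPolling` is assumed to return a function that stops the polling it started.

```javascript
// Sketch of on-demand polling with reference counting.
class LazyPollingRegistry {
    /**
     * @param {(database: string) => () => void} startPolling starts polling and returns a stop function
     * @param {string[]} alwaysOn databases polled regardless of subscriptions
     */
    constructor(startPolling, alwaysOn = []) {
        this.startPolling = startPolling;
        this.active = new Map(); // database -> { stop, count }
        for (const database of alwaysOn) {
            // Acquired once and never released, so the count cannot drop to zero.
            this.acquire(database);
        }
    }

    // Called when a subscription opens; returns the function to call when it closes.
    acquire(database) {
        let entry = this.active.get(database);
        if (!entry) {
            entry = { stop: this.startPolling(database), count: 0 };
            this.active.set(database, entry);
        }
        entry.count += 1;

        let released = false;
        return () => {
            if (released) return; // guard against double release
            released = true;
            entry.count -= 1;
            if (entry.count === 0) {
                entry.stop();
                this.active.delete(database);
            }
        };
    }
}
```

Two users subscribing to the same database share a single polling loop, and passing a database in `alwaysOn` reproduces the existing behaviour for hooks that forward events to external services.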

@kberesfo
Author

kberesfo commented Mar 7, 2025

I think both of those are viable solutions. I'm not 100% certain how the first would work, but it makes sense at a high level.


2 participants