Ideas for improvements and new features in EventStoreDB, client libraries, and database extensions.
One of the main concerns developers have about impoementing Event Sourcing is so-called "eventual consistency". It means that when a system implements one or more query models either in ESDB using custom projections or in other databases using catch-up subscriptions, it takes a bit of time until those query models get updated after an append transaction on a stream.
If the system behaviour involves retrieving data from a query model right after the transaction is executed, it most likely retrieve stale data because the query model hasn't been updated yet.
Defining policies that list downstream data propagation (custom projection, catch-up or persistent subscription) would allow eliminating this concern.
For example, a named policy could include one or more downsream data propagators by name or id. Then, when the client executes a transaction, it requests the server to ensure that the policy is satisfied. It means that the server will execute the transaction, but won't acknowledge it to the client until all the downstream data propagators listed in the policy have passed the transaction's commit position. When the policy is satisfied, the server acknowledges the transaction to the client.
The main implementation concern here are:
Transaction awareness across the cluster when the leader changes. It could be that data propagation takes several seconds, and while the server (leader) is waiting, an election is triggered, and the cluster gets a new leader, which is unaware of the pending acknowledgement.
Severe delays in downstream propagation can cause long acknowledgement times. Essentially, the transaction can timeout, but the append operation has already been executed and cannot be reverted. It means that the append operation has been completed but the policy condition wasn't fulfilled before the operation timed out. Potentially, this issue can be mitigated by returning a timeout error where the policy conditions are explained (i.e. two of three propagators completed the work but one hasn't) so the user can decide how to deal with that. Eventually, the data will be propagated anyway.
Poison messages can cause downstream query model updtes to crash in a loop, so transactions with consistency policies would never be acknowledged, and the system eventually halts. Today, if one subscription has an issue like that, the problem is isolated there. With consistency policies, both read and write operations will be affected. The issue might be less relevant because using consistency policies is a user's choice, and without using them everything will work as today.
Looks good ,
I think we should stress out very explicitly that it is not a transactional behavior between Append & downstream consumption.
so in the research bit one of the requirement should be that there must be a timeout and that it can not be "too" long .
Dependency on clients:
the append response will probably need to be changed to inform about the append & downstream consistency
something like
(Appended, Consumer Consistent)
(Appended , Consumer Consistency Timeout)
( Not Appended, _ )