I came across this one a couple of times today in different places, and also tripped it once locally:
The CI race job is also has failures in the same suite, across the branches. For example:
A cursory look at the code shows that these tests are racy. We are starting a worker that we hope is done before we start making assertions on data that it writes.
There are multiple approaches in the code-base to syncing on worker events, but it feels to me like we add excessive test-only logic to accommodate these quirks, and that it chips at the edges instead of being declarative and forthright. Is it worth working towards a uniform approach here?
Perhaps some of the suites can be modified to use mocks instead of spinning up Mongos.
I have some notions on refining the approach, but I think what I have done for the upgrade-series worker is worth looking at - it is a stab at being declarative about how the the test will behave when creating the worker, passing “behaviours”.
One of these behaviours is how many invocations of loop watchers will be triggered. There is a test-level synchronisation on the provision of the notifications, which is working well in the tests.
I am hoping to play around with some more ideas prior to the Brussels engineering sprint, but if anyone has opinions on that code, I’d welcome the discussion.