[WIP] Testing of ConcurrentCache via logging #960
base: memory_cache
Conversation
No new unit tests were added; the single (and not particularly important) existing unit test related to ConcurrentCache was merely moved to the new test file.
NamedThread::getCurrentThreadName() didn't work correctly if a new NamedThread was created after an existing NamedThread object had been joined (the problem came from the joined thread's id being reused for the new thread).
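For illustration, here is a hedged sketch of how such an id-reuse bug can arise; the registry shown is a hypothetical simplification, not the actual NamedThread implementation:

```cpp
#include <map>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical simplification of a name registry behind something
// like NamedThread::getCurrentThreadName().
std::mutex regMutex;
std::map<std::thread::id, std::string> threadNames;

void registerThread(std::thread::id id, const std::string& name) {
  std::lock_guard<std::mutex> lock(regMutex);
  threadNames[id] = name; // a stale entry would silently shadow a new thread
}

// The bug: if nothing erases the entry when a NamedThread is joined,
// a later thread that gets the same (reused) std::thread::id inherits
// the old name. Deregistering on join() fixes it:
void deregisterThread(std::thread::id id) {
  std::lock_guard<std::mutex> lock(regMutex);
  threadNames.erase(id);
}
```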
It fails since orchestration of concurrent execution via logging is not yet implemented.
Up to this point the goal was to implement orchestration of concurrent execution via logging. It was done entirely in the unit test file test/log.cpp in order to minimize the build+test cycle (modifying the src/log.h header would trigger recompilation of much more stuff). Now that the code is (almost) ready, it has to be moved into the proper header and source files. This is the first step.
With NamedThread now having a dedicated .cpp file, its static data members can be hidden there in an unnamed namespace, taking the associated standard library headers with them.
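A hedged sketch of the pattern, with illustrative names rather than the actual libzim code:

```cpp
// namedthread.h -- only the public interface remains; the header no
// longer needs <map>, <mutex>, etc.
#include <string>

class NamedThread {
public:
  explicit NamedThread(const std::string& name);
  ~NamedThread();
  static std::string getCurrentThreadName();
};

// namedthread.cpp
#include <map>
#include <mutex>
#include <thread>

namespace { // unnamed namespace: internal linkage, invisible to other TUs

std::mutex regMutex;                                 // was a static member
std::map<std::thread::id, std::string> threadNames;  // was a static member

} // namespace
```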
Added testing of log_debug_func_call() in a multithreaded environment. The new test sporadically fails due to some not yet understood race condition (probably introduced when orchestration was added). Here is an example failure:

```
[ RUN      ] Log.nestedFunctionCallsMultithreaded
../../../../home/leon/freelancing/kiwix/builddir/SOURCE/libzim/test/log.cpp:243: Failure
Expected equality of these values:
  Logging::getInMemLogContent()
    Which is: "thread#2: Calling bar(qwerty)...\nthread#1: Calling bar(......"
  targetOutput
    Which is: "thread#2: Calling bar(qwerty)...\nthread#1: Calling bar(......"
With diff:
@@ -7,5 +7,6 @@
 thread#2: foo(\"qwerty\", 1) {
 thread#2: fooing qwerty
-thread#1: foo(\"asdf\", thread#2: 1) {} (return value: 6)
+thread#1: foo(\"asdf\", 1) {
+thread#2: } (return value: 6)
 thread#1: fooing asdf
 thread#1: } (return value: 4)
[  FAILED  ] Log.nestedFunctionCallsMultithreaded (1 ms)
```
Note that string values are recorded in FunctionCallLogger::returnValue_ quoted with double quotes, thus an empty string return value doesn't confuse the logic of this change.
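A minimal sketch of why the quoting makes that safe (the helper name is hypothetical):

```cpp
#include <string>

// String return values are recorded with surrounding double quotes...
std::string recordStringReturnValue(const std::string& v) {
  return '"' + v + '"'; // an empty string becomes the two-character text ""
}

// ...so an empty returnValue_ can unambiguously mean "no return value
// recorded" (e.g. a void function), never an empty-string result.
```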
class Logging now has a clean public API. Everything needed to support the implementation of the log_debug*() macros was moved into a separate namespace, LoggingImpl.
... from LoggingImpl::FunctionCallLogger class to LoggingImpl namespace. Also moved the definition of the non-template variants of those functions to the .cpp file.
This should have been done in an earlier commit.
(I only had a global look, not a thorough review of each commit.) Testing that something works correctly through logging seems a bit odd to me. I am also uncomfortable with the orchestration. I was about to ask you how you handle the unpredictability of the output ordering caused by multithreading until I realized that you orchestrate the threads based on the expected output. But by doing so, you "linearize" the code workflow. The log system locks threads to ensure that only one thread can log at a time. So the testing changes the behavior of what is tested (mostly, the locks in the concurrent cache become somehow useless) and we are not testing what will run in production.
That is exactly the trick. Each instantiation of a multithreaded execution is equivalent to some linearization of it (assuming that there are no race conditions, i.e. access to shared resources is properly protected). Then I can define in the tests the linearizations that the code should handle.
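To make the trick concrete, here is a heavily simplified, hypothetical sketch of orchestration by target log (none of these names come from the PR): each logging call blocks its thread until that thread's line is the next unconsumed line of the expected log, which forces exactly the linearization the test describes.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

class Orchestrator {
  std::mutex m_;
  std::condition_variable cv_;
  std::vector<std::string> targetLog_; // the linearization chosen by the test
  std::size_t next_ = 0;

public:
  explicit Orchestrator(std::vector<std::string> target)
    : targetLog_(std::move(target)) {}

  void log(const std::string& line) {
    std::unique_lock<std::mutex> lock(m_);
    // Wait until this line is the next expected one; this is what
    // forces the threads to interleave as the target log dictates.
    cv_.wait(lock, [&] {
      return next_ < targetLog_.size() && targetLog_[next_] == line;
    });
    ++next_;
    cv_.notify_all();
  }
};
```

Under such a scheme a flow that cannot produce the target log simply blocks, which is presumably why mismatches surface as deadlocks/timeouts rather than as wrong-output failures.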
Consider the situation that you currently have in […]. More generally, tests for concurrent flows must be written so as to cover various linearizations. In a TDD approach, such tests can help produce the right code for concurrency management. You are right that the approach lacks means for detecting the absence of correct synchronization between threads. Thanks for pointing this out! I was mostly focused on testing the logic governing the execution under different scenarios, assuming that all measures against race conditions are in place. It should still allow catching bugs leading to deadlocks or livelocks. I will think about how this approach (possibly after some enhancement) can be utilized to catch issues with potential race conditions.
Said like that, I agree with you. But then I would prefer a proper name (as […]).
This has a small user-observable change in behavior. Previously lru_cache::setMaxCost() might leave the cache empty. Now, at least one (the most-recently-used) item is preserved in the cache, even if its cost exceeds the set limit. This is consistent with adding that same item to an empty cache.
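An illustrative sketch of that behavior (member names are hypothetical, not the actual lru_cache internals):

```cpp
#include <cstddef>
#include <list>
#include <utility>

class LruCacheSketch {
  std::list<std::pair<int /*key*/, std::size_t /*cost*/>> items_; // front = MRU
  std::size_t currentCost_ = 0;
  std::size_t maxCost_ = 0;

  void evictLeastRecentlyUsed() {
    currentCost_ -= items_.back().second;
    items_.pop_back();
  }

public:
  void setMaxCost(std::size_t newMaxCost) {
    maxCost_ = newMaxCost;
    // Keep at least the most-recently-used item even if its cost alone
    // exceeds the new limit, consistent with inserting that same item
    // into an empty cache.
    while (currentCost_ > maxCost_ && items_.size() > 1) {
      evictLeastRecentlyUsed();
    }
  }
};
```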
The changes in the unit tests are better understood if whitespace changes are ignored (git diff -w).
This test too intermittently fails (likely due to the suspected but not yet tracked-down race condition in the logging and/or namedthread code). This time, however, it fails either with a deadlock or with obvious memory corruption like below:

```
[ RUN      ] ConcurrentCacheMultithreadedTest.accessSameExistingItem
../../../../home/leon/freelancing/kiwix/builddir/SOURCE/libzim/test/concurrentcache.cpp:429: Failure
Expected equality of these values:
  zim::Logging::getInMemLogContent()
    Which is: "thread#0: Output of interest starts from the next line\na : ConcurrentCache::getOrPut(5) {\n b: ConcurrentCache::getOrPut(5) {\na : ConcurrentCache::getCacheSlot(5) {\na : entered synchronized section\n b: ConcurrentCache::getCacheSlot(5) {\na : lru_cache::getOrPut(5) {\na : already in cache, moved to the beginning of the LRU list.\na : }\na : exiting synchronized section\na : }\n b: entered synchronized section\na : Obtained the cache slot\n b: lru_cache::getOrPut(5) {\na : } (return value: 50)\n\0 b: already in cache, moved to the beginning of the LRU list.\n b: }\n b: exiting synchronized section\n b: }\n b: Obtained the cache slot\n b: } (return value: 50)\n\0\0\0[... long run of \0 bytes elided ...]"
  targetOutput
    Which is: "thread#0: Output of interest starts from the next line\na : ConcurrentCache::getOrPut(5) {\n b: ConcurrentCache::getOrPut(5) {\na : ConcurrentCache::getCacheSlot(5) {\na : entered synchronized section\n b: ConcurrentCache::getCacheSlot(5) {\na : lru_cache::getOrPut(5) {\na : already in cache, moved to the beginning of the LRU list.\na : }\na : exiting synchronized section\na : }\n b: entered synchronized section\na : Obtained the cache slot\n b: lru_cache::getOrPut(5) {\na : } (return value: 50)\n b: already in cache, moved to the beginning of the LRU list.\n b: }\n b: exiting synchronized section\n b: }\n b: Obtained the cache slot\n b: } (return value: 50)\n"
With diff:
@@ -14,5 +14,5 @@
   b: lru_cache::getOrPut(5) {
 a : } (return value: 50)
-\0 b: already in cache, moved to the beginning of the LRU list.
+ b: already in cache, moved to the beginning of the LRU list.
  b: }
  b: exiting synchronized section
@@ -19,4 +19,3 @@
  b: }
  b: Obtained the cache slot
- b: } (return value: 50)
-\0\0\0[... long run of \0 bytes elided ...]
+ b: } (return value: 50)\n
[  FAILED  ] ConcurrentCacheMultithreadedTest.accessSameExistingItem (2 ms)
```
Without this fix there were 107 failures out of 700 runs of the ConcurrentCacheMultithreadedTest.accessSameExistingItem unit test (with a timeout value of 100 ms). The fix brought the failure count to 0.
The diff is much simpler if whitespace changes are ignored (git diff -w).
If an item that is in the process of being added to the cache is requested again within the time window when it has been (temporarily) evicted (as a result of concurrent turmoil), a concurrent evaluation of the same item is started while the previous one is still in progress.
An item being added to the cache is no longer evicted due to concurrent turmoil. The LRU eviction heuristic was changed to only evict items with non-zero cost.
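A hedged sketch of the changed heuristic (names are hypothetical): an entry that is still being materialized carries cost 0 until its real cost is known, and the eviction loop skips zero-cost entries, so an in-flight item can never be dropped.

```cpp
#include <cstddef>
#include <list>
#include <utility>

class CacheSketch {
  std::list<std::pair<int /*key*/, std::size_t /*cost*/>> items_; // front = MRU
  std::size_t currentCost_ = 0;

public:
  void evictUntilCostBelow(std::size_t maxCost) {
    auto it = items_.end();
    while (currentCost_ > maxCost && it != items_.begin()) {
      --it;                   // walk from the LRU end towards the MRU end
      if (it->second == 0)
        continue;             // cost 0: still being computed, never evict
      currentCost_ -= it->second;
      it = items_.erase(it);  // erase() returns the iterator after 'it'
    }
  }
};
```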
@mgautierfr I am done with the preliminary version of my contribution to your PR. I will have to clean it up significantly, but its second part (focusing on the testing of ConcurrentCache, starting from 6aa5aee) will be almost unaffected, so I invite you to look at it from there.
Synchronization is only one of the uses of logging/tracing. Currently it is done with two logging statements (before and after the mutex locking operation) that must be kept together in the target log of the simulated flow. […] But logging also helps to visualize behavior that is otherwise difficult to observe (for example, the absence of double work addressed by the last commit).
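A minimal sketch of such a pair of statements, assuming a log_debug-like facility (the exact messages and API are placeholders, not the actual libzim macros):

```cpp
#include <iostream>
#include <mutex>
#include <string>

// Stand-in for the real logging facility.
void log_debug(const std::string& msg) { std::cout << msg << '\n'; }

void synchronizedSection(std::mutex& mu) {
  log_debug("acquiring lock");   // logged before the (possibly blocking) lock
  std::lock_guard<std::mutex> lock(mu);
  log_debug("entered synchronized section"); // logged once the lock is held
  // ... critical section ...
}
```

Together the two lines delimit the lock acquisition, which is why they must appear as an adjacent pair (per thread) in the target log.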
This is my work in pursuit of #956 (comment)
I haven't started the redesign of ConcurrentCache yet. I am creating this PR so that you can see my approach to testing multithreaded flows and can provide some early feedback while I keep working on the unit tests of the enhanced concurrent cache and its clean-up.