Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test_against_cluster_broken tests can't be added to #354

Closed
KJTsanaktsidis opened this issue Apr 26, 2024 · 11 comments
Closed

test_against_cluster_broken tests can't be added to #354

KJTsanaktsidis opened this issue Apr 26, 2024 · 11 comments
Assignees

Comments

@KJTsanaktsidis
Copy link
Contributor

The tests in test_against_cluster_broken.rb send a SHUTDOWN command to a redis node and validate how the client responds to this. These are great integration tests against real failure scenarios!

However, the tests have no way to actually bring that node back up afterwards, so it's impossible to add more test cases to that file; the second test case already has a missing node, the third test case will have two missing nodes, etc.

I'd like to refactor how these integration tests are set up to make them more self-contained and able to actually reset the cluster to its original state after every test case. This will enable me to add more test cases to the file.

What I'm thinking, roughly, is:

  • Make cluster_controller.rb actually able to start a redis cluster on its own. That means have it download the Redis binaries, spawn child processes, and configure the clustering on them, like https://github.com/redis/redis-rb/blob/a06456c720eb3148a42db8d5af474073663b40d9/makefile#L108 does.
  • Make two rake tasks which uses the cluster_controller to start & stop a cluster
  • Make the integration tests reset the state of the cluster (restarting dead nodes, reassigning shards back in balance, etc) with the cluster_controller in between test cases

So the idea would be that you could run the tests like this:

$ bundle exec rake cluster:start # Starts a cluster, saves the port & pid details to a file in tmp/
$ bundle exec rake test # Runs the normal tests, using the just-started cluster
$ bundle exec rake test:integration # Runs the test_against_cluster_{scale,broken,state} tests, using the cluster from step 1 and restarting/re-creating it between each test case
$ bundle exec rake cluster:stop # Stop the cluster

As a bonus, this would make running the tests work on both MacOS & Linux the same way (at the moment, the dockerfiles are not usable on macos because the internal network is not accessible from mac, only the forwarded ports are (e.g. see redis/redis#7460).

What do you think? Is this something I can begin work on?

@supercaracal
Copy link
Member

Thank you for your proposal to enhance integration test cases.

I had acknowleged the difference of the docker network mode between Linux and MacOS. I decided to use docker compose bacause I don't want to build Redis server, GitHub Actions machine is Ubuntu and I'm using WSL2. But I know majority of engineers use MacOS. So as you said, current our development environment may be awkward.

Please give me some time to think about the follows.

  • MacOS user friendly redis server management
  • Flexibility of integration test cases
  • Shortening the time of integration test cases

@KJTsanaktsidis
Copy link
Contributor Author

OK - let me know when you've got something you'd like to discuss. Happy to try out any way you think is good. I need to fix this so I can get the test in #355 working :)

@KJTsanaktsidis
Copy link
Contributor Author

Hey @supercaracal have you had any more thoughts about my plan to fix the test_against_cluster_broken tests? Is this something I can begin working on?

@KJTsanaktsidis
Copy link
Contributor Author

@supercaracal I would like to come back to this at some point in the near future - would you be receptive to a PR along the lines of what's in my original message?

@supercaracal
Copy link
Member

Hello, @KJTsanaktsidis

I apologize for my too late replying about this. I think it would be better to avoid a concurrent use of containers and binaries. So I'm still wondering about this change because it needs drastically modifications for the testing environment. Since this issue is low priority in my plan, I'm hard to take my time to review or to improve our test environment for a while. Although I'm keen to review or to enhance our test environment if it would be simpler and elegant anytime.

Sincerely,

@KJTsanaktsidis
Copy link
Contributor Author

I think it would be better to avoid a concurrent use of containers and binaries

This is a fair point - I can make the test cases communicate with the Docker daemon (either directly over the Docker socket, or with the docker-compose CLI tool) and manage the redis instances that way. This would mean that you must run the tests with docker-compose though. Is that acceptable?

@supercaracal
Copy link
Member

I think it may be an option to correspond this issue. Or DEBUG command may be useful.

https://github.com/redis/redis/blob/f60370ce28b946c1146dcea77c9c399d39601aaa/src/debug.c#L393-L497
enable-debug-command yes

@KJTsanaktsidis
Copy link
Contributor Author

The DEBUG command doesn’t have a way to bring a server back up after shutting it down (other than with CRASH-AND-RECOVER, but that will be flaky because all it can do is recover after a certain timeout, not “the end of this test”).

I’ll try and put something together based on docker-compose this week.

@supercaracal
Copy link
Member

supercaracal commented Sep 30, 2024

Although they still doesn't work for macOS, I've organized integration test cases as the followings:

test senario
test/test_against_cluster_broken.rb failover, a part of nodes down, recovering
test/test_against_cluster_down.rb whole cluster down, rebuild
test/test_against_cluster_scale.rb scale out, scale in
test/test_against_cluster_state.rb resharding

@supercaracal
Copy link
Member

The version 0.11.6 has been released.
https://rubygems.org/gems/redis-cluster-client/versions/0.11.6

@supercaracal
Copy link
Member

Although the other issue about the docker network mode per os still remained, I'll close this issue because the main problem was solved. Feel free to reopen it whenever some additional related problems should be resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants