Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: cancel filesystem traversal when listing request cancelled #672

Merged
merged 1 commit into from
Jul 15, 2024

Conversation

benmcclelland
Copy link
Member

For large directories, the treewalk can take longer than the client request timeout. If the client times out the request then we need to stop walking the filesystem and just return the context error.

This should prevent the gateway from consuming system resources uneccessarily after an incoming request is terminated.

Copy link

@chaowang-versity chaowang-versity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to add a unit test case for the cancellation of a dir walk?

@benmcclelland
Copy link
Member Author

Do you want to add a unit test case for the cancellation of a dir walk?

Thats a good idea, but the "testing/fstest" doesn't seem to have a good way to add delays in the stat or readdir implementations. I guess we could fork this and add our own, but not sure its worth the added complexity?

@chaowang-versity
Copy link

chaowang-versity commented Jul 15, 2024

Do you want to add a unit test case for the cancellation of a dir walk?

Thats a good idea, but the "testing/fstest" doesn't seem to have a good way to add delays in the stat or readdir implementations. I guess we could fork this and add our own, but not sure its worth the added complexity?

I think you can overload the ReadDir method of fstest.MapFS to add a delay as below. Another way that may be feasible in the long run is to use a Mockup library like GoMock, so that we can just mock up the io/fs.FS and don't need to rely on the testing/fstest.

type slowFS struct {
	fstest.MapFS
}

func (s *slowFS) ReadDir(name string) ([]fs.DirEntry, error) {
	time.Sleep(100 * time.Millisecond)
	return s.MapFS.ReadDir(name)
}

func TestWalkStop(t *testing.T) {
	s := &slowFS{MapFS: fstest.MapFS{
		"/a/b/c/d/e/f/g/h/i/g/k/l/m/n": &fstest.MapFile{},
	}}

	ctx, cancel := context.WithTimeout(context.Background(), 500*time.Millisecond)
	defer cancel()

	var err error
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		_, err = backend.Walk(ctx, s, "", "/", "", 1000, func(path string, d fs.DirEntry) (types.Object, error) {
			return types.Object{}, nil
		}, []string{})
	}()

	select {
	case <-time.After(1 * time.Second):
		t.Fatalf("walk is not terminated in time")
	case <-ctx.Done():
	}
	wg.Wait()
	if err != ctx.Err() {
		t.Fatalf("unexpected error: %v", err)
	}
}

mlt180
mlt180 previously approved these changes Jul 15, 2024
@benmcclelland
Copy link
Member Author

Do you want to add a unit test case for the cancellation of a dir walk?

Thats a good idea, but the "testing/fstest" doesn't seem to have a good way to add delays in the stat or readdir implementations. I guess we could fork this and add our own, but not sure its worth the added complexity?

I think you can overload the ReadDir method of fstest.MapFS to add a delay as below. Another way that may be feasible in the long run is to use a Mockup library like GoMock, so that we can just mock up the io/fs.FS and don't need to rely on the testing/fstest.

type slowFS struct {
	fstest.MapFS
}

func (s *slowFS) ReadDir(name string) ([]fs.DirEntry, error) {
	time.Sleep(100 * time.Millisecond)
	return s.MapFS.ReadDir(name)
}

func TestWalkStop(t *testing.T) {
	s := &slowFS{MapFS: fstest.MapFS{
		"/a/b/c/d/e/f/g/h/i/g/k/l/m/n": &fstest.MapFile{},
	}}

	ctx, cancel := context.WithTimeout(context.Background(), 500*time.Millisecond)
	defer cancel()

	var err error
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		_, err = backend.Walk(ctx, s, "", "/", "", 1000, func(path string, d fs.DirEntry) (types.Object, error) {
			return types.Object{}, nil
		}, []string{})
	}()

	select {
	case <-time.After(1 * time.Second):
		t.Fatalf("walk is not terminated in time")
	case <-ctx.Done():
	}
	wg.Wait()
	if err != ctx.Err() {
		t.Fatalf("unexpected error: %v", err)
	}
}

oh, nice test! ok, yeah I can add this in

For large directories, the treewalk can take longer than the
client request timeout. If the client times out the request
then we need to stop walking the filesystem and just return
the context error.

This should prevent the gateway from consuming system resources
uneccessarily after an incoming request is terminated.
@benmcclelland benmcclelland merged commit f949e2d into main Jul 15, 2024
12 checks passed
@benmcclelland benmcclelland deleted the ben/ctx_walk branch July 15, 2024 21:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants