Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed Up Publishing Times #1120

Open
wants to merge 15 commits into
base: develop
Choose a base branch
from

Conversation

r42-chun
Copy link
Contributor

@r42-chun r42-chun commented Feb 4, 2025

  1. Use concurrent.futures to copy files
  2. Use speedcopy.copyfile for windows when copying Hero files as shutil.copy is extremely slow.

Results.
On a test publish job on the farm, have cut down publishing times from 16 mins 51 seconds to 6 mins 50 seconds.
image
image

Copy link
Collaborator

@BigRoy BigRoy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! :) I've added some comments which are mostly cosmetics. I wouldn't have expected that concurrency on the file transfers would've made such a big difference.

The amount of threads being started to do the transfer - I wonder whether we should still customize that - because it may be quite many (e.g. 32 threads), see: https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ThreadPoolExecutor

But if this doesn't give issues in production on "regular publishing" on a workstation and doesn't block the user's machine significantly I'd say we can keep the defaults just fine.

Comment on lines +6 to +10
# this is needed until speedcopy for linux is fixed
if sys.platform == "win32":
from speedcopy import copyfile
else:
from shutil import copyfile
Copy link
Collaborator

@BigRoy BigRoy Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is copied from here.

However, we're also using speedcopy in the codebase without that fallback, e.g. here:

Which may hint that either the fallback is redundant nowadays - or we have other areas in the codebase that are potentially buggy on Linux? @iLLiCiTiT @antirotor do you know?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Speedcopy is in dependencies of AYON (

speedcopy = "^2.1"
) and it has its own fallback if it cannot use server-side copy. What is the issue with spedcopy on linux?

client/ayon_core/lib/file_transaction.py Outdated Show resolved Hide resolved
client/ayon_core/lib/file_transaction.py Outdated Show resolved Hide resolved
client/ayon_core/lib/file_transaction.py Outdated Show resolved Hide resolved
client/ayon_core/lib/file_transaction.py Outdated Show resolved Hide resolved
client/ayon_core/lib/file_transaction.py Outdated Show resolved Hide resolved
client/ayon_core/lib/file_transaction.py Outdated Show resolved Hide resolved
@krishnaavril
Copy link

Commenting to show the intrest and following this up

This a major problem actually, our large FX caches are taking hours to copy, Finally some relief!

@iLLiCiTiT iLLiCiTiT added the type: enhancement Improvement of existing functionality or minor addition label Feb 4, 2025
r42-chun and others added 7 commits February 4, 2025 10:23
Co-authored-by: Roy Nieterau <roy_nieterau@hotmail.com>
Co-authored-by: Roy Nieterau <roy_nieterau@hotmail.com>
Co-authored-by: Roy Nieterau <roy_nieterau@hotmail.com>
Co-authored-by: Roy Nieterau <roy_nieterau@hotmail.com>
Co-authored-by: Roy Nieterau <roy_nieterau@hotmail.com>
Co-authored-by: Roy Nieterau <roy_nieterau@hotmail.com>

self._transferred.append(dst)
self._transferred.append(dst)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically this runs in a thread now - and hence it should not do this on an object outside of the thread. However, in Python implementations this should still be safe due to the GIL and how the interpreter locks, etc.

Nonetheless, in a potential non-GIL or free threaded Python 3 world this would be unsafe and instead we should be instead relying on the future.result() instead.

Copy link
Collaborator

@BigRoy BigRoy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've cleaned up the code to match code style guide + added error handling in both cases (or at least, have it raise and log the errors and stop as soon as possible on an error instead of continuing to finish the rest of the threaded queue.

@iLLiCiTiT @antirotor thoughts?

I have done some simple tests from Maya, Fusion and Silhouette; all good! - however I did not do any speed comparisons.

As far as I know, usually 'threading' file transfers should not make things much faster because the slowness would be IO-bound or network-speed bound regardless. So this making a huge performance impact might need some testing in a few production scenarios. I did see some shuffled outputs in the logs where it logged e.g. frame 59, 55 then 60. It'd hint at the very least that it did speed up something as another part what still waiting to continue.

Also note that the max_workers=8 seems relatively arbitrarily chosen, and depending on actual speed test runs and comparisons a more optimal figure could be found. Without doing actual speed comparisons we may just as well remove the max_workers and have it use the defaults of ThreadPoolExecutor instead (which may be much higher depending on the machine's CPU count in the running process however.)


For completeness sake, if it now DOES error then e.g. an output log turns into:

ERROR: Error occurred in worker
Traceback (most recent call last):
  File "C:\Program Files\BorisFX\Silhouette 2024.5\resources\python\lib\concurrent\futures\thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "F:\dev\ayon-core\client\ayon_core\lib\file_transaction.py", line 155, in _transfer_file
    copyfile(src, dst)
  File "C:\Users\User\AppData\Local\Ynput\AYON\dependency_packages\ayon_2406251801_windows.zip\dependencies\speedcopy\__init__.py", line 332, in copyfile
    ret = COPYFILE('\\\\?\\' + source_file,
  File "_ctypes/callproc.c", line 997, in GetResult
PermissionError: [WinError -2147024891] Access is denied

CRITICAL: Error when registering
Traceback (most recent call last):
  File "F:\dev\ayon-core\client\ayon_core\plugins\publish\integrate.py", line 158, in process
    self.register(instance, file_transactions, filtered_repres)
  File "F:\dev\ayon-core\client\ayon_core\plugins\publish\integrate.py", line 290, in register
    file_transactions.process()
  File "F:\dev\ayon-core\client\ayon_core\lib\file_transaction.py", line 129, in process
    as_completed_stop_and_raise_on_error(
  File "F:\dev\ayon-core\client\ayon_core\lib\threadpool.py", line 49, in as_completed_stop_and_raise_on_error
    raise exceptions[-1]
  File "C:\Program Files\BorisFX\Silhouette 2024.5\resources\python\lib\concurrent\futures\thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "F:\dev\ayon-core\client\ayon_core\lib\file_transaction.py", line 155, in _transfer_file
    copyfile(src, dst)
  File "C:\Users\User\AppData\Local\Ynput\AYON\dependency_packages\ayon_2406251801_windows.zip\dependencies\speedcopy\__init__.py", line 332, in copyfile
    ret = COPYFILE('\\\\?\\' + source_file,
  File "_ctypes/callproc.c", line 997, in GetResult
PermissionError: [WinError -2147024891] Access is denied

Traceback (most recent call last):
  File "C:\Users\User\AppData\Local\Ynput\AYON\dependency_packages\ayon_2406251801_windows.zip\dependencies\pyblish\plugin.py", line 528, in __explicit_process
    runner(*args)
  File "F:\dev\ayon-core\client\ayon_core\plugins\publish\integrate.py", line 170, in process
    raise exc
  File "F:\dev\ayon-core\client\ayon_core\plugins\publish\integrate.py", line 158, in process
    self.register(instance, file_transactions, filtered_repres)
  File "F:\dev\ayon-core\client\ayon_core\plugins\publish\integrate.py", line 290, in register
    file_transactions.process()
  File "F:\dev\ayon-core\client\ayon_core\lib\file_transaction.py", line 129, in process
    as_completed_stop_and_raise_on_error(
  File "F:\dev\ayon-core\client\ayon_core\lib\threadpool.py", line 49, in as_completed_stop_and_raise_on_error
    raise exceptions[-1]
  File "C:\Program Files\BorisFX\Silhouette 2024.5\resources\python\lib\concurrent\futures\thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "F:\dev\ayon-core\client\ayon_core\lib\file_transaction.py", line 155, in _transfer_file
    copyfile(src, dst)
  File "C:\Users\User\AppData\Local\Ynput\AYON\dependency_packages\ayon_2406251801_windows.zip\dependencies\speedcopy\__init__.py", line 332, in copyfile
    ret = COPYFILE('\\\\?\\' + source_file,
  File "_ctypes/callproc.c", line 997, in GetResult
PermissionError: [WinError -2147024891] Access is denied

Note that this in particular is from the speedcopy on trying to copy a folder (which never worked anyway) but was the easiest error reproducable I could find. As can be seen - it does make for quite a verbose output.

@r42-chun
Copy link
Contributor Author

r42-chun commented Feb 5, 2025

Thanks so much Roy for helping me with the error handling! Much appreciated

Also note that the max_workers=8 seems relatively arbitrarily chosen,

Yes, it was chosen randomly. Wonder if we can make it a parameter or just remove it entirely?

As far as I know, usually 'threading' file transfers should not make things much faster because the slowness would be IO-bound or network-speed bound regardless. So this making a huge performance impact might need some testing in a few production scenarios.

Yes, there are other little speed ups here and there as well so it's not only because of concurrent. For example speedcopy for hero copying instead of shutil.copy. So I just did another test with all the other changes implemented except we don't have concurrent. And I clocked in at 11 min 48 seconds. Just want to manage people's expectations. But I agree, would be good to test in a few production scenarios as you mentioned.

image

log = logging.getLogger(__name__)


def as_completed_stop_and_raise_on_error(
Copy link
Member

@iLLiCiTiT iLLiCiTiT Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add this as method of file transaction. This is not much reusable elsewhere. I guess it is here for hero version integration? If yes, we should use file transactions there too. If that would be too much work, we can skip speed enhancement of hero integration in this PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm re-using it in two places currently - that's why I added it into a separate file. But you mean to add it to file_transaction.py but not necessarily the FileTransaction class itself. If so, I can definitely do that! :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've now moved it to the other file (but not a method because it's also used outside of the class). See 9c67bf1

@iLLiCiTiT better?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I assume the function may get more use if we were to move more things to thread pools, e.g. transcoding, etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size/XS type: enhancement Improvement of existing functionality or minor addition
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants