-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed Up Publishing Times #1120
base: develop
Are you sure you want to change the base?
Conversation
…peedcopy on windows
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice! :) I've added some comments which are mostly cosmetics. I wouldn't have expected that concurrency on the file transfers would've made such a big difference.
The amount of threads being started to do the transfer - I wonder whether we should still customize that - because it may be quite many (e.g. 32 threads), see: https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ThreadPoolExecutor
But if this doesn't give issues in production on "regular publishing" on a workstation and doesn't block the user's machine significantly I'd say we can keep the defaults just fine.
# this is needed until speedcopy for linux is fixed | ||
if sys.platform == "win32": | ||
from speedcopy import copyfile | ||
else: | ||
from shutil import copyfile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is copied from here.
However, we're also using speedcopy
in the codebase without that fallback, e.g. here:
import speedcopy import speedcopy
Which may hint that either the fallback is redundant nowadays - or we have other areas in the codebase that are potentially buggy on Linux? @iLLiCiTiT @antirotor do you know?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Speedcopy is in dependencies of AYON (
ayon-core/client/pyproject.toml
Line 10 in c7899fb
speedcopy = "^2.1" |
Commenting to show the intrest and following this up This a major problem actually, our large FX caches are taking hours to copy, Finally some relief! |
Co-authored-by: Roy Nieterau <roy_nieterau@hotmail.com>
Co-authored-by: Roy Nieterau <roy_nieterau@hotmail.com>
Co-authored-by: Roy Nieterau <roy_nieterau@hotmail.com>
Co-authored-by: Roy Nieterau <roy_nieterau@hotmail.com>
Co-authored-by: Roy Nieterau <roy_nieterau@hotmail.com>
Co-authored-by: Roy Nieterau <roy_nieterau@hotmail.com>
|
||
self._transferred.append(dst) | ||
self._transferred.append(dst) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically this runs in a thread now - and hence it should not do this on an object outside of the thread. However, in Python implementations this should still be safe due to the GIL and how the interpreter locks, etc.
Nonetheless, in a potential non-GIL or free threaded Python 3 world this would be unsafe and instead we should be instead relying on the future.result()
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've cleaned up the code to match code style guide + added error handling in both cases (or at least, have it raise and log the errors and stop as soon as possible on an error instead of continuing to finish the rest of the threaded queue.
@iLLiCiTiT @antirotor thoughts?
I have done some simple tests from Maya, Fusion and Silhouette; all good! - however I did not do any speed comparisons.
As far as I know, usually 'threading' file transfers should not make things much faster because the slowness would be IO-bound or network-speed bound regardless. So this making a huge performance impact might need some testing in a few production scenarios. I did see some shuffled outputs in the logs where it logged e.g. frame 59, 55 then 60. It'd hint at the very least that it did speed up something as another part what still waiting to continue.
Also note that the max_workers=8
seems relatively arbitrarily chosen, and depending on actual speed test runs and comparisons a more optimal figure could be found. Without doing actual speed comparisons we may just as well remove the max_workers
and have it use the defaults of ThreadPoolExecutor
instead (which may be much higher depending on the machine's CPU count in the running process however.)
For completeness sake, if it now DOES error then e.g. an output log turns into:
ERROR: Error occurred in worker
Traceback (most recent call last):
File "C:\Program Files\BorisFX\Silhouette 2024.5\resources\python\lib\concurrent\futures\thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "F:\dev\ayon-core\client\ayon_core\lib\file_transaction.py", line 155, in _transfer_file
copyfile(src, dst)
File "C:\Users\User\AppData\Local\Ynput\AYON\dependency_packages\ayon_2406251801_windows.zip\dependencies\speedcopy\__init__.py", line 332, in copyfile
ret = COPYFILE('\\\\?\\' + source_file,
File "_ctypes/callproc.c", line 997, in GetResult
PermissionError: [WinError -2147024891] Access is denied
CRITICAL: Error when registering
Traceback (most recent call last):
File "F:\dev\ayon-core\client\ayon_core\plugins\publish\integrate.py", line 158, in process
self.register(instance, file_transactions, filtered_repres)
File "F:\dev\ayon-core\client\ayon_core\plugins\publish\integrate.py", line 290, in register
file_transactions.process()
File "F:\dev\ayon-core\client\ayon_core\lib\file_transaction.py", line 129, in process
as_completed_stop_and_raise_on_error(
File "F:\dev\ayon-core\client\ayon_core\lib\threadpool.py", line 49, in as_completed_stop_and_raise_on_error
raise exceptions[-1]
File "C:\Program Files\BorisFX\Silhouette 2024.5\resources\python\lib\concurrent\futures\thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "F:\dev\ayon-core\client\ayon_core\lib\file_transaction.py", line 155, in _transfer_file
copyfile(src, dst)
File "C:\Users\User\AppData\Local\Ynput\AYON\dependency_packages\ayon_2406251801_windows.zip\dependencies\speedcopy\__init__.py", line 332, in copyfile
ret = COPYFILE('\\\\?\\' + source_file,
File "_ctypes/callproc.c", line 997, in GetResult
PermissionError: [WinError -2147024891] Access is denied
Traceback (most recent call last):
File "C:\Users\User\AppData\Local\Ynput\AYON\dependency_packages\ayon_2406251801_windows.zip\dependencies\pyblish\plugin.py", line 528, in __explicit_process
runner(*args)
File "F:\dev\ayon-core\client\ayon_core\plugins\publish\integrate.py", line 170, in process
raise exc
File "F:\dev\ayon-core\client\ayon_core\plugins\publish\integrate.py", line 158, in process
self.register(instance, file_transactions, filtered_repres)
File "F:\dev\ayon-core\client\ayon_core\plugins\publish\integrate.py", line 290, in register
file_transactions.process()
File "F:\dev\ayon-core\client\ayon_core\lib\file_transaction.py", line 129, in process
as_completed_stop_and_raise_on_error(
File "F:\dev\ayon-core\client\ayon_core\lib\threadpool.py", line 49, in as_completed_stop_and_raise_on_error
raise exceptions[-1]
File "C:\Program Files\BorisFX\Silhouette 2024.5\resources\python\lib\concurrent\futures\thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "F:\dev\ayon-core\client\ayon_core\lib\file_transaction.py", line 155, in _transfer_file
copyfile(src, dst)
File "C:\Users\User\AppData\Local\Ynput\AYON\dependency_packages\ayon_2406251801_windows.zip\dependencies\speedcopy\__init__.py", line 332, in copyfile
ret = COPYFILE('\\\\?\\' + source_file,
File "_ctypes/callproc.c", line 997, in GetResult
PermissionError: [WinError -2147024891] Access is denied
Note that this in particular is from the speedcopy on trying to copy a folder (which never worked anyway) but was the easiest error reproducable I could find. As can be seen - it does make for quite a verbose output.
Thanks so much Roy for helping me with the error handling! Much appreciated
Yes, it was chosen randomly. Wonder if we can make it a parameter or just remove it entirely?
Yes, there are other little speed ups here and there as well so it's not only because of concurrent. For example speedcopy for hero copying instead of shutil.copy. So I just did another test with all the other changes implemented except we don't have concurrent. And I clocked in at 11 min 48 seconds. Just want to manage people's expectations. But I agree, would be good to test in a few production scenarios as you mentioned. |
client/ayon_core/lib/threadpool.py
Outdated
log = logging.getLogger(__name__) | ||
|
||
|
||
def as_completed_stop_and_raise_on_error( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add this as method of file transaction. This is not much reusable elsewhere. I guess it is here for hero version integration? If yes, we should use file transactions there too. If that would be too much work, we can skip speed enhancement of hero integration in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm re-using it in two places currently - that's why I added it into a separate file. But you mean to add it to file_transaction.py
but not necessarily the FileTransaction
class itself. If so, I can definitely do that! :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've now moved it to the other file (but not a method because it's also used outside of the class). See 9c67bf1
@iLLiCiTiT better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I assume the function may get more use if we were to move more things to thread pools, e.g. transcoding, etc.
Results.
![image](https://private-user-images.githubusercontent.com/73248638/409360386-0203f546-2c1b-46dc-8c14-5f9abd88cb7e.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzg5ODc0MTAsIm5iZiI6MTczODk4NzExMCwicGF0aCI6Ii83MzI0ODYzOC80MDkzNjAzODYtMDIwM2Y1NDYtMmMxYi00NmRjLThjMTQtNWY5YWJkODhjYjdlLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDglMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA4VDAzNTgzMFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWQ5NDhkZWY3YWQyNzc1YTk5ZjU1ZGFhZGZlNmQ2MWRlOTBjMzc5YTQ4NmM5ZGMwY2JmYzVmNjYyZDI2YzYwMjYmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.bMnNYmJN_xxQfS0WslStS3mBXNpzDa7CJmH0rk2sfGQ)
![image](https://private-user-images.githubusercontent.com/73248638/409360322-3e7ae577-cdbc-4435-894c-9ddedff26d5c.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzg5ODc0MTAsIm5iZiI6MTczODk4NzExMCwicGF0aCI6Ii83MzI0ODYzOC80MDkzNjAzMjItM2U3YWU1NzctY2RiYy00NDM1LTg5NGMtOWRkZWRmZjI2ZDVjLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDglMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA4VDAzNTgzMFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWRkYzk4ZWZiZTU3Y2YwNWViZTg2ODE2YjFmMmMyYzkxZDUwOTkxYjgyYjIxMDEwMzFlYjY1YTZhMGQ2OTEyMTYmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.sKpYimRc9waL6zi2YGKwlJTfX0IODVSCdiLXbSu58ZU)
On a test publish job on the farm, have cut down publishing times from 16 mins 51 seconds to 6 mins 50 seconds.