perform 1-3 HTTP requests for each wheel using fast-deps #12208
base: main
Conversation
Force-pushed from 44418d9 to 182b351.
@dholth: got this working again; on my connection to pypi, this new code (adapted from your solution) is much much faster than multiple small range requests. A couple of things that popped up:
# download Keras wheel
> curl -L -O 'https://files.pythonhosted.org/packages/44/e1/dc0757b20b56c980b5553c1b5c4c32d378c7055ab7bfa92006801ad359ab/Keras-2.4.3-py2.py3-none-any.whl'
# this is normal; the dist-info is at the end
> unzip -l Keras-2.4.3-py2.py3-none-any.whl | tail -n10
143 2020-06-24 22:38 keras/utils/vis_utils.py
28 2020-06-24 22:38 keras/wrappers/__init__.py
117 2020-06-24 22:38 keras/wrappers/scikit_learn.py
1616 2020-06-24 22:40 Keras-2.4.3.dist-info/LICENSE
1496 2020-06-24 22:40 Keras-2.4.3.dist-info/METADATA
110 2020-06-24 22:40 Keras-2.4.3.dist-info/WHEEL
11 2020-06-24 22:40 Keras-2.4.3.dist-info/top_level.txt
6766 2020-06-24 22:40 Keras-2.4.3.dist-info/RECORD
--------- -------
73432 83 files
# download tensorflow-gpu wheel
> curl -L -O 'https://files.pythonhosted.org/packages/80/4d/3a008dc31225768318e7ba0f7f95aa4677b0936805be40e37036b7755d62/tensorflow_gpu-2.5.3-cp38-cp38-manylinux2010_x86_64.whl'
# hell
> unzip -l tensorflow_gpu-2.5.3-cp38-cp38-manylinux2010_x86_64.whl | head -n10
Archive: tensorflow_gpu-2.5.3-cp38-cp38-manylinux2010_x86_64.whl
Length Date Time Name
--------- ---------- ----- ----
111 2022-01-30 11:38 tensorflow_gpu-2.5.3.dist-info/WHEEL
11 2022-01-30 17:16 tensorflow_gpu-2.5.3.dist-info/top_level.txt
2810 2022-01-30 17:17 tensorflow_gpu-2.5.3.dist-info/METADATA
550 2022-01-30 17:16 tensorflow_gpu-2.5.3.dist-info/entry_points.txt
1046068 2022-01-30 11:38 tensorflow_gpu-2.5.3.dist-info/RECORD
13777 2022-01-30 17:17 tensorflow_gpu-2.5.3.dist-info/LICENSE
25049 2022-01-30 17:16 tensorflow/__init__.py

So this change had to expand your approach to cover the case of dist-info dirs placed at the very start of a zip file. I'm assuming this was done intentionally by google in order to do zip file hacks like we're trying to do, but from the front of the file. Regardless, this also works now.
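For context, here is an illustrative Python sketch (not part of the PR) that checks the same thing programmatically; the filenames assume the wheels downloaded by the curl commands above, and the cutoff for "end" is arbitrary.

```python
# Illustrative only: check where the *.dist-info/ entries sit inside a wheel,
# mirroring the `unzip -l` listings above.
import zipfile

def dist_info_position(wheel_path: str) -> str:
    with zipfile.ZipFile(wheel_path) as zf:
        names = zf.namelist()
        first = next(i for i, name in enumerate(names) if ".dist-info/" in name)
        if first == 0:
            return "front"  # like the tensorflow_gpu upload above
        if first >= len(names) - 10:
            return "end"    # like the Keras wheel above (spec-recommended)
        return "middle"

print(dist_info_position("Keras-2.4.3-py2.py3-none-any.whl"))                         # expected: end
print(dist_info_position("tensorflow_gpu-2.5.3-cp38-cp38-manylinux2010_x86_64.whl"))  # expected: front
```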
Force-pushed from 33f2431 to 49ad8c5.
cc @ewdurbin: are you familiar with why pypi might be failing to accept negative byte range requests (of the form `bytes=-N`)?
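For reference, a quick way to probe this behavior (an illustrative sketch, reusing the Keras wheel URL from the transcript above): a server that honors negative ranges should answer 206 Partial Content, while one that rejects them may answer 501 or 416, or fall back to 200 with the whole body.

```python
# Probe whether a file host honors a negative byte range (bytes=-N).
import requests

url = (
    "https://files.pythonhosted.org/packages/44/e1/"
    "dc0757b20b56c980b5553c1b5c4c32d378c7055ab7bfa92006801ad359ab/"
    "Keras-2.4.3-py2.py3-none-any.whl"
)
resp = requests.get(url, headers={"Range": "bytes=-8000"}, stream=True)
# 206 => negative range honored; 501/416 => rejected; 200 => range ignored.
print(resp.status_code, resp.headers.get("Content-Range"))
```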
Still marking this as draft until we get some test cases going, which will have to be sometime next week.
Thanks so much!
Force-pushed from ec80945 to a80dc04.
Force-pushed from 4a03cc8 to 35ae8bc.
Force-pushed from ec4f803 to e024a28.
Force-pushed from 12c78b0 to ac6cda5.
yes!!!!!!!! omg praise
cc @pradyunsg @notatallshaw: no rush at all, but CI passes extremely quickly now and the fix was very simple.
This is a partial review - I haven't looked at the changes to lazy_wheel.py itself yet. They are much more extensive, and I've not had time to review the context of all this. It's been a long while since I last looked at lazy_wheel, and given that PyPI now serves package metadata separately (and other indexes should do the same), I'd like to review how important the whole lazy_wheel code path even is these days before spending too much time diving into the code itself.
If you have any current data on how often this code path gets hit in typical usage of pip, and how much of a speedup it provides, that would be very useful. Not microbenchmarks showing that the code is x% faster than it was before, but real-world figures: how much does this improve installing Jupyter, or Django, or big packages with complex dependency trees such as Apache Airflow or Home Assistant?
if src_fp := source.fp:
    src_name = getattr(src_fp, "name", None)
    logger.debug("extracting entry '%s' from zip '%s'", path, src_name)
This seems like it's debugging output only indirectly related to this change. While it's not a bad change, it should probably be separate from this PR. Also, printing a raw `None` when the source doesn't have a name attribute seems unfriendly in user-facing output.
I didn't realize that `logger.debug` output was considered user-facing. I found this output useful while debugging this feature, and I think it has value in helping users debug if this feature fails. I would really prefer that, if this PR is shipped, it keep this debug log, so I'm confused as to why separating it would be beneficial.
I will change the `None` to the string `<anonymous file object>`. If you would still prefer that this be removed from this PR, I will do so.
Sounds reasonable, let's leave it in.
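For clarity, a minimal sketch of the agreed tweak (names taken from the diff snippet above; not necessarily the exact final code):

```python
import logging

logger = logging.getLogger(__name__)

def log_extraction(source_fp, path: str) -> None:
    # Fall back to a descriptive placeholder instead of printing a raw None
    # when the source file object has no usable name (e.g. an in-memory buffer).
    src_name = getattr(source_fp, "name", None) or "<anonymous file object>"
    logger.debug("extracting entry '%s' from zip '%s'", path, src_name)
```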
    link.netloc,
    str(e),
)
self._domains_without_http_range.add(link.netloc)
I guess the question here is, which is faster? To not use lazy wheel on sites where a range request ever failed, or to check every time and use lazy wheel whenever possible. A comment here explaining the trade-off and why you made the choice you did would be useful.
(I haven't looked at the three linked PRs, as IMO this PR should stand alone, so "this check will go away when further PRs are merged" isn't a reason to not document what's going on here).
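To make the trade-off concrete, a hypothetical sketch of the "remember failures per host" approach referenced in the diff (names are illustrative, not pip's internals): one failed probe per host is cheaper on average, at the cost of never retrying a host whose range support was only temporarily broken.

```python
# Hypothetical sketch: skip lazy-wheel range requests for hosts that have
# already failed one, instead of re-probing on every wheel.
_domains_without_http_range: set[str] = set()

def should_try_lazy_wheel(netloc: str) -> bool:
    return netloc not in _domains_without_http_range

def record_range_failure(netloc: str) -> None:
    _domains_without_http_range.add(netloc)
```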
@@ -362,34 +360,46 @@ def make_wheel(
    :param entry_points:
    :param record: if provided and None, then no RECORD file is generated;
        else if a string then sets the content of the RECORD file
    :param metadata_first: Put the .dist-info metadata at the front of the zip file.
The wheel spec explicitly states that archivers should put the wheel metadata at the end of the archive. We absolutely should not default to `True` here, and I question why we should make this configurable at all. If it's to test edge cases where pip encounters wheels with the metadata in a non-recommended place, I'd prefer to see the main test suite unchanged, but separate, well-documented tests that explicitly create and test such wheels.
Agreed, this should not be configurable.
There are specific wheels where this fails, in particular certain uploads of `tensorflow-gpu`. See this comment #12208 (comment). `tensorflow-gpu` is explicitly one of the wheels that we achieve a speedup for. I will break this out as you have proposed.
I have added comments explaining why this test case is necessary. The main reason we have all these test cases in a single `fake_packages()` fixture is that it allows us to reuse a single session-scoped server in `html_index_with_range_server` instead of spinning up multiple server instances at once. Additionally, it is useful for documentation purposes to have all of our test cases described declaratively together like this. I agree that it would be useful to not have to deal with this, but since there are real wheels with this incorrect `.dist-info/` placement, it seems appropriate to me to incorporate it into our `make_wheel()` generator.
Please let me know if this is satisfactory.
I'm still uncomfortable with this, and I am definitely against making it the default - we should be explicit every time we're testing something that's not spec-compliant.
And I really don't understand why it's on us to make things work better for people who don't follow the spec, rather than reporting the issue to the offending project(s) and getting them to fix it. I'm not clear here whether this PR includes specific non-test code to improve performance for non-compliant wheels, and that in itself bothers me. Is this test checking code that would otherwise not be checked? If so, then has anyone discussed whether we're OK with including and maintaining code that has the express purpose of catering for projects that don't follow the standards?
If, on the other hand, this test is simply to ensure that the code we have doesn't break when faced with non-compliant files, then that's a different matter. In that case, my discomfort is with having code that's tricky enough to need such a test.
I don't know if I can give you an answer as to whether this is "satisfactory" - the maintainability of the code is a question for the whole @pypa/pip-committers group, and I don't want my personal views to be the only measure here.
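For illustration, a hedged sketch of what a separate, test-only helper for building such wheels might look like (this is not pip's `make_wheel` API; the helper name, package name, and file contents are made up):

```python
# Hypothetical test helper: build a minimal wheel-shaped zip with the
# *.dist-info/ entries either first (non-compliant, like some tensorflow-gpu
# uploads) or last (as the wheel spec recommends).
import zipfile

def make_test_wheel(path: str, metadata_first: bool) -> None:
    dist_info = {
        "pkg-1.0.dist-info/METADATA": b"Metadata-Version: 2.1\nName: pkg\nVersion: 1.0\n",
        "pkg-1.0.dist-info/WHEEL": b"Wheel-Version: 1.0\n",
        "pkg-1.0.dist-info/RECORD": b"",
    }
    payload = {"pkg/__init__.py": b""}
    ordered = (dist_info, payload) if metadata_first else (payload, dist_info)
    with zipfile.ZipFile(path, "w") as zf:
        for group in ordered:
            for name, data in group.items():
                zf.writestr(name, data)

make_test_wheel("pkg-1.0-py3-none-any.whl", metadata_first=True)
```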
Hiya! I don't have any great insight into the finer details; I'm just a user with >30-minute resolves who is excited for any performance improvements. Happy to help with any testing grunt work.
@ b06d73b without fast-deps (cache cleared between runs)
@ b06d73b with fast-deps (cache cleared between runs)
So fast-deps appears slower against PyPI in this naive test. I admit I'm not sure what to make of these results or if I'm testing the right thing. My expectation is that fast-deps is only expected to be beneficial against indexes that do not support PEP 658 + backfill (aka not PyPI) and would otherwise be no different in performance. Again, I'm just wearing my user hat here, not sure if I'm understanding the case exactly right, and happy to run other tests if I can help.
It would be a big help for me if you could open a new issue titled something like "Slow resolver performance" and give a reproducible example. I will profile it and use it as a target for improved resolution; once pip vendors a new version of resolvelib, I will be working on hopefully very big improvements to these kinds of cases.
This needs to be tested with slow round-trip times and/or slow bandwidth. Sometimes a user who lives next to a pypi server (or an alternate index with full range request support) may not see the full story. Is there an easy way to simulate those conditions?
There are lots of HTTP proxies available that allow you to add delay to responses/requests, e.g. https://github.com/ruwanta/delaying-proxy, https://github.com/KevCui/mitm-scripts/blob/master/mitm-delay-request.py, https://github.com/chimurai/http-proxy-middleware/blob/master/recipes/delay.md
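For example, the mitmproxy-based approach from the linked scripts boils down to a tiny addon along these lines (a sketch; the delay value is arbitrary):

```python
# delay.py - minimal mitmproxy addon sketch adding artificial latency, in the
# spirit of the linked mitm-delay-request.py. Run with `mitmproxy -s delay.py`
# and point pip at it via --proxy.
import time

def request(flow):
    # Delay every outgoing request by 500 ms to approximate a slow link.
    time.sleep(0.5)
```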
I reran the prior `time python3.10 -m pip install --report test.json --cache-dir=.cache-pr12208-fast --use-feature=fast-deps --dry-run --ignore-installed 'acryl-datahub[all]==0.13.2.3'` tests, this time interleaving the tests. I think it confirms that variable network bandwidth makes this difficult to measure. Or perhaps that "download a bunch of stuff from PyPI" isn't the right framing. My apologies if the drive-by testing was not helpful.
@pfmoore: I will now review your line-by-line comments. However, I'm bewildered by your top-level comment.
I am very confused by your mention of "microbenchmarks", when in the PR description I directly provide a command line with specific dependency trees which are improved by this specific change:
> python3.8 -m pip install --report test.json --dry-run --ignore-installed --use-feature=fast-deps 'numpy>=1.19.5' 'keras==2.4.3' 'mtcnn' 'pillow>=7.0.0' 'bleach>=2.1.0' 'tensorflow-gpu==2.5.3'
As I specifically note in the PR description:
Please let me know if this satisfies your requirements.
- handle short files `416`; prefetch entire dist-info
- lazy_wheel: translate BadZipfile to InvalidWheel
- handle 501 on negative bytes range from pypi
- catch UnsupportedWheel
- make lazy wheel work against tensorflow-gpu
- link to pypi discussion on negative byte ranges
- check for case when server doesn't support byte ranges at all
- remove _check_zip() method since we do that in prefetch_dist_info() now
- clean up error handling code and don't warn when negative ranges fail
- remove TODO, as Cache-Control: no-cache is the correct behavior
- rearrange explanatory comments
- specify how we handle 200 OK and interpret 405 as no range requests
- rename LazyZipOverHTTP to LazyWheelOverHTTP because of the assumption of the structure of *.dist-info/
- stream the initial chunk request too, since it's 1MB
- add note about fast-deps being on by default
- add testing for request limiting
- fix download metadata testing
- factor out the laziness from the wheel-specific logic
- factor out LazyRemoteResource from FixedSizeLazyResource
- add metadata_first=True arg to make_wheel
- reduce initial chunk size to 10K
- remove hardcoded compilewheel and instead generate wheels
- catch new requests decoding error
- support NegativeRangeOverflowing
- support SneakilyCoerceNegativeRange
- time out test server thread joining within 3 seconds

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>

- make FakePackageSource to abstract over generated and hardcoded whls
- ensure InvalidWheel retains context from inner errors

Co-authored-by: Randy Döring <30527984+radoering@users.noreply.github.com>

- add link to perf evaluation from radoering
ideally, this should avoid collisions. no clue if this matters at all
Force-pushed from b06d73b to 7125a88.
Again we find macOS CI issues relating to the mock server, so I will look to fix those now. Relevant stacktrace:
tests/functional/test_install_config.py::test_config_file_override_stack
/Users/runner/work/pip/pip/.nox/test-3-10/lib/python3.10/site-packages/_pytest/threadexception.py:82: PytestUnhandledThreadExceptionWarning: Exception in thread Thread-14 (serve_forever)
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/Users/runner/work/pip/pip/.nox/test-3-10/lib/python3.10/site-packages/werkzeug/serving.py", line 819, in serve_forever
super().serve_forever(poll_interval=poll_interval)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/socketserver.py", line 229, in serve_forever
selector.register(self, selectors.EVENT_READ)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/selectors.py", line 353, in register
key = super().register(fileobj, events, data)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/selectors.py", line 239, in register
key = SelectorKey(fileobj, self._fileobj_lookup(fileobj), events, data)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/selectors.py", line 226, in _fileobj_lookup
return _fileobj_to_fd(fileobj)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/selectors.py", line 42, in _fileobj_to_fd
raise ValueError("Invalid file descriptor: {}".format(fd))
ValueError: Invalid file descriptor: -1
warnings.warn(pytest.PytestUnhandledThreadExceptionWarning(msg))
tests/functional/test_install_config.py::test_prompt_for_authentication
/Users/runner/work/pip/pip/tests/functional/test_install_config.py:269: DeprecationWarning: ssl.PROTOCOL_TLS is deprecated
ctx = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
tests/functional/test_install_config.py::test_do_not_prompt_for_authentication
/Users/runner/work/pip/pip/tests/functional/test_install_config.py:309: DeprecationWarning: ssl.PROTOCOL_TLS is deprecated
ctx = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
tests/functional/test_install_config.py: 36 warnings
/Users/runner/work/pip/pip/tests/functional/test_install_config.py:437: DeprecationWarning: ssl.PROTOCOL_TLS is deprecated
ctx = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
tests/functional/test_install.py::test_install_sends_client_cert[install_args0]
tests/functional/test_install.py::test_install_sends_client_cert[install_args1]
/Users/runner/work/pip/pip/tests/functional/test_install.py:2369: DeprecationWarning: ssl.PROTOCOL_TLS is deprecated
ctx = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
tests/functional/test_install.py::test_install_sends_certs_for_pep518_deps
/Users/runner/work/pip/pip/tests/functional/test_install.py:2410: DeprecationWarning: ssl.PROTOCOL_TLS is deprecated
ctx = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
tests/functional/test_install_wheel.py::test_wheel_install_fails_with_badly_encoded_metadata
/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/zipfile.py:1519: UserWarning: Duplicate name: 'simple-0.1.0.dist-info/METADATA'
return self._open_to_write(zinfo, force_zip64=force_zip64)
tests/functional/test_proxy.py::test_proxy_does_not_override_netrc
/Users/runner/work/pip/pip/tests/functional/test_proxy.py:55: DeprecationWarning: ssl.PROTOCOL_TLS is deprecated
ctx = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
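For reference, the usual way to avoid this kind of race in a threaded test server (a generic sketch, not the actual pip test fixture) is to stop `serve_forever()` before closing the socket and to bound the thread join, as the commit list above hints ("time out test server thread joining within 3 seconds"):

```python
# Generic sketch of a threaded test-server teardown. Closing the listening
# socket while serve_forever() is still running is one way to hit the
# "ValueError: Invalid file descriptor: -1" seen in the log above.
import threading
from http.server import HTTPServer, SimpleHTTPRequestHandler

server = HTTPServer(("127.0.0.1", 0), SimpleHTTPRequestHandler)
thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()
try:
    pass  # run requests against server.server_address here
finally:
    server.shutdown()       # ask serve_forever() to exit its loop
    thread.join(timeout=3)  # don't hang CI if the thread is wedged
    server.server_close()   # only close the socket after the loop has stopped
```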
Force-pushed from 160974f to 56184b0.
Force-pushed from 56184b0 to 01dae57.
Ok, pretty sure the macOS CI issues are fixed now.
I apologise - I honestly cannot recollect why I said that. I can only assume I'd somehow managed to miss your examples by focusing on the code changes and not the PR description. There were some more recent comments about performance testing - maybe I was distracted by those? In any case, it was wrong of me to make that comment, and I retract it.
I've added some responses to your comments on individual review items. But I think that's about all I can do here. I'm frustrated that we don't seem to have a good way to move this PR forward, but I think we need someone who has the time (and expertise) to review this PR as a whole - attempting to make incremental comments doesn't feel like it's very productive.
Continued motivation for `fast-deps`
While PEP 658 is the standards-compliant solution and metadata from there is already preferred when available, `--use-feature=fast-deps` avoids downloading wheels against `--find-links` repos and any pypi index not supporting PEP 658 yet. Most non-pypi indices will be either of these, because it's very easy to expose those to pip with a simple file server, so improving `fast-deps` (and turning it on by default) is necessary to extend the benefits from the recent metadata resolve work to most users hosting their own index, especially corporations running pip in their internal CI.

Problem
`--use-feature=fast-deps` currently takes a while to perform multiple small range requests against pypi wheels which do not have PEP 658 metadata available, such as `tensorflow-gpu==2.5.3`. This is likely because of delays built into the pypi file host when responding to GET requests for very large files, to reduce the risk of a denial of service. This is pretty reasonable behavior on pypi's part, so we would like to minimize the number of range requests made, as described by @McSinyx in followup work at "The fast-deps feature is not a fast way to obtain dependencies" #8670.

Solution
@dholth realized two optimizations we could perform:
1. The `Range` header accepts a negative value `bytes=-N`, which acts like negative slice indices in Python, returning a chunk from the end of the file. This avoids a `HEAD` request to get the file length.
2. The `*.dist-info/` directory is all that's going to be extracted from our lazy wheels, and this directory's contents form a contiguous substring of the total file content. After extracting the central directory from the end of the file with our first request, we can perform a single range request to populate the contents of every file in the `*.dist-info/` directory in the lazy wheel, so no further HTTP requests need to be made to continue the resolution. (A standalone sketch of this flow appears at the end of this description.)

Additional Fixes
Two additional issues have popped up since #11447:
- Wheels such as certain uploads of `tensorflow-gpu==2.5.3` have begun to put their own `*.dist-info/` directories at the beginning of the zip file, possibly as a result of generating them from other build tools which sort zip file entries lexicographically.

Both of these are considered to be reasonable behavior, and this change handles both cases gracefully.
Result
This halves the time to resolve dependencies from the below requirements for `pip install --dry-run --report` (note that the fixes of #12186 must be merged to avoid downloading the wheels anyway):

As with PEP 658 metadata, in pathological cases which involve lots of backtracking, this will avoid downloading more than a single version of each wheel even for `pip download` or `pip install` without `--dry-run`. If `--use-feature=fast-deps` is enabled by default, this will also significantly improve performance of all resolves involving `tensorflow-gpu==2.5.3` and other wheels which do not have PEP 658 metadata available on pypi, or against indices which do not serve PEP 658 metadata. I therefore propose turning on `fast-deps` by default, either in this PR or in #12186, which will be merged after this one.
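As promised above, here is a standalone sketch of the 1-3 request flow from the Solution section. It is illustrative only, not pip's lazy_wheel implementation: it assumes the server honors negative ranges (the PR itself also handles servers that do not), ignores ZIP64, and uses an arbitrary 10 KB tail size; the example URL is the Keras wheel from earlier in the thread.

```python
# Standalone sketch (not pip's lazy_wheel code) of the 1-3 request idea: one
# negative-range request for the wheel's tail (which usually holds the central
# directory), then at most one more range request covering the contiguous
# *.dist-info/ region.
import io
import zipfile

import requests

TAIL_BYTES = 10_000  # assumption: the central directory fits in the last 10 KB

def fetch_dist_info(url: str) -> dict:
    session = requests.Session()
    # Request 1: the last TAIL_BYTES of the wheel via a negative byte range.
    tail = session.get(url, headers={"Range": f"bytes=-{TAIL_BYTES}"})
    tail.raise_for_status()
    # Content-Range looks like "bytes 12340-22339/22340"; the number after "/"
    # is the total file size, so no separate HEAD request is needed.
    total = int(tail.headers["Content-Range"].rpartition("/")[2])
    # Splice the tail into a zero-filled stand-in for the whole file so that
    # zipfile can find the end-of-central-directory record and list entries.
    buf = bytearray(total)
    buf[total - len(tail.content):] = tail.content
    zf = zipfile.ZipFile(io.BytesIO(bytes(buf)))
    infos = [i for i in zf.infolist() if ".dist-info/" in i.filename]
    # The *.dist-info/ entries occupy a contiguous region; compute its bounds
    # from the central directory, whether the region sits at the front or end.
    start = min(i.header_offset for i in infos)
    last = max(i.header_offset for i in infos)
    after = [i.header_offset for i in zf.infolist() if i.header_offset > last]
    end = (min(after) if after else total) - 1
    # Request 2 (skipped when the tail already covers the region): one range
    # spanning every *.dist-info/ local header and its data.
    if start < total - len(tail.content):
        body = session.get(url, headers={"Range": f"bytes={start}-{end}"})
        body.raise_for_status()
        buf[start:start + len(body.content)] = body.content
        zf = zipfile.ZipFile(io.BytesIO(bytes(buf)))
    return {i.filename: zf.read(i.filename) for i in infos}

metadata = fetch_dist_info(
    "https://files.pythonhosted.org/packages/44/e1/"
    "dc0757b20b56c980b5553c1b5c4c32d378c7055ab7bfa92006801ad359ab/"
    "Keras-2.4.3-py2.py3-none-any.whl"
)
print(metadata["Keras-2.4.3.dist-info/METADATA"].decode()[:200])
```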