[BUG] immich_microservices jobs handler error. #4734

Closed
1 of 3 tasks
davidpan opened this issue Oct 31, 2023 · 19 comments

@davidpan

The bug

immich_microservices jobs handler error.

There are two types of images: directly uploaded and External Library. The External Library is about 500GB and the directly uploaded set is about 50GB.

  1. Jobs fail whether immich_machine_learning runs locally or remotely. The error message appears only on the microservices host; the immich_machine_learning host logs nothing.
  2. Both the RECOGNIZE FACES and ENCODE CLIP jobs, which depend on machine learning, report an error.
  3. After setting the URL in Machine Learning Settings, restarting all the containers, and running the missing jobs again, the same error occurs.
  4. In a local test environment using the official compose file, I verified that immich_machine_learning works properly, then opened port 3003 and configured it on the server: same error.

The OS that Immich Server is running on

Ubuntu 22.04.3 LTS

Version of Immich Server

v1.83.0

Version of Immich Mobile App

v1.83.0

Platform with the issue

  • Server
  • Web
  • Mobile

Your docker-compose.yml content

https://github.com/immich-app/immich/releases/latest/download/docker-compose.yml

Your .env content

https://github.com/immich-app/immich/releases/latest/download/example.env

Reproduction steps

...

Additional information

Error log:
[Nest] 7 - 10/31/2023, 1:20:33 AM ERROR [JobService] Object:
{
  "id": "f631de14-e3a6-41e1-92df-4f47ae9138be"
}
[Nest] 7 - 10/31/2023, 1:20:33 AM ERROR [JobService] Unable to run job handler (recognizeFaces/recognize-faces): Error: Request for facial recognition failed with status 404: Not Found
[Nest] 7 - 10/31/2023, 1:20:33 AM ERROR [JobService] Error: Request for facial recognition failed with status 404: Not Found
    at MachineLearningRepository.post (/usr/src/app/dist/infra/repositories/machine-learning.repository.js:29:19)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async PersonService.handleRecognizeFaces (/usr/src/app/dist/domain/person/person.service.js:208:23)
    at async /usr/src/app/dist/domain/job/job.service.js:108:37
    at async Worker.processJob (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:350:28)
    at async Worker.retryIfFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:535:24)

@alextran1502
Contributor

I'm pretty sure you are encountering issue #4117. You can find the fix in that issue.

@davidpan
Author

davidpan commented Oct 31, 2023

It's not that. Look at the test procedure I described in point 4: everything works fine in the test environment, but when that same machine learning instance is used remotely by the server, the server reports the same error. I have checked the files under /cache/.

> Im pretty sure you are encountering this issue #4117. You can find the fix in the issue

@davidpan
Author

davidpan commented Oct 31, 2023

@alextran1502 Please see the chart below:
image

@alextran1502
Contributor

@davidpan where are these files located?

@alextran1502
Contributor

Can you help grab the log from the machine learning container?

@davidpan
Author

davidpan commented Oct 31, 2023

> Can you help grabbing the log from the machine learning container?

[10/31/23 00:43:29] INFO Starting gunicorn 21.2.0
[10/31/23 00:43:29] INFO Listening at: http://0.0.0.0:3003 #(9)
[10/31/23 00:43:29] INFO Using worker: uvicorn.workers.UvicornWorker
[10/31/23 00:43:29] INFO Booting worker with pid: 10
[10/31/23 00:43:48] INFO Created in-memory cache with unloading disabled.
[10/31/23 00:43:48] INFO Initialized request thread pool with 12 threads.

The following logs only appeared when I tried the local jobs.

[10/31/23 01:59:25] INFO Loading clip model 'ViT-B-32::openai'
[10/31/23 01:59:25] INFO Loading image classification model 'microsoft/resnet-50'
Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration.
Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration.
/opt/venv/lib/python3.11/site-packages/transformers/models/convnext/feature_extraction_convnext.py:28: FutureWarning: The class ConvNextFeatureExtractor is deprecated and will be removed in version 5 of Transformers. Please use ConvNextImageProcessor instead.
warnings.warn(
[10/31/23 02:03:15] INFO Loading facial recognition model 'buffalo_l'

@jrasm91
Contributor

jrasm91 commented Oct 31, 2023

What url are you using for the machine learning url? Can you connect to it from inside the immich microservices container?

@jrasm91
Contributor

jrasm91 commented Oct 31, 2023

If you are using the IP of the host, that does not resolve from inside a docker container.

You should pass the compose service name, container name, or add extra configuration to pass the docker gateway IP to the container.
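
One way to answer the reachability question is to try opening a TCP connection to the machine learning port from an environment that shares the microservices container's network. The following is a minimal sketch and not part of Immich: the helper name `can_connect` is hypothetical, and it assumes Python is available wherever you run it. Port 3003 is the one mentioned in this thread.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection resolves the hostname and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

For example, `can_connect("immich-machine-learning", 3003)` would test a compose service name (an assumed name here; use the one from your own compose file). Note that a 404, as in the log above, means something did answer the HTTP request, so a check like this only rules out network-level problems.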

@davidpan
Author

davidpan commented Oct 31, 2023

> @davidpan where are these file located at?

/media/usb/immich. Both the server and microservices containers are configured to use it, and it is loaded and accessible within the virtual machine.

Or am I misunderstanding your question? Do you mean the location of the model-related files?

On the immich_machine_learning server they are in /cache, loaded from the volume immich_model-cache. In different Docker server environments, that volume sits at different locations on the Docker host.

image

The reason I think the model location is not the problem is that, in a local test environment, the locally launched Immich performs this machine learning correctly. I simply opened the immich_machine_learning port of the local test environment and pointed the production environment on the remote server at it.

@davidpan
Author

davidpan commented Oct 31, 2023

> If you are using the IP of the host, that does not resolve from inside a docker container.
>
> You should have to pass the compose service name, container name, or add extra configuration to load the docker gateway IP to the container.

Access test from inside the microservices container:

image

Setup:
image

@davidpan
Author

> What url are you using for the machine learning url? Can you connect to it from inside the immich microservices container?

Yes. Before the upgrade, it still worked fine on v1.82.

@jrasm91
Contributor

jrasm91 commented Oct 31, 2023

> > What url are you using for the machine learning url? Can you connect to it from inside the immich microservices container?
>
> yes,When not upgraded yet, it still works fine at v1.82.

What do you mean by this? It is a problem in 1.83 but not 1.82?

@davidpan
Author

> > > What url are you using for the machine learning url? Can you connect to it from inside the immich microservices container?
> >
> > yes,When not upgraded yet, it still works fine at v1.82.
>
> What do you mean by this? It is a problem in 1.83 but not 1.82?

I'm not sure whether the upgrade is the cause. I started with v1.82 and was still in the middle of processing photos when I saw the v1.83 release and upgraded.

The correlation is unclear, although I discovered the issue right after the upgrade.

While troubleshooting, I rebuilt the system on a local computer and imported a small portion of the photos, and the machine learning part worked fine. I then exposed port 3003 on that machine for the server to use, and the immich_microservices host on the server reported the same error.

@davidpan
Author

I also tried flushing Redis manually to clear out any leftover jobs.

redis-cli flushall

@davidpan
Author

davidpan commented Nov 1, 2023

@alextran1502 @jrasm91

Thanks for the previous responses.

After rebuilding the server and local environment, test verification confirmed that immich_machine_learning can now recognize faces normally. Setting the corresponding IP on the server also allows the remote machine_learning service to recognize faces normally.

However, if I stop the immich_machine_learning VM on the server while the remote machine learning host is configured and working, the remote host's CPU load drops to zero and network transmission ceases. Restarting the immich_machine_learning VM makes the remote service resume.

This can be replicated consistently.

@jrasm91
Contributor

jrasm91 commented Nov 1, 2023

This cannot be an immich bug or issue. Immich simply sends requests to the IP/hostname provided for the machine learning endpoint. If turning off an "unrelated" container changes the behavior/availability/reachability of the target endpoint then you have some misconfiguration in your system.

@mw2c

mw2c commented Feb 7, 2024

I faced the same issue and solved it by removing the "/" from the end of the server URL.
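
A possible explanation, offered as an assumption rather than something confirmed in this thread: if the configured URL is concatenated directly with an endpoint path, a trailing "/" produces a double slash in the request path, which the machine learning server may not route, giving a 404. The sketch below illustrates the effect and a defensive normalization. The helper names, the host `ml-host`, and the endpoint path are hypothetical and are not taken from Immich's code.

```python
def join_naive(base_url: str, endpoint: str) -> str:
    """Concatenate as-is; a trailing slash on base_url yields '//' in the path."""
    return base_url + endpoint


def join_normalized(base_url: str, endpoint: str) -> str:
    """Drop trailing slashes from the base and leading slashes from the endpoint, then join with one '/'."""
    return base_url.rstrip("/") + "/" + endpoint.lstrip("/")


# Hypothetical values for illustration only.
ENDPOINT = "/facial-recognition/detect-faces"

print(join_naive("http://ml-host:3003/", ENDPOINT))
# http://ml-host:3003//facial-recognition/detect-faces
print(join_normalized("http://ml-host:3003/", ENDPOINT))
# http://ml-host:3003/facial-recognition/detect-faces
```

Either way, entering the URL without the trailing slash, as described above, avoids the problem on the configuration side.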

@yuanmomo

> solved it by removing the "/" from the end of the server URL.

Thanks, this helps.

@tanmaychimurkar

> I faced the same issue and solved it by removing the "/" from the end of the server URL.

I had a similar issue, and this solved the same error described in this thread. Thank you so much 👍🏻
