
OverflowError: cannot serialize a bytes object larger than 4 GiB (reduction.py) #80

Open
asumser opened this issue Dec 7, 2021 · 10 comments


@asumser

asumser commented Dec 7, 2021

When starting training on Windows 10 with TensorFlow 2.4.4, the error
OverflowError: cannot serialize a bytes object larger than 4 GiB
appears in multiprocessing/reduction.py (line 60).

I don't really know what I am doing, but the error goes away if I change line 58 to:
def dump(obj, file, protocol=4):
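For context, dump() in reduction.py is just a thin wrapper around ForkingPickler, so the same idea can be applied at runtime instead of hand-editing the stdlib file. A minimal, untested sketch (assuming the Windows "spawn" start method, where popen_spawn_win32.py serializes the process object via reduction.dump):

import multiprocessing.reduction as reduction

_original_dump = reduction.dump

def _dump_protocol4(obj, file, protocol=None):
    # Force pickle protocol 4, which supports objects larger than 4 GiB.
    # Protocol 3, the default on Python < 3.8, does not.
    _original_dump(obj, file, protocol=4)

# Apply the patch before training starts the worker pool.
reduction.dump = _dump_protocol4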

@jeromelecoq
Collaborator

Man, I stumbled upon the exact same error today. I am trying to deploy a processing container agent on AWS ECR.

Do you have a link that led you to make this change? It could be related to a particular version of the multiprocessing package.

@asumser
Author

asumser commented Dec 7, 2021

I found it on some random forum, but I think this is the source:

https://docs.python.org/3/library/pickle.html

Protocol version 4 was added in Python 3.4. It adds support for very large objects, pickling more kinds of objects, and some data format optimizations. It is the default protocol starting with Python 3.8. Refer to PEP 3154 for information about improvements brought by protocol 4.
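To illustrate the limit itself, a standalone snippet (not specific to this repo; allocating the test object needs several GB of free RAM):

import pickle

# A bytes object just past the 4 GiB boundary.
big = bytes(4 * 1024**3 + 1)

# With protocol <= 3 this raises:
#   OverflowError: cannot serialize a bytes object larger than 4 GiB
# pickle.dumps(big, protocol=3)

# Protocol 4 (the default since Python 3.8) handles it.
blob = pickle.dumps(big, protocol=4)
print(len(blob))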

@jeromelecoq
Collaborator

Ok, I will try upgrading to Python 3.8 and see if that fixes it.

@asumser
Author

asumser commented Dec 7, 2021

My workaround doesn't actually work and gives an invalid syntax error. Should I try with Python 3.8?

@jeromelecoq
Collaborator

jeromelecoq commented Dec 7, 2021

Yes, I have found this goes away with a Docker image running Python 3.8 on AWS.

@nataliekoh

nataliekoh commented Nov 17, 2022

I am not sure if this is related, but I've been getting the following error:

Epoch 1/2
WARNING:tensorflow:multiprocessing can interact badly with TensorFlow, causing nondeterministic deadlocks. For high performance data pipelines tf.data is recommended.
Exception in thread Thread-3:
Traceback (most recent call last):
  File "c:\Users---.conda\envs\deepinterp\lib\threading.py", line 926, in _bootstrap_inner
    self.run()
  File "c:\Users---.conda\envs\deepinterp\lib\threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "c:\Users---.conda\envs\deepinterp\lib\site-packages\tensorflow\python\keras\utils\data_utils.py", line 748, in _run
    with closing(self.executor_fn(_SHARED_SEQUENCES)) as executor:
  File "c:\Users---.conda\envs\deepinterp\lib\site-packages\tensorflow\python\keras\utils\data_utils.py", line 727, in pool_fn
    initargs=(seqs, None, get_worker_id_queue()))
  File "c:\Users---.conda\envs\deepinterp\lib\multiprocessing\context.py", line 119, in Pool
    context=self.get_context())
  File "c:\Users---.conda\envs\deepinterp\lib\multiprocessing\pool.py", line 176, in __init__
    self._repopulate_pool()
  File "c:\Users---.conda\envs\deepinterp\lib\multiprocessing\pool.py", line 241, in _repopulate_pool
    w.start()
  File "c:\Users---.conda\envs\deepinterp\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "c:\Users---.conda\envs\deepinterp\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "c:\Users---.conda\envs\deepinterp\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
    reduction.dump(process_obj, to_child)
  File "c:\Users---.conda\envs\deepinterp\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
MemoryError

I tried reducing the batch size and increasing the number of steps per epoch, but to no avail.
I also tried updating the conda environment to Python 3.8, but that throws up the following error instead:

Traceback (most recent call last):
  File "Documents\DeepInterpTest\traineg1.py", line 3, in <module>
    from deepinterpolation.generic import JsonSaver, ClassLoader
  File "C:\Users---.conda\envs\deepinterp\lib\site-packages\deepinterpolation\generic.py", line 98, in <module>
    class ClassLoader:
  File "C:\Users---.conda\envs\deepinterp\lib\site-packages\deepinterpolation\generic.py", line 105, in ClassLoader
    from deepinterpolation import network_collection
  File "C:\Users---.conda\envs\deepinterp\lib\site-packages\deepinterpolation\network_collection.py", line 1, in <module>
    from tensorflow.keras.layers import (
  File "C:\Users---.conda\envs\deepinterp\lib\site-packages\tensorflow\__init__.py", line 38, in <module>
    import six as _six
ModuleNotFoundError: No module named 'six'

Can anyone help me out with this?

@jeromelecoq
Collaborator

How much RAM do you have on your machine?

@nataliekoh

Hi Jerome, I have 46 GB of RAM.

@jeromelecoq
Collaborator

Turn off multi_processing. When multi-processing is used, your generator can be duplicated in each worker. This can result in larger memory use (depending on the generator you use). This is an option of the training object; see the sketch below.
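At the plain Keras level the switch looks like this. This is a self-contained sketch using the TF 2.4-era fit() arguments; the exact option name exposed by the deepinterpolation training object / training JSON may differ, so check the trainer's parameters rather than taking "use_multiprocessing" from here:

import numpy as np
import tensorflow as tf

class ToyFrames(tf.keras.utils.Sequence):
    # Stand-in for a (potentially large) data generator.
    def __len__(self):
        return 8  # batches per epoch

    def __getitem__(self, idx):
        x = np.random.rand(4, 16).astype("float32")
        return x, x  # autoencoder-style target

model = tf.keras.Sequential([tf.keras.layers.Dense(16, input_shape=(16,))])
model.compile(optimizer="adam", loss="mse")

model.fit(
    ToyFrames(),
    epochs=1,
    use_multiprocessing=False,  # run the generator with threads, not processes
    workers=1,                  # so it is never pickled or duplicated
)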

@nataliekoh

That seems to have solved the problem. Thank you!
