Running CMC-COSMIC on Ubuntu #40

Open
Jay13inspace opened this issue Feb 9, 2022 · 8 comments

Comments

@Jay13inspace

Hi,

I am working with Jan (@JJEldridge) on getting the CMC-COSMIC code running. I was originally receiving the same error as Jan (COSMIC-PopSynth/COSMIC#552), but managed to fix it by taking the dynamics_apply() function and moving it into the main file where it is called. This fixed the segmentation fault I was originally getting while running the Plummer Sphere simulation. Unfortunately, I could not find a way to keep the function outside the main file, so all my current simulations run with it in there.

I am able to run the King Profile simulation with 60,000 stars but I get the segmentation fault again when attempting to run with 70,000 stars.

I am running this on Ubuntu 20.04 with 8GB memory, 4GB swap memory, and 8 cores.

Is there any way of getting this to run on this system without inserting the dynamics_apply() function into the main file? Or does it require a more powerful computer?

I am using the line "mpirun -np 4 ../CMC/bin/cmc KingProfile.ini king" to run the program.

Thanks,
Jason Sampson

@carlrodriguez
Member

Hi Jason,

It is extremely odd to me that the build works with dynamics_apply in the main file but not in cmc_dynamics.c. This sounds like some sort of linker issue. We haven't tested the code on an Ubuntu Linux distro (just the Red Hat that many HPC systems use, and a Mac).

What C compiler are you using with CMAKE?

As for the segfault, this could be a memory issue. Can you try reducing the number of cores (to -np 2) and see if you can run with 70,000 stars?

Carl

@Jay13inspace
Author

Hi Carl,

I realize I didn't explain myself the best earlier. I had to put the dynamics_apply() function directly into the main function code where it was being called.

if (PERTURB > 0) {
        long j, si, p=AVEKERNEL, N_LIMIT, k, kp, ksin, kbin;
        ...
        break_wide_binaries(curr_st);
}
timeEndSimple(tmpTimeStart, &t_dyn);

When I moved it into the main file, below the main function, I had the same issue as before. The fault was only resolved when the code from the function was inserted directly into the main function.

I am using the GCC version 4:9.3.0-1ubuntu2 compiler with CMAKE.

I tried reducing the number of cores for the 70,000-star run and it still segfaults immediately.

Thanks,
Jason Sampson

@carlrodriguez
Member

Hi Jason,

OK, this is an unusual problem, which to me sounds like a linker issue. What version of CMAKE are you using? If it's not a recent one, can you try updating it?

Carl

@zhang2023-byte

I ran into the same problem when trying to run CMC on Ubuntu. The errors from my run are as follows:

Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.


mpirun noticed that process rank 2 with PID 0 on node Savior exited on signal 11 (Segmentation fault).

I don't understand why the original poster's change fixed the problem. Is this an issue with the Ubuntu system? How can it be fixed?

@giulianoiorio

Hello,
I am experiencing a similar problem when using the Docker image of CMC directly.
When I try to run the Plummer and King examples, I get a segfault with both the serial version of CMC and the parallel one (with the number of processes ranging from 2 to 4).

@carlrodriguez I wonder if you can replicate this error using the Docker image on one of your machines.

@zhang2023-byte

Hi, I solved my issue this way: #54 (comment). I suspect this might help you.

@carlrodriguez
Member

Thank you @zhang2023-byte for providing that! Yes, right now most large runs need to be run with ulimit -s unlimited as well as the Linux command mentioned in that comment. This is because the memory for the HDF5 files is currently allocated on the stack instead of the heap (where it would not run into kernel memory issues).
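
To illustrate the stack-versus-heap point above, here is a minimal C sketch. It is not taken from the CMC source: the function names stack_soft_limit and alloc_output_buffer are hypothetical, and the buffer is only a stand-in for whatever CMC allocates when writing its HDF5 output. It assumes a POSIX system such as Linux, where getrlimit() reports the limit that ulimit -s controls.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>

/* Hypothetical helper (not part of CMC): query the soft stack limit,
 * i.e. the value that `ulimit -s` shows and changes.
 * On success returns 0, stores the limit in *bytes, and sets *unlimited
 * to 1 if the limit is RLIM_INFINITY (the `ulimit -s unlimited` case).
 * Returns -1 if getrlimit() fails. */
int stack_soft_limit(unsigned long long *bytes, int *unlimited)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;

    *unlimited = (rl.rlim_cur == RLIM_INFINITY);
    *bytes = (unsigned long long)rl.rlim_cur;
    return 0;
}

/* Hypothetical stand-in for a large output buffer.
 *
 * A local array such as `double buf[n_elems];` lives on the stack and
 * can crash with SIGSEGV once it exceeds the stack limit (commonly
 * 8 MiB by default on Linux). calloc() takes the memory from the heap
 * instead, so the stack limit does not apply, and a failed allocation
 * is reported as NULL rather than as a crash.
 *
 * The caller owns the returned buffer and must free() it. */
double *alloc_output_buffer(size_t n_elems)
{
    assert(n_elems > 0);
    return calloc(n_elems, sizeof(double));
}

/* Print the current limit in a human-readable form. */
void print_stack_limit(void)
{
    unsigned long long bytes;
    int unlimited;

    if (stack_soft_limit(&bytes, &unlimited) != 0) {
        perror("getrlimit");
        return;
    }
    if (unlimited)
        printf("stack limit: unlimited\n");
    else
        printf("stack limit: %llu bytes\n", bytes);
}
```

Calling print_stack_limit() before a run shows whether ulimit -s unlimited took effect in the current shell. The heap-based allocation is the general pattern a fix of this kind would likely follow, though the actual change in CMC may look different.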

I'm hoping to take a look at that in the next week and issue a PR to fix it.

FYI, the Docker image has not been updated in quite some time, and we may drop support for it soon. However I will also look at that and see if we can update it for the time being.

@giulianoiorio

Thanks @zhang2023-byte for the suggestion; this fixes the issue.

@carlrodriguez this confirms that it was a general issue and not specifically related to the Docker image.
By the way, I think the Docker image could be very useful, and it does not require frequent updates. The current one already includes everything needed to install and use CMC, and the latest version of the code can easily be obtained and installed in the container by simply pulling it from the git repository. Thank you!


4 participants