Large files are accidentally Git-tracked? #217

Open
ilhamv opened this issue Jul 25, 2024 · 8 comments

@ilhamv
Member

ilhamv commented Jul 25, 2024

Can we use this to remove unwanted files accidentally tracked in the git history?

@jpmorgan98 jpmorgan98 added this to the v0.10.1 milestone Jul 25, 2024
@jpmorgan98
Collaborator

Look specifically for __ptxcache__ files and for .o and .ptx files, per @braxtoncuneo.
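A minimal sketch of that check, assuming the patterns of interest are a __ptxcache__ path component and the .o and .ptx extensions. The helper name tracked_build_artifacts is hypothetical, and git ls-files only reports what is tracked in the current index, not what sits in older history.

```shell
# Hypothetical helper: list currently tracked files that look like build artifacts.
# Usage: tracked_build_artifacts [repo_dir]
tracked_build_artifacts() {
    # git ls-files prints every path in the index, one per line.
    # The pattern matches a __ptxcache__ path component, or a .o / .ptx suffix.
    git -C "${1:-.}" ls-files |
        grep -E '(^|/)__ptxcache__(/|$)|\.(o|ptx)$'
}
```

An empty result (grep exits non-zero) means no such files are tracked right now; they could still exist in earlier commits.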

@clemekay
Collaborator

You can run du -sh * in a directory to get a human-readable size for each item in it.
The large files all seem to come from the inf_shem361 examples, specifically the answer.h5 and data .npz files.

Possible ways to handle that:

  • zip all of the inf_shem361 examples and keep them where they are
  • move the inf_shem361 to a separate repo to use as tests when desired
  • remove the inf_shem361 examples/tests entirely
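The du check mentioned above can be sorted so the largest entries come first. This is a rough sketch: largest_entries is a hypothetical helper name, and it uses du -sk (sizes in KiB) instead of du -sh so that a plain numeric sort works without relying on GNU sort -h.

```shell
# Hypothetical helper: list the N largest entries directly under a directory.
# Usage: largest_entries [dir] [count]
largest_entries() {
    dir="${1:-.}"
    count="${2:-10}"
    # -s: one total per entry; -k: sizes in KiB so a numeric sort is enough.
    # The * glob skips hidden entries such as .git, so check those separately.
    du -sk -- "$dir"/* | sort -rn | head -n "$count"
}
```

For example, largest_entries examples 5 would print the five largest items under a directory named examples (a placeholder path).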

@ilhamv
Member Author

ilhamv commented Aug 14, 2024

The plan is to replace the infinite-medium 361-group problem with an infinite-medium few-group problem (probably the 7-group c5g7 data).

@ilhamv
Member Author

ilhamv commented Aug 14, 2024

The largest disk usage seems to come from:

 68M	.git/objects/b3
142M	.git/objects/pack

Now I'm less sure if the ~4 MB 361-group data is actually the culprit.
I'll try to use https://rtyley.github.io/bfg-repo-cleaner/ which may provide us with more info.
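Independently of BFG, plain Git can report which blobs in the history are largest, which would help confirm or rule out the 361-group data. The following is a sketch using git rev-list and git cat-file; the function name largest_blobs is hypothetical, and the sizes printed are uncompressed object sizes, not on-disk pack sizes.

```shell
# Hypothetical helper: print the N largest blobs reachable from any ref,
# one per line as "<size in bytes> <path>".
# Usage: largest_blobs [repo_dir] [count]
largest_blobs() {
    repo="${1:-.}"
    count="${2:-10}"
    # rev-list --objects --all emits "<id> <path>" for every reachable object;
    # cat-file --batch-check looks up type and size, passing the path via %(rest).
    git -C "$repo" rev-list --objects --all |
        git -C "$repo" cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        awk '$1 == "blob" { print substr($0, 6) }' |  # drop the leading "blob "
        sort -rn |
        head -n "$count"
}
```

Files deleted in later commits still show up here, since they remain reachable through older commits.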

@ilhamv
Member Author

ilhamv commented Aug 14, 2024

So...

Deleted files
-------------

	Filename                             Git id            
	-------------------------------------------------------
	Miniconda3-latest-Linux-ppc64le.sh | cdb26f99 (94.9 MB)
	analytic.zip                       | b3859ac8 (92.5 MB)

@ilhamv
Member Author

ilhamv commented Aug 14, 2024

Now the .git/objects folder is 44M. More reasonable!

However, the next step in the BFG instructions is:

Finally, once you're happy with the updated state of your repo, push it back up (note that because your clone command used the --mirror flag, this push will update all refs on your remote server):

$ git push

At this point, you're ready for everyone to ditch their old copies of the repo and do fresh clones of the nice, new pristine data. It's best to delete all old clones, as they'll have dirty history that you don't want to risk pushing back into your newly cleaned repo.

Any thoughts?
@clemekay @jpmorgan98
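Before the push step quoted above, one possible sanity check is to confirm that the removed blobs are really gone from the cleaned mirror. This is only a sketch: blobs_are_gone is a hypothetical helper and the mirror path in the usage line is a placeholder. Note that git cat-file -e also succeeds for objects that are unreachable but not yet pruned, so the check is only meaningful after the reflog expiry and git gc --prune step.

```shell
# Hypothetical helper: succeed only if none of the given object ids
# exist in the object store of the repository at $1.
# Usage: blobs_are_gone <repo_dir> <id>...
blobs_are_gone() {
    repo="$1"
    shift
    for oid in "$@"; do
        # cat-file -e exits 0 when the object exists, non-zero otherwise.
        if git -C "$repo" cat-file -e "$oid" 2>/dev/null; then
            echo "still present: $oid" >&2
            return 1
        fi
    done
    return 0
}

# Example with the ids from the BFG report in this thread:
# blobs_are_gone path/to/mirror.git cdb26f99 b3859ac8
```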

@ilhamv
Member Author

ilhamv commented Aug 15, 2024

We may be able to reduce the size further once we remove the SHEM361 test problems and examples. I'll rerun the repo cleaner after that. Either way, we still need to decide how to handle the final step of the cleanup (the push) that I mentioned in the previous comment.

@ilhamv ilhamv modified the milestones: v0.11, v0.12 Oct 15, 2024
@clemekay
Collaborator

Currently looking into whether we need to use the cleanup function or whether we can just delete these files.


3 participants