Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scaling JPK data values correctly #118

Open
wants to merge 19 commits into
base: main
Choose a base branch
from

Conversation

derollins
Copy link

@derollins derollins commented Feb 25, 2025

As described in issue #116, currently the data (z values) scaling factors used for most channels in JPK images are incorrect and some channels cannot be read.

This pull request creates the fucntion _get_z_scaling() that extracts the correct scaling factor for each channel and uses these to scale the image.

Additonally corrects a typo in the example notebook.

This fixes #116

@derollins
Copy link
Author

I am working on improving the documentation however I think the test failure is due to the nature of the bug and the test itself is incorrect. The height test channel used in one that is incorrectly scaled.

Copy link
Collaborator

@ns-rse ns-rse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the Pull Request @derollins

I've left some suggestions on style in place.

I'm wondering if the tags should be parameterised rather than hard coded so that if they change in the future it is easier to update them.

The values I've seen are...

JPK_TAGS = {
  "n_slots": 32896,
  "default": 32897,
  "tag_name": 32912,
  "scaling_type": 32931,
  "scaling_name": 32932,
  "offset_name": 32933,
}

This could be a dictionary passed in, although the default should be tags: dict[str: int] | None = None as setting the default to be a dictionary falls foul of a linting error.

The dictionary could be define globally and passed as an argument when calling on line 150.

derollins and others added 2 commits February 25, 2025 13:24
Co-authored-by: Neil Shephard <n.shephard@sheffield.ac.uk>
@derollins
Copy link
Author

The correct sum for the height_trace channel of the sample image is 219242202.8256843 verified by opening the image in gwyddion and exporting and summing a numpy array. This agrees with the output of this pull request.

@derollins
Copy link
Author

I'm wondering if the tags should be parameterised rather than hard coded so that if they change in the future it is easier to update them.

The values I've seen are...

JPK_TAGS = {
  "n_slots": 32896,
  "default": 32897,
  "tag_name": 32912,
  "scaling_type": 32931,
  "scaling_name": 32932,
  "offset_name": 32933,
}

This could be a dictionary passed in, although the default should be tags: dict[str: int] | None = None as setting the default to be a dictionary falls foul of a linting error.

The dictionary could be define globally and passed as an argument when calling on line 150.

Thanks for the suggestion @ns-rse, the tags are certainly the most confusing aspect of reading these files. I'm uncertain of the best approach as many of the tags are 'relative' and found by adding multiples of 48 depending on the slot. Some are used as stings while others are used as integers.
Do you think the tags should be held and used as strings, as most are now, and converted to integers for manipulation, then back to strings for use or is it better to hold them as integers like in your example and convert them all to strings when used?
When you suggest the dictionary should be defined globally do you mean in the load_jpk() function rather than _get_z_scaling()?

@ns-rse
Copy link
Collaborator

ns-rse commented Feb 25, 2025

Thanks for the prompt work @derollins 👍 Looked through and noticed one small thing, and also that default_slot_number still has _number which we can do away with I think.

Do you think the tags should be held and used as strings, as most are now, and converted to integers for manipulation, then back to strings for use or is it better to hold them as integers like in your example and convert them all to strings when used?

Always a tricky one when you have to go back and forth, thankfully they're not floats or large integers so precision won't be a problem. I'd let the balance of how they are most frequently used determine their default type.

When you suggest the dictionary should be defined globally do you mean in the load_jpk() function rather than _get_z_scaling()?

I'd define it near the top of the jpk.py for now. I suspect over time we might look to set a default_config.yaml that holds defaults across all files. I've created #119 to capture that so we don't forget.

@derollins
Copy link
Author

Thanks for the prompt work @derollins 👍 Looked through and noticed one small thing, and also that default_slot_number still has _number which we can do away with I think.

Hi @ns-rse I kept _number since default_slot is already used above and default_slot_number is the index that refers to the default slot. I don't think _number is too confusing since it is a number and used as one although it could also be default_slot_index.

Always a tricky one when you have to go back and forth, thankfully they're not floats or large integers so precision won't be a problem. I'd let the balance of how they are most frequently used determine their default type.

In that case I think storing them as strings makes sence as that is how they are primarily used.

I'd define it near the top of the jpk.py for now. I suspect over time we might look to set a default_config.yaml that holds defaults across all files. I've created #119 to capture that so we don't forget.

Great, I'll have a go at this.

@ns-rse
Copy link
Collaborator

ns-rse commented Feb 26, 2025

Thanks @derollins I think this is much easier to understand now than with the numbers hard coded.

One final suggestion on naming conventions made in-line, I'd be inclined to use the prefix approach and this will be good to merge.

You may want to look at setting up `pre-commit` locally to get faster ~~frustration~~ feedback than waiting on the CI to pass/fail/make changes.

There are some notes on this in the AFMReader Documentation if inclined to do so.

@ns-rse
Copy link
Collaborator

ns-rse commented Feb 26, 2025

That was quick @derollins 👍 😁

You might want to check out the magic of git commit --amend when you realise you need to add something to the last commit you made. 😉

@derollins
Copy link
Author

derollins commented Feb 26, 2025

One final suggestion on naming conventions made in-line, I'd be inclined to use the prefix approach and this will be good to merge.

I went with _default_slot, is this approariate?

You may want to look at setting up pre-commit locally to get faster frustration feedback than waiting on the CI to pass/fail/make changes.

There are some notes on this in the AFMReader Documentation if inclined to do so.

I was using this but the company security settings on the work laptop I was using was blocking it from running. I might try making a commit on a personal computer before I push in the future.

@derollins
Copy link
Author

That was quick @derollins 👍 😁

You might want to check out the magic of git commit --amend when you realise you need to add something to the last commit you made. 😉

Thanks! I will do that, I had just forgotten to pull the changes that pre-commit had made

Copy link
Collaborator

@ns-rse ns-rse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your patience and for taking the time to provide a solution @derollins

This will be merged and available, not sure when we'll make a new release just yet though but you can install from main branch to get these changes if you start a new env or a colleague does.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: Incorect scaling of JPK image files
2 participants