[JOSS REVIEW] Notebook #2 incorrect label_k values? #7
Comments
Hi @chrisleaman, I figured this out.

I will render the above cell in Markdown rather than as code so that other people don't fall into this trap. Basically, when following the notebook, running I added the above cell just to show how one could customise the number of clusters to use in each survey, and I made the point that, based on our experience, 10 is a good number of clusters, which is why all surveys are set to 10.

I replicated your image by running that cell, therefore using k = 10 in all surveys, and by using rule-based symbology in QGIS for the St. Leonards 20180606 survey (note that with rule-based symbology, a rule that is not satisfied doesn't break the classification; the corresponding points are simply not rendered). Using the opt_k function, the suboptimal k for that survey is 11. In fact, I don't think your image can actually contain label_k = 10: with k = 10 you should have only 10 labels starting from 0, so the maximum label_k should be 9. Can you double-check, please?

If you don't run that cell and use k = 11 for that survey, this is what you get: Then, you will be able to actually run Can you please confirm that this was the issue by just skipping that misleading code cell?
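The point about label numbering can be illustrated with a minimal sketch. It uses a toy 1-D k-means written for this example, not sandpyper's actual clustering code; the function name `kmeans_1d` and the data are hypothetical. It only shows that clustering with k = 10 yields labels drawn from 0 to 9, so a dictionary entry of `label_k = 10` could never match a point from such a run.

```python
import random


def kmeans_1d(values, k, seed=0, n_iter=20):
    """Toy 1-D k-means (illustrative only); returns one integer label per value."""
    rng = random.Random(seed)
    centroids = rng.sample(values, k)  # k distinct starting centroids
    labels = [0] * len(values)
    for _ in range(n_iter):
        # Assignment step: each label is the index of the nearest centroid,
        # so labels can only take values in 0 .. k-1.
        labels = [
            min(range(k), key=lambda j: abs(v - centroids[j])) for v in values
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centroids[j] = sum(members) / len(members)
    return labels


values = [float(i) for i in range(100)]  # hypothetical data
labels = kmeans_1d(values, k=10)
print("smallest label:", min(labels), "largest label:", max(labels))
```

Because the starting centroids are drawn with a seeded random generator, rerunning with the same seed reproduces the same labels, while a different seed may assign different integers to the same physical cluster.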
Hi @npucino - yep, omitting that cell allows me to run I can tell Some debugging output
`P.cleanit()` output
These are my profile .csv files before and after running
Hi @chrisleaman, I checked your profile-cleaned file and I see the confusion now! The here is the pt_class column displayed of your profile-cleaned.csv file! The changes you see there are exactly what you noted: the result of the shoremask, which is simply a clipping mask that discards all observations outside of the area of interest, in this case, landward. Thanks for raising this issue; this is critical information I need to specify in the documentation and in the notebooks!

Note: I actually created the classification dictionaries while looking at full-resolution imagery (2.5 cm), but in order to speed up testing and load the test data onto GitHub, I needed to downsample the imagery to 1 m. That is why the classification might not seem perfect in this test workflow. Moreover, swash is included in the water class, as Structure from Motion in the swash is really bad and not reliable for sand volumetric computations.
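As a rough illustration of what a clipping mask does to a profile table, here is a minimal sketch in plain Python. It is not sandpyper's implementation: the function names, the rectangular mask and the sample points are all hypothetical, and a real workflow would more likely use a geospatial library. Points outside the mask polygon are simply dropped, which is why rows can disappear between the raw and cleaned files.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon (list of (x, y))."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def clip_points(points, mask_polygon):
    """Keep only the observations that fall inside the mask polygon."""
    return [p for p in points if point_in_polygon(p["x"], p["y"], mask_polygon)]


# Hypothetical area of interest and observations.
mask = [(0, 0), (10, 0), (10, 10), (0, 10)]
points = [
    {"x": 5, "y": 5, "pt_class": "sand"},
    {"x": 15, "y": 5, "pt_class": "veg"},
]
kept = clip_points(points, mask)
print(len(points), "points before clipping,", len(kept), "after")
```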
Thanks @npucino, everything makes sense now - I didn't realise that extra column had been added! I think an extra sentence or two in the notebook just stating the additional

Re: performance issues, if you haven't already, I'd recommend looking into using https://github.com/pyutils/line_profiler to see exactly where your code is slow. It identifies which lines in your code are taking the most time, so you can focus on increasing performance just on those lines. Sometimes the results are surprising and you can get some easy wins using this!

Feel free to close this issue when you're ready 😊
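For reference, a minimal sketch of how line_profiler can be driven from Python. The workload `slow_sum_of_squares` and the helper `profile_lines` are hypothetical names made up for this example; the helper falls back to a plain call when the package is not installed, so the snippet runs either way.

```python
def slow_sum_of_squares(n):
    """Deliberately naive workload standing in for the code to be profiled."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_lines(func, *args):
    """Run func under line_profiler if available; otherwise just call it."""
    try:
        from line_profiler import LineProfiler
    except ImportError:
        print("line_profiler is not installed; running without profiling")
        return func(*args)
    profiler = LineProfiler()
    profiler.add_function(func)  # register the function to time line by line
    result = profiler.runcall(func, *args)
    profiler.print_stats()  # per-line hit counts and timings
    return result


result = profile_lines(slow_sum_of_squares, 1000)
print("result:", result)
```

The per-line report makes it easy to see whether time is spent in a loop body, an I/O call or somewhere unexpected before attempting any optimisation.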
Thanks for the suggestion @chrisleaman, forked now! Cheers!
Comments are for openjournals/joss-reviews#3666 (comment).
Following `2 - Profiles extraction, unsupervised sand labelling and cleaning.ipynb`, I tried plotting the given `label_k` values for `leo_20180606` with the profile dataframe I saved, but they don't look quite right (see plot below). Can you confirm the values given in `water_dict`, `no_sand_dict` etc. in the notebook are correct? Or is the purpose of the `label_correction.gpkg` to fix this? I'm also wondering if there is some random state that results in different label numbers if you rerun the kmeans clustering?

sandpyper/examples/2 - Profiles extraction, unsupervised sand labelling and cleaning.ipynb
Lines 2836 to 2839 in ce542c6