Fix scaling issues by using absolute cell counts #5

Merged
merged 4 commits into from
Sep 18, 2024

Conversation

@mwlkhoo (Collaborator) commented Sep 16, 2024

Changelog

Previously, the implementation involved several conversions between relative and absolute error. This introduced scaling issues and discrepancies between Poisson error and the rule of three.

The updated code computes the 95% confidence bound as a number of parasites. This value is then converted to % parasitemia with the scaling factor 100 / N, where N is the number of RBCs.

Note that the bounds (according to both the rule of 3 and Poisson statistics) scale by 1 / N:

  • Rule of 3 -- 95% confidence bound:
    • Parasite counts: 3
    • % parasitemia: 3 * 100 / N
  • Poisson -- 95% confidence bound:
    • Parasite counts: 1.69 * sqrt(parasites)
    • % parasitemia: (1.69 * sqrt(parasites)) * 100 / N
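The two bounds listed above can be sketched in a few lines. This is a minimal illustration, not the code in this PR: the function name `parasitemia_with_bound` and its two-argument signature are hypothetical, and taking N as healthy plus parasitized cells is an assumption inferred from the example outputs in this description.

```python
import math
from typing import Tuple

# Factor quoted in this PR for converting a Poisson standard deviation
# into a 95% confidence bound.
POISSON_FACTOR = 1.69


def parasitemia_with_bound(healthy: int, parasites: int) -> Tuple[float, float]:
    """Return (% parasitemia, % confidence bound) from absolute cell counts.

    Hypothetical helper: N is assumed to be healthy + parasites.
    """
    n_cells = healthy + parasites
    if n_cells <= 0:
        raise ValueError("at least one cell must be counted")

    # Work in absolute parasite counts first ...
    if parasites == 0:
        bound_counts = 3.0  # rule of 3
    else:
        bound_counts = POISSON_FACTOR * math.sqrt(parasites)  # Poisson

    # ... and convert to percent only once, with the factor 100 / N.
    scale = 100 / n_cells
    return parasites * scale, bound_counts * scale


print(parasitemia_with_bound(100, 0))
print(parasitemia_with_bound(100, 1))
```

Keeping the bound in counts until the final step means the 100 / N conversion is applied exactly once, which is the point of the change described above.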

Example outputs

All outputs are formatted as (% parasitemia, % confidence bound)

In [4]: a = CountCompensator("frightful-wendigo-1931", skip=True, conf_thresh=0.9)

In [5]: a.get_res_from_counts(np.asarray([100000000, 0, 0, 0, 0, 0, 0]))
Out[5]: (0.0, 3e-06)

In [6]: a.get_res_from_counts(np.asarray([100000000, 1, 0, 0, 0, 0, 0]))
Out[6]: (9.9999999e-07, 1.6899999831000002e-06)

In [7]: a.get_res_from_counts(np.asarray([100, 0, 0, 0, 0, 0, 0]))
Out[7]: (0.0, 3.0)

In [8]: a.get_res_from_counts(np.asarray([100, 1, 0, 0, 0, 0, 0]))
Out[8]: (0.9900990099009901, 1.6732673267326732)

In [9]: a.get_res_from_counts(np.asarray([2000000, 0, 0, 0, 0, 0, 0]))
Out[9]: (0.0, 0.00015)

In [10]: a.get_res_from_counts(np.asarray([2000000, 10, 0, 0, 0, 0, 0]))
Out[10]: (0.0004999975000125, 0.00026721112622859697)

In [11]: a.get_res_from_counts(np.asarray([2000000, 1000, 0, 0, 0, 0, 0]))
Out[11]: (0.04997501249375312, 0.002670789228228166)

@paul-lebel (Collaborator)

Hey @mwlkhoo, I have a few comments:

> Note that the bounds for rule of 3 scale by 1 / N^2, whereas the poisson bound scales by 1 / N

I don't think this is true. The rule of three just says that when you don't count any parasites, the confidence bound (in percent) is [0, 100*3/N], where 3/N is a rate, not a number of events (the same quantity as parasitemia, but not in percent units). So the 'number of calls' under the rule of three is always exactly three, and the confidence bound scales with 1/N, not 1/N^2.

> Parasite counts: 3 / N
> % parasitemia: (3 / N) * 100 / N

This is wrong. It should just be 3 for parasite counts, and the interval should be [0, 100*3/N].

> Poisson -- 95% confidence bound:
> Parasite counts: 1.69 * sqrt(parasites)
> % parasitemia: (1.69 * sqrt(parasites)) * 100 / N

I assume 1.69 is the scaling factor between standard deviation and confidence interval.
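
For reference, the point that 3/N is a rate can be checked numerically. The sketch below assumes a simple binomial model (each of N cells is independently parasitized with probability p), which is an assumption for illustration and not something this PR's code is known to use; both function names are hypothetical. Seeing zero parasites has probability (1 - p)^N, and setting that to 0.05 gives p close to -ln(0.05) / N, roughly 3 / N.

```python
import math


def exact_zero_count_bound(n_cells: int, confidence: float = 0.95) -> float:
    """Upper bound on the rate p when zero events are seen in n_cells trials.

    Solves (1 - p) ** n_cells == 1 - confidence for p (binomial assumption).
    """
    alpha = 1 - confidence
    # expm1 keeps precision when n_cells is large and p is tiny.
    return -math.expm1(math.log(alpha) / n_cells)


def rule_of_three_bound(n_cells: int) -> float:
    """Rule-of-three approximation of the same bound, as a rate."""
    return 3 / n_cells


for n in (100, 2_000_000, 100_000_000):
    exact = exact_zero_count_bound(n)
    approx = rule_of_three_bound(n)
    # Multiply by 100 to express each rate as % parasitemia.
    print(n, 100 * exact, 100 * approx)
```

Under this model both bounds fall off as 1 / N, and the rule of three is slightly conservative relative to the exact value.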

@paul-lebel (Collaborator)

Following up on the 1.69: the full interval would be [mean - 1.69 * sigma, mean + 1.69 * sigma], correct?

@mwlkhoo (Collaborator, Author) commented Sep 17, 2024

Ah good catch, thanks!

And yes, 1.69 is the scaling factor that converts the standard deviation into a confidence interval.

@mwlkhoo mwlkhoo merged commit f7d1399 into main Sep 18, 2024
4 checks passed