Implement a feature to detect attribute-level drift. We currently detect drift at the dataset (joint-distribution) level; it looks like Azure can do this at the attribute level. This is not difficult to do. It requires the following: check the nature of each attribute:
(1) If it is continuous (numeric, i.e. the numpy dtype is float), use the two-sample Kolmogorov-Smirnov test to check whether the attribute has the same distribution in the training data and in the data received in deployment: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ks_2samp.html
(2) If it is categorical (the numpy dtype is object), use the chi-square test of independence: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chi2_contingency.html We need a contingency table to do this, which we can build with the group-by functionality from pandas: https://stackoverflow.com/questions/29901436/is-there-a-pythonic-way-to-do-a-contingency-table-in-pandas
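A minimal sketch of the per-attribute check described above, assuming scipy and pandas are available. The function name and the dtype-based dispatch are my own; the contingency table is built with `pd.crosstab` rather than a manual group-by, which gives the same counts:

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp, chi2_contingency

def attribute_drift(train_col, deploy_col, alpha=0.05):
    """Test whether one attribute has drifted between training and deployment.

    Float-dtype columns use the two-sample Kolmogorov-Smirnov test;
    object-dtype (categorical) columns use the chi-square test on a
    sample-vs-category contingency table. Returns (p_value, drifted).
    """
    train_col = pd.Series(train_col)
    deploy_col = pd.Series(deploy_col)
    if np.issubdtype(train_col.dtype, np.floating):
        # Continuous attribute: compare empirical CDFs with KS.
        stat, p = ks_2samp(train_col.dropna(), deploy_col.dropna())
    else:
        # Categorical attribute: count each category in each sample,
        # then test independence of (sample, category).
        labels = np.concatenate([
            np.repeat("train", len(train_col)),
            np.repeat("deploy", len(deploy_col)),
        ])
        values = np.concatenate([
            train_col.astype(str).to_numpy(),
            deploy_col.astype(str).to_numpy(),
        ])
        table = pd.crosstab(labels, values)
        stat, p, dof, expected = chi2_contingency(table)
    return p, p < alpha
```

Looping this over the shared columns of the training and deployment frames gives a per-attribute drift report; `alpha` would likely need multiple-testing correction when many attributes are checked at once.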
Note: Check https://towardsdatascience.com/how-to-compare-two-distributions-in-practice-8c676904a285 to see if a completely discrete non-parametric test makes sense.
rajivsam