much faster (c++) implementation of wilkinson binning
mjskay committed Nov 26, 2023
1 parent cb212cf commit bacbae2
Showing 9 changed files with 85 additions and 16 deletions.
6 changes: 5 additions & 1 deletion DESCRIPTION
@@ -34,7 +34,8 @@ Imports:
distributional (>= 0.3.2),
numDeriv,
glue,
- quadprog
+ quadprog,
+ Rcpp
Suggests:
knitr,
testthat (>= 3.0.0),
@@ -70,6 +71,7 @@ Collate:
"ggdist-package.R"
"util.R"
"rd.R"
"RcppExports.R"
"abstract_geom.R"
"abstract_stat.R"
"abstract_stat_slabinterval.R"
@@ -123,3 +125,5 @@ Collate:
"deprecated.R"
Roxygen: list(markdown = TRUE)
Config/testthat/edition: 3
LinkingTo:
Rcpp
2 changes: 2 additions & 0 deletions NAMESPACE
@@ -246,6 +246,7 @@ export(weighted_quantile_fun)
import(ggplot2)
import(grid)
import(vctrs)
importFrom(Rcpp,sourceCpp)
importFrom(cli,cli_abort)
importFrom(cli,cli_warn)
importFrom(distributional,cdf)
@@ -338,3 +339,4 @@ importFrom(tibble,as_tibble)
importFrom(tibble,tibble)
importFrom(tidyselect,eval_select)
importFrom(utils,packageVersion)
useDynLib(ggdist, .registration = TRUE)
2 changes: 2 additions & 0 deletions NEWS.md
@@ -10,6 +10,8 @@ New features and enhancements:
to `bandwidth_nrd0()` when they fail, with a warning that suggests trying
a dotplot or histogram (as these failures tend to happen on data that is not
a good candidate for a density plot in the first place) (#196).
* Much faster (C++) implementation of Wilkinson dotplot binning, especially
for large dotplots.

Bug fixes:

7 changes: 7 additions & 0 deletions R/RcppExports.R
@@ -0,0 +1,7 @@
# Generated by using Rcpp::compileAttributes() -> do not edit by hand
# Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393

wilkinson_bin_to_right_ <- function(x, width) {
.Call(`_ggdist_wilkinson_bin_to_right_`, x, width)
}

17 changes: 2 additions & 15 deletions R/binning_methods.R
@@ -410,21 +410,8 @@ wilkinson_bin_to_right = function(x, width) {
))
}

- # determine bins and midpoints of bins
- bins = rep(NA_integer_, length(x))
- bins[[1]] = 1L
- current_bin = 1L
- first_x = x[[1]]
- n = 1
- for (i in seq_along(x)[-1]) {
- # This is equivalent to x[[i]] - first_x >= width but it accounts for machine precision.
- # If we instead used `>=` directly some things that should be symmetric will not be
- if (x[[i]] - first_x - width >= -.Machine$double.eps) {
- current_bin = current_bin + 1L
- first_x = x[[i]]
- }
- bins[[i]] = current_bin
- }
+ # determine bins
+ bins = wilkinson_bin_to_right_(x, width)

# determine bin positions
# can take advantage of the fact that bins is sorted runs of numbers to
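The comment in the removed R loop says the comparison is written as `x - first_x - width >= -eps` rather than `x - first_x >= width` to account for machine precision. A minimal sketch of that difference follows; `starts_new_bin` is a hypothetical helper introduced only for illustration and is not part of ggdist.

```cpp
#include <cfloat>  // DBL_EPSILON

// Hypothetical helper (not in ggdist): true when `x` lies at least `width`
// to the right of `first_x`, allowing an absolute slack of DBL_EPSILON.
bool starts_new_bin(double x, double first_x, double width) {
  return x - first_x - width >= -DBL_EPSILON;
}

// The plain comparison, shown only for contrast.
bool starts_new_bin_plain(double x, double first_x, double width) {
  return x - first_x >= width;
}
```

For example, `0.3 - 0.2` is mathematically `0.1` but is slightly smaller than `0.1` in IEEE double arithmetic, so with `first_x = 0.0` and `width = 0.1` the plain comparison rejects it while the tolerant one accepts it. Note that `DBL_EPSILON` is an absolute slack, so this mostly helps when the values involved are of moderate magnitude.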
6 changes: 6 additions & 0 deletions R/ggdist-package.R
@@ -1,3 +1,9 @@
## usethis namespace: start
#' @importFrom Rcpp sourceCpp
#' @useDynLib ggdist, .registration = TRUE
## usethis namespace: end
NULL

#' Visualizations of Distributions and Uncertainty
#'
#' @docType package
3 changes: 3 additions & 0 deletions src/.gitignore
@@ -0,0 +1,3 @@
*.o
*.so
*.dll
34 changes: 34 additions & 0 deletions src/RcppExports.cpp
@@ -0,0 +1,34 @@
// Generated by using Rcpp::compileAttributes() -> do not edit by hand
// Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393

#include <Rcpp.h>

using namespace Rcpp;

#ifdef RCPP_USE_GLOBAL_ROSTREAM
Rcpp::Rostream<true>& Rcpp::Rcout = Rcpp::Rcpp_cout_get();
Rcpp::Rostream<false>& Rcpp::Rcerr = Rcpp::Rcpp_cerr_get();
#endif

// wilkinson_bin_to_right_
IntegerVector wilkinson_bin_to_right_(const NumericVector& x, double width);
RcppExport SEXP _ggdist_wilkinson_bin_to_right_(SEXP xSEXP, SEXP widthSEXP) {
BEGIN_RCPP
Rcpp::RObject rcpp_result_gen;
Rcpp::RNGScope rcpp_rngScope_gen;
Rcpp::traits::input_parameter< const NumericVector& >::type x(xSEXP);
Rcpp::traits::input_parameter< double >::type width(widthSEXP);
rcpp_result_gen = Rcpp::wrap(wilkinson_bin_to_right_(x, width));
return rcpp_result_gen;
END_RCPP
}

static const R_CallMethodDef CallEntries[] = {
{"_ggdist_wilkinson_bin_to_right_", (DL_FUNC) &_ggdist_wilkinson_bin_to_right_, 2},
{NULL, NULL, 0}
};

RcppExport void R_init_ggdist(DllInfo *dll) {
R_registerRoutines(dll, NULL, CallEntries, NULL, NULL);
R_useDynamicSymbols(dll, FALSE);
}
24 changes: 24 additions & 0 deletions src/binning_methods.cpp
@@ -0,0 +1,24 @@
#include <Rcpp.h>
#include <float.h> // for DBL_EPSILON
using namespace Rcpp;

// [[Rcpp::export]]
IntegerVector wilkinson_bin_to_right_(const NumericVector& x, double width) {
int n = x.size();
IntegerVector bins(n);
bins[0] = 1;
int current_bin = 1;
double first_x = x[0];

for(int i = 1; i < n; i++) {
// This is equivalent to x[i] - first_x >= width but it accounts for machine precision.
// If we instead used `>=` directly some things that should be symmetric will not be
if (x[i] - first_x - width >= -DBL_EPSILON) {
current_bin = current_bin + 1;
first_x = x[i];
}
bins[i] = current_bin;
}

return bins;
}
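
For readers who want to try the binning loop outside of R, here is a rough standalone sketch using `std::vector` instead of Rcpp types. The name `wilkinson_bin_to_right_sketch` is hypothetical, the input is assumed to be sorted in ascending order, and the empty-input guard is an addition made for this sketch rather than something taken from the commit.

```cpp
#include <cfloat>   // DBL_EPSILON
#include <cstddef>  // std::size_t
#include <vector>

// Standalone sketch (hypothetical name; not the ggdist function itself).
// Assumption: `x` is sorted in ascending order.
std::vector<int> wilkinson_bin_to_right_sketch(const std::vector<double>& x,
                                               double width) {
  std::vector<int> bins(x.size());
  if (x.empty()) return bins;  // guard added for this sketch: nothing to bin

  int current_bin = 1;
  double first_x = x[0];  // leftmost point of the current bin
  bins[0] = current_bin;

  for (std::size_t i = 1; i < x.size(); ++i) {
    // Start a new bin once x[i] is at least `width` away from the first
    // point of the current bin, with DBL_EPSILON of absolute slack.
    if (x[i] - first_x - width >= -DBL_EPSILON) {
      ++current_bin;
      first_x = x[i];
    }
    bins[i] = current_bin;
  }
  return bins;
}
```

With `x = {0.0, 0.4, 1.0, 1.2, 2.5}` and `width = 1.0`, the points `0.0` and `0.4` share the first bin, `1.0` opens the second bin (it is exactly `width` away from `0.0`), `1.2` joins it, and `2.5` opens a third.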
