The latest version of pROC, 1.15.0, has just been released. It features significant speed improvements, many bug fixes, new methods for use in dplyr pipelines, increased verbosity, and prepares the way for some backwards-incompatible changes upcoming in pROC 1.16.0.
Since its initial release, pROC has been detecting the levels of the positive and negative classes (cases and controls), as well as the direction of the comparison, that is whether values are higher in case or in control observations. Until now it has been doing so silently, but this has led to several issues and misunderstandings in the past. In particular, because of the detection of direction, ROC curves in pROC will nearly always have an AUC higher than 0.5, which can at times hide problems with certain classifiers, or cause bias in resampling operations such as bootstrapping or cross-validation.
In order to increase transparency, pROC 1.15.0 now prints a message on the command line when it auto-detects one of these two arguments.
> roc(aSAH$outcome, aSAH$ndka)
Setting levels: control = Good, case = Poor
Setting direction: controls < cases
Call:
roc.default(response = aSAH$outcome, predictor = aSAH$ndka)
Data: aSAH$ndka in 72 controls (aSAH$outcome Good) < 41 cases (aSAH$outcome Poor).
Area under the curve: 0.612
If you run pROC repeatedly in loops, you may want to turn off these diagnostic messages. The recommended way is to specify the levels and direction arguments explicitly:
roc(aSAH$outcome, aSAH$ndka, levels = c("Good", "Poor"), direction = "<")
Alternatively, you can pass quiet = TRUE to the roc function to silence the messages:
roc(aSAH$outcome, aSAH$ndka, quiet = TRUE)
As mentioned earlier, this last option should be avoided when you are resampling, such as in bootstrap or cross-validation, as it could silently hide some biases due to changing directions.
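The effect of direction detection on resampling can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the predictor is hypothetical pure noise, and the sample size and the 200 repetitions are arbitrary choices. It compares the AUC obtained with automatic detection to the AUC obtained with levels and direction fixed.

```r
library(pROC)

set.seed(42)
n <- 40
# Hypothetical null data: the predictor carries no information about the outcome
response <- factor(rep(c("Good", "Poor"), each = n / 2),
                   levels = c("Good", "Poor"))

auc_auto <- numeric(200)
auc_fixed <- numeric(200)
for (i in seq_along(auc_auto)) {
  predictor <- rnorm(n)
  # Direction auto-detected on every draw (messages silenced with quiet = TRUE)
  auc_auto[i] <- as.numeric(auc(roc(response, predictor, quiet = TRUE)))
  # Levels and direction fixed explicitly: nothing is auto-detected
  auc_fixed[i] <- as.numeric(auc(roc(response, predictor,
                                     levels = c("Good", "Poor"),
                                     direction = "<")))
}

mean(auc_auto)   # pulled above 0.5 by the changing directions
mean(auc_fixed)  # expected to stay near chance level
```

With the direction fixed, individual AUC values are free to fall below 0.5, so their average stays close to chance level; with automatic detection the average drifts upward even though the predictor is uninformative.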
Several bottlenecks have been removed, yielding significant speedups in the roc function with algorithm = 2 (see issue 44), as well as in the coords function, which is now vectorized much more efficiently (see issue 52) and scales much better with the number of coordinates to calculate. With these improvements, pROC is now as fast as other ROC R packages such as ROCR.
With Big Data becoming more and more prevalent, every speedup matters, and making pROC faster has very high priority. If you think that a particular computation is abnormally slow, for instance with a particular combination of arguments, feel free to submit a bug report.
As a consequence, algorithm = 2 is now used by default for numeric predictors, and is automatically selected by the new algorithm = 6 meta algorithm. algorithm = 3 remains slightly faster with very low numbers of thresholds (below 50) and is still the default with ordered factor predictors.
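As a rough illustration of the algorithm argument, the sketch below builds the same curve with each of the three values mentioned above and checks that the AUC agrees. The helper name roc_with is hypothetical, and the levels and direction are set explicitly only to keep the output free of messages.

```r
library(pROC)
data(aSAH)

# Hypothetical helper: build the same ROC curve with a given algorithm
roc_with <- function(algo) {
  roc(aSAH$outcome, aSAH$s100b,
      levels = c("Good", "Poor"), direction = "<",
      algorithm = algo)
}

roc_alg2 <- roc_with(2)  # default for numeric predictors
roc_alg3 <- roc_with(3)  # default for ordered factor predictors
roc_alg6 <- roc_with(6)  # meta algorithm; s100b is numeric here

# The choice of algorithm should affect speed, not the resulting AUC
all.equal(as.numeric(auc(roc_alg2)), as.numeric(auc(roc_alg3)))
all.equal(as.numeric(auc(roc_alg2)), as.numeric(auc(roc_alg6)))
```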
The roc function can be used in pipelines, for instance with dplyr or magrittr. This is still a highly experimental feature and will change significantly in future versions (see issue 54 for instance). Here is an example of usage:
library(dplyr)
aSAH %>%
  filter(gender == "Female") %>%
  roc(outcome, s100b)
The roc.data.frame method supports both standard and non-standard evaluation (NSE), and the roc_ function supports standard evaluation only. By default it returns the roc object, which can then be piped to the coords function to extract coordinates that can be used in further pipelines:
aSAH %>%
  filter(gender == "Female") %>%
  roc(outcome, s100b) %>%
  coords(transpose=FALSE) %>%
  filter(sensitivity > 0.6, specificity > 0.6)
More details and use cases are available in the ?roc help page.
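The difference between the two evaluation modes can be sketched as follows. This is a minimal example that calls the functions directly rather than through a pipe; the variables response_col and predictor_col are hypothetical names standing for column names chosen at run time.

```r
library(pROC)
data(aSAH)

# Non-standard evaluation: bare column names looked up in the data frame
roc_nse <- roc(aSAH, outcome, s100b, quiet = TRUE)

# Standard evaluation: column names passed as strings, which is convenient
# when they are stored in variables (hypothetical names)
response_col <- "outcome"
predictor_col <- "s100b"
roc_se <- roc_(aSAH, response_col, predictor_col, quiet = TRUE)

# Both calls should describe the same curve
all.equal(as.numeric(auc(roc_nse)), as.numeric(auc(roc_se)))
```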
Since the initial release of pROC, the coords function has been returning a matrix with thresholds in columns, and the coordinate variables in rows:
data(aSAH)
rocobj <- roc(aSAH$outcome, aSAH$s100b)
coords(rocobj, c(0.05, 0.2, 0.5))
#                   0.05       0.2       0.5
# threshold   0.05000000 0.2000000 0.5000000
# specificity 0.06944444 0.8055556 0.9722222
# sensitivity 0.97560976 0.6341463 0.2926829
This format doesn't conform to the grammar of the tidyverse, outlined by Hadley Wickham in his 2014 Tidy Data paper, which has become prevalent in modern R code. In addition, the dropping of dimensions by default makes it difficult to guess what type of data coords is going to return:
coords(rocobj, "best")
#   threshold specificity sensitivity
#   0.2050000   0.8055556   0.6341463
# A numeric vector
Although it is possible to pass drop = FALSE, the fact that it is not the default makes the behaviour unintuitive. In an upcoming version of pROC, this will be changed and coords will return a data.frame with the thresholds in rows and measurements in columns by default.
Changes in 1.15
- Addition of the
- Display a warning if transpose is missing. Pass transpose explicitly to silence the warning.
- Deprecation of
With transpose = FALSE, the output is a tidy data.frame suitable for use in pipelines:
coords(rocobj, c(0.05, 0.2, 0.5), transpose = FALSE)
#      threshold specificity sensitivity
# 0.05      0.05  0.06944444   0.9756098
# 0.2       0.20  0.80555556   0.6341463
# 0.5       0.50  0.97222222   0.2926829
It is recommended that new developments set transpose = FALSE explicitly. Currently these changes are neutral to the API and do not affect functionality outside of a warning.
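To check that the two layouts carry the same numbers, one can compare them directly. The sketch below assumes the aSAH data shipped with pROC; the variable names wide and tidy are illustrative only.

```r
library(pROC)
data(aSAH)
rocobj <- roc(aSAH$outcome, aSAH$s100b,
              levels = c("Good", "Poor"), direction = "<")
thresholds <- c(0.05, 0.2, 0.5)

# Layout used by default in 1.15: matrix with one column per threshold
wide <- coords(rocobj, thresholds, transpose = TRUE)
# Layout announced as the future default: data.frame with one row per threshold
tidy <- coords(rocobj, thresholds, transpose = FALSE)

is.matrix(wide)
is.data.frame(tidy)
# Same values, simply transposed
all.equal(unname(t(wide)), unname(as.matrix(tidy)))
```

Setting transpose in both calls keeps the script independent of whichever default the installed version uses.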
Upcoming backwards incompatible changes in future version (1.16)
The next version of pROC will change the default transpose to FALSE. This is a backward-incompatible change that will break any script that did not previously set transpose, and it will initially come with a warning to make debugging easier. Scripts that set transpose explicitly will be unaffected.
Recommendations
If you are writing a script calling the coords function, set transpose = FALSE to silence the warning and make sure your script keeps running smoothly once the default transpose is changed to FALSE. It is also possible to set transpose = TRUE to keep the current behavior; however, this is likely to be deprecated in the long term, and ultimately dropped.
coords return values
The coords function can now return two new values, including "closest.topleft". They can be returned regardless of whether input = "best" and of the value of the best.method argument, although they are not re-calculated when that can be avoided. They follow the best.weights argument as expected. See issue 48 for more information.
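As a minimal sketch of how such a value might be requested for arbitrary thresholds, the example below asks coords for "closest.topleft" alongside the usual coordinates. The thresholds are arbitrary, and transpose is set explicitly as recommended above.

```r
library(pROC)
data(aSAH)
rocobj <- roc(aSAH$outcome, aSAH$s100b,
              levels = c("Good", "Poor"), direction = "<")

# Request the new value for given thresholds, not only for input = "best"
res <- coords(rocobj, c(0.05, 0.2, 0.5),
              ret = c("threshold", "specificity", "sensitivity",
                      "closest.topleft"),
              transpose = FALSE)
res
```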
Several small bugs have been fixed in this version of pROC. Most of them were identified thanks to increased unit test coverage: 65% of the code is now unit tested, up from 46% a year ago. The main weak point remains the testing of bootstrapping and resampling operations. If you notice any unexpected or wrong behavior in those, or in any other function, feel free to submit a bug report.
Getting the update
The update is available on CRAN now. You can update your installation by simply typing:
install.packages("pROC")
Here is the full changelog:
- roc now prints messages when autodetecting levels and direction by default. Turn off with quiet = TRUE or set these values explicitly.
- Speedup with algorithm = 2 (issue 44) and in coords (issue 52).
- New algorithm = 6 (used by default) uses algorithm = 2 for numeric data, and algorithm = 3 for ordered vectors.
- New roc.data.frame method and roc_ function for use in pipelines.
- coords can now return "closest.topleft" values (issue 48).
- New transpose argument for coords, TRUE by default (issue 54).
- Use text instead of Tcl/Tk progress bar by default (issue 51).
- method = "density" smoothing when called directly from
- Fixed 'are.paired' ignoring smoothing arguments of
- coords now drops the dimension of ret too (issue 43)