pROC 1.17.0.1
pROC version 1.17.0.1 is available on CRAN now. Besides several bug fixes and small changes, it accepts more values for the input argument of the coords function.
Here is an example:
library(pROC)
data(aSAH)
rocobj <- roc(aSAH$outcome, aSAH$s100b)
coords(rocobj, x = seq(0, 1, .1), input="recall", ret="precision")
#    precision
# 1        NaN
# 2  1.0000000
# 3  1.0000000
# 4  0.8601399
# 5  0.6721311
# 6  0.6307692
# 7  0.6373057
# 8  0.4803347
# 9  0.4517906
# 10 0.3997833
# 11 0.3628319
Getting the update
The update is available on CRAN now. You can update your installation by simply typing:
install.packages("pROC")
Here is the full changelog:
1.17.0.1 (2021-01-07):
- Fix CRAN incoming checks as requested by CRAN.
1.17.0 (2020-12-29)
- Accept more values in input of coords (issue #67).
- Accept kappa for the power.roc.test of two ROC curves (issue #82).
- The input argument to coords for smooth.roc curves no longer has a default.
- The x argument to coords for smooth.roc can now be set to "all" (also the default).
- Fix bootstrap roc.test and cov with smooth.roc curves.
- The ggroc function can now plot smooth.roc curves (issue #86).
- Remove warnings with warnPartialMatchDollar option (issue #87).
- Make tests depending on vdiffr conditional (issue #88).
Xavier Robin
Published Wednesday, January 13, 2021 16:19 CET
Permalink: /blog/2021/01/13/proc-1.17.0.1
Tags:
pROC
Comments: 0
pROC 1.16.1
pROC version 1.16.1 is a minor release that disables a timing-dependent test based on the microbenchmark package that can sometimes cause random failures on CRAN. This version contains no user-visible changes. Users don't need to install this update.
Xavier Robin
Published Tuesday, January 14, 2020 08:52 CET
Permalink: /blog/2020/01/14/proc-1.16.1
Tags:
pROC
Comments: 0
pROC 1.16.0
pROC version 1.16.0 is available on CRAN now. Besides several bug fixes, the main change is the switch of the default value of the transpose argument to the coords function from TRUE to FALSE. As announced earlier, this is a backward incompatible change that will break any script that did not previously set the transpose argument, and for now it comes with a warning to make debugging easier. Scripts that set transpose explicitly are unaffected.
New return values of coords and ci.coords
With transpose = FALSE, coords returns a tidy data.frame suitable for use in pipelines:
data(aSAH)
rocobj <- roc(aSAH$outcome, aSAH$s100b)
coords(rocobj, c(0.05, 0.2, 0.5), transpose = FALSE)
#      threshold specificity sensitivity
# 0.05      0.05  0.06944444   0.9756098
# 0.2       0.20  0.80555556   0.6341463
# 0.5       0.50  0.97222222   0.2926829
The function doesn't drop dimensions, so the result is always a data.frame, even if it has only one row and/or one column.
If speed is of utmost importance, you can get the results as a non-transposed matrix instead:
coords(rocobj, c(0.05, 0.2, 0.5), transpose = FALSE, as.matrix = TRUE)
#      threshold specificity sensitivity
# [1,]      0.05  0.06944444   0.9756098
# [2,]      0.20  0.80555556   0.6341463
# [3,]      0.50  0.97222222   0.2926829
In some scenarios this can be a tiny bit faster, and it is used internally in ci.coords.
Type help(coords_transpose) for additional information.
ci.coords
The ci.coords function now returns a list-like object:
ciobj <- ci.coords(rocobj, c(0.05, 0.2, 0.5))
ciobj$accuracy
#        2.5%       50%     97.5%
# 1 0.3628319 0.3982301 0.4424779
# 2 0.6637168 0.7433628 0.8141593
# 3 0.6725664 0.7256637 0.7787611
The print function prints a table with all the results. However, this table is generated on the fly and is not available directly.
ciobj
# 95% CI (2000 stratified bootstrap replicates):
#      threshold sensitivity.low sensitivity.median sensitivity.high
# 0.05      0.05          0.9268             0.9756           1.0000
# 0.2       0.20          0.4878             0.6341           0.7805
# 0.5       0.50          0.1707             0.2927           0.4390
#      specificity.low specificity.median specificity.high accuracy.low
# 0.05         0.01389            0.06944           0.1250       0.3628
# 0.2          0.70830            0.80560           0.8889       0.6637
# 0.5          0.93060            0.97220           1.0000       0.6726
#      accuracy.median accuracy.high
# 0.05          0.3982        0.4425
# 0.2           0.7434        0.8142
# 0.5           0.7257        0.7788
The following code snippet can be used to obtain all the information calculated by the function:
for (ret in attr(ciobj, "ret")) {
    print(ciobj[[ret]])
}
#        2.5%       50%     97.5%
# 1 0.9268293 0.9756098 1.0000000
# 2 0.4878049 0.6341463 0.7804878
# 3 0.1707317 0.2926829 0.4390244
#         2.5%        50%     97.5%
# 1 0.01388889 0.06944444 0.1250000
# 2 0.70833333 0.80555556 0.8888889
# 3 0.93055556 0.97222222 1.0000000
#        2.5%       50%     97.5%
# 1 0.3628319 0.3982301 0.4424779
# 2 0.6637168 0.7433628 0.8141593
# 3 0.6725664 0.7256637 0.7787611
Getting the update
The update is available on CRAN now. You can update your installation by simply typing:
install.packages("pROC")
Here is the full changelog:
- BACKWARD INCOMPATIBLE CHANGE: transpose argument to coords switched to FALSE by default (issue #54).
- BACKWARD INCOMPATIBLE CHANGE: ci.coords return value is now of list type and easier to use.
- Fix one-sided DeLong test for curves with direction=">" (issue #64).
- Fix an error in ci.coords due to expected NA values in some coords (like "precision") (issue #65).
- Ordered predictors are converted to numeric in a more robust way (issue #63).
- Cleaned up power.roc.test code (issue #50).
- Fix pairing with roc.formula and warn if na.action is not set to "na.pass" or "na.fail" (issue #68).
- Fix ci.coords not working with smooth.roc curves.
Xavier Robin
Published Sunday, January 12, 2020 21:46 CET
Permalink: /blog/2020/01/12/proc-1.16.0
Tags:
pROC
Comments: 0
pROC 1.15.3
A new version of pROC, 1.15.3, has been released and is now available on CRAN. It is a minor bugfix release. Versions 1.15.1 and 1.15.2 were rejected from CRAN.
Here is the full changelog:
- Fix -Inf threshold in coords for curves with direction = ">" (issue 60).
- Keep list order in ggroc (issue 58).
- Fix erroneous error in ci.coords with ret="threshold" (issue 57).
- Restore lazy loading of the data and fix an R CMD check warning "Variables with usage in documentation object 'aSAH' not in code".
- Fix vdiffr unit tests with ggplot2 3.2.0 (issue 53).
Xavier Robin
Published Monday, July 22, 2019 09:07 CEST
Permalink: /blog/2019/07/22/proc-1.15.3
Tags:
pROC
Comments: 0
pROC 1.15.0
The latest version of pROC, 1.15.0, has just been released. It features significant speed improvements, many bug fixes, new methods for use in dplyr pipelines, increased verbosity, and prepares the way for some backwards-incompatible changes upcoming in pROC 1.16.0.
Verbosity
Since its initial release, pROC has been detecting the levels of the positive and negative classes (cases and controls), as well as the direction of the comparison, that is whether values are higher in case or in control observations. Until now it has been doing so silently, but this has led to several issues and misunderstandings in the past. In particular, because of the detection of direction, ROC curves in pROC will nearly always have an AUC higher than 0.5, which can at times hide problems with certain classifiers, or cause bias in resampling operations such as bootstrapping or cross-validation.
In order to increase transparency, pROC 1.15.0 now prints a message on the command line when it auto-detects one of these two arguments.
> roc(aSAH$outcome, aSAH$ndka)
Setting levels: control = Good, case = Poor
Setting direction: controls < cases
Call:
roc.default(response = aSAH$outcome, predictor = aSAH$ndka)
Data: aSAH$ndka in 72 controls (aSAH$outcome Good) < 41 cases (aSAH$outcome Poor).
Area under the curve: 0.612
If you run pROC repeatedly in loops, you may want to turn off these diagnostic messages. The recommended way is to specify both arguments explicitly:
roc(aSAH$outcome, aSAH$ndka, levels = c("Good", "Poor"), direction = "<")
Alternatively you can pass quiet = TRUE to the roc function to silently ignore them.
roc(aSAH$outcome, aSAH$ndka, quiet = TRUE)
As mentioned earlier this last option should be avoided when you are resampling, such as in bootstrap or cross-validation, as this could silently hide some biases due to changing directions.
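As an illustration of the recommendation above, here is a minimal sketch of a bootstrap loop that fixes levels and direction in every replicate. The seed, the number of replicates (n.boot) and the variable names are assumptions made for this example only; roc, auc and the aSAH dataset are the ones used elsewhere in this post.

```r
library(pROC)
data(aSAH)

set.seed(42)   # arbitrary seed, only to make this sketch reproducible
n.boot <- 200  # hypothetical number of bootstrap replicates

boot.auc <- replicate(n.boot, {
  # Resample the rows of the dataset with replacement
  idx <- sample(nrow(aSAH), replace = TRUE)
  resampled <- aSAH[idx, ]
  # Fixed levels and direction: every replicate is computed the same way,
  # and no auto-detection message is printed
  as.numeric(auc(roc(resampled$outcome, resampled$ndka,
                     levels = c("Good", "Poor"), direction = "<")))
})

summary(boot.auc)
```

Because the direction is fixed, replicates where the predictor happens to perform worse than chance keep an AUC below 0.5 instead of being flipped above it.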
Speed
Several bottlenecks have been removed, yielding significant speedups in the roc function with algorithm = 2 (see issue 44), as well as in the coords function, which is now vectorized much more efficiently (see issue 52) and scales much better with the number of coordinates to calculate. With these improvements pROC is now as fast as other ROC R packages such as ROCR.
With Big Data becoming more and more prevalent, every speedup matters and making pROC faster has very high priority. If you think that a particular computation is abnormally slow, for instance with a particular combination of arguments, feel free to submit a bug report.
As a consequence, algorithm = 2 is now used by default for numeric predictors, and is automatically selected by the new algorithm = 6 meta algorithm. algorithm = 3 remains slightly faster with very low numbers of thresholds (below 50) and is still the default with ordered factor predictors.
Pipelines
The roc function can be used in pipelines, for instance with dplyr or magrittr. This is still a highly experimental feature and will change significantly in future versions (see issue 54 for instance). Here is an example of usage:
library(dplyr)
aSAH %>%
    filter(gender == "Female") %>%
    roc(outcome, s100b)
The roc.data.frame method supports both standard and non-standard evaluation (NSE), and the roc_ function supports standard evaluation only. By default it returns the roc object, which can then be piped to the coords function to extract coordinates that can be used in further pipelines:
aSAH %>%
    filter(gender == "Female") %>%
    roc(outcome, s100b) %>%
    coords(transpose=FALSE) %>%
    filter(sensitivity > 0.6,
           specificity > 0.6)
More details and use cases are available in the ?roc help page.
Transposing coordinates
Since the initial release of pROC, the coords function has been returning a matrix with thresholds in columns, and the coordinate variables in rows.
data(aSAH)
rocobj <- roc(aSAH$outcome, aSAH$s100b)
coords(rocobj, c(0.05, 0.2, 0.5))
#                   0.05       0.2       0.5
# threshold   0.05000000 0.2000000 0.5000000
# specificity 0.06944444 0.8055556 0.9722222
# sensitivity 0.97560976 0.6341463 0.2926829
This format doesn't conform to the grammar of the tidyverse, outlined by Hadley Wickham in his Tidy Data 2014 paper, which has become prevalent in modern R language. In addition, the dropping of dimensions by default makes it difficult to guess what type of data coords is going to return.
coords(rocobj, "best")
#   threshold specificity sensitivity
#   0.2050000   0.8055556   0.6341463
# A numeric vector
Although it is possible to pass drop = FALSE, the fact that it is not the default makes the behaviour unintuitive. In an upcoming version of pROC, this will be changed and coords will return a data.frame with the thresholds in rows and measurements in columns by default.
Changes in 1.15
- Addition of the transpose argument.
- Display a warning if transpose is missing. Pass transpose explicitly to silence the warning.
- Deprecation of as.list.
With transpose = FALSE, the output is a tidy data.frame suitable for use in pipelines:
coords(rocobj, c(0.05, 0.2, 0.5), transpose = FALSE)
#      threshold specificity sensitivity
# 0.05      0.05  0.06944444   0.9756098
# 0.2       0.20  0.80555556   0.6341463
# 0.5       0.50  0.97222222   0.2926829
It is recommended that new developments set transpose = FALSE explicitly. Currently these changes are neutral to the API and do not affect functionality outside of a warning.
Upcoming backwards incompatible changes in future version (1.16)
The next version of pROC will change the default transpose to FALSE. This is a backward incompatible change that will break any script that did not previously set transpose and will initially come with a warning to make debugging easier. Scripts that set transpose explicitly will be unaffected.
Recommendations
If you are writing a script calling the coords function, set transpose = FALSE to silence the warning and make sure your script keeps running smoothly once the default transpose is changed to FALSE. It is also possible to set transpose = TRUE to keep the current behavior, however this is likely to be deprecated in the long term, and ultimately dropped.
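A script following this recommendation could look like the sketch below. It assumes the aSAH example used throughout this post; the variable name res is illustrative. With transpose = FALSE set explicitly, the result is indexed by column name like any other data.frame, so the code does not depend on the default.

```r
library(pROC)
data(aSAH)

# Levels and direction are given explicitly to avoid auto-detection messages
rocobj <- roc(aSAH$outcome, aSAH$s100b,
              levels = c("Good", "Poor"), direction = "<")

# Explicit transpose = FALSE: one row per threshold, one column per measure
res <- coords(rocobj, c(0.05, 0.2, 0.5), transpose = FALSE)

# Columns are accessed by name
res$sensitivity

# Rows can be filtered with ordinary data.frame indexing
res[res$specificity > 0.5, ]
```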
New coords return values
The coords function can now return two new values, "youden" and "closest.topleft". They can be returned regardless of whether input = "best" and of the value of the best.method argument, although they will not be re-calculated if possible. They follow the best.weights argument as expected. See issue 48 for more information.
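As a short sketch of how these two values can be requested, the example below asks for them at the best threshold and then at all thresholds. It reuses the aSAH data from this post; the names best and all.points are illustrative.

```r
library(pROC)
data(aSAH)

rocobj <- roc(aSAH$outcome, aSAH$s100b,
              levels = c("Good", "Poor"), direction = "<")

# Both criteria at the "best" threshold
best <- coords(rocobj, "best",
               ret = c("threshold", "youden", "closest.topleft"),
               transpose = FALSE)
best

# The same two values for every threshold of the curve
all.points <- coords(rocobj, "all",
                     ret = c("threshold", "youden", "closest.topleft"),
                     transpose = FALSE)
head(all.points)
```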
Bug fixes
Several small bugs have been fixed in this version of pROC. Most of them were identified thanks to an increased unit test coverage. 65% of the code is now unit tested, up from 46% a year ago. The main weak points remain the testing of all bootstrapping and resampling operations. If you notice any unexpected or wrong behavior in those, or in any other function, feel free to submit a bug report.
Getting the update
The update is available on CRAN now. You can update your installation by simply typing:
install.packages("pROC")
Here is the full changelog:
- roc now prints messages when autodetecting levels and direction by default. Turn off with quiet = TRUE or set these values explicitly.
- Speedup with algorithm = 2 (issue 44) and in coords (issue 52).
- New algorithm = 6 (used by default) uses algorithm = 2 for numeric data, and algorithm = 3 for ordered vectors.
- New roc.data.frame method and roc_ function for use in pipelines.
- coords can now return "youden" and "closest.topleft" values (issue 48).
- New transpose argument for coords, TRUE by default (issue 54).
- Use text instead of Tcl/Tk progress bar by default (issue 51).
- Fix method = "density" smoothing when called directly from roc (issue 49).
- Renamed roc argument n to smooth.n.
- Fixed 'are.paired' ignoring smoothing arguments of roc2 with return.paired.rocs.
- New ret option "all" in coords (issue 47).
- drop in coords now drops the dimension of ret too (issue 43).
Xavier Robin
Published Saturday, June 1, 2019 09:33 CEST
Permalink: /blog/2019/06/01/proc-1.15.0
Tags:
pROC
Comments: 0
pROC 1.14.0
pROC 1.14.0 was released with many bug fixes and some new features.
Multiclass ROC
The multiclass.roc function can now take a multivariate input with columns corresponding to scores of the different classes. The columns must be named with the corresponding class labels. Thanks Matthias Döring for the contribution.
Let's see how to use it in practice with the iris dataset. Let's first split the dataset into training and test sets:
data(iris)
iris.sample <- sample(1:150)
iris.train <- iris[iris.sample[1:75],]
iris.test <- iris[iris.sample[76:150],]
We'll use the nnet package to generate some predictions. We pass type="prob" to the predict function to get class probabilities.
library("nnet")
mn.net <- nnet::multinom(Species ~ ., iris.train)
iris.predictions <- predict(mn.net, newdata=iris.test, type="prob")
head(iris.predictions)
          setosa   versicolor    virginica
63  2.877502e-21 1.000000e+00 6.647660e-19
134 1.726936e-27 9.999346e-01 6.543642e-05
150 1.074627e-28 7.914019e-03 9.920860e-01
120 6.687744e-34 9.986586e-01 1.341419e-03
6   1.000000e+00 1.845491e-24 6.590050e-72
129 4.094873e-45 1.779882e-15 1.000000e+00
Notice the column names, identical to the class labels. Now we can use the multiclass.roc function directly:
multiclass.roc(iris.test$Species, iris.predictions)
Many modelling functions have similar interfaces, where the output of predict can be changed with an extra argument. Check their documentation to find out how to get the required data.
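When a model does not offer such a predict interface, the score matrix can also be assembled by hand. The sketch below uses made-up class labels and random scores, so the resulting AUC is meaningless. It only illustrates the required shape: one column per class, named after the class labels.

```r
library(pROC)

set.seed(3)  # arbitrary seed, only to make this sketch reproducible

# Hypothetical three-class response
classes <- c("a", "b", "c")
response <- factor(sample(classes, 60, replace = TRUE), levels = classes)

# Random scores, one column per class, each row rescaled to sum to 1
scores <- matrix(runif(60 * 3), ncol = 3)
scores <- scores / rowSums(scores)

# The column names must match the class labels
colnames(scores) <- levels(response)

mc <- multiclass.roc(response, scores)
mc$auc
```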
Multiple aesthetics for ggroc
It is now possible to pass several aesthetics to ggroc. So for instance you can map a curve to both colour and linetype:
roc.list <- roc(outcome ~ s100b + ndka + wfns, data = aSAH)
ggroc(roc.list, aes=c("linetype", "color"))
Mapping 3 ROC curves to 2 aesthetics with ggroc.
Getting the update
The update is available on CRAN now. You can update your installation by simply typing:
install.packages("pROC")
Here is the full changelog:
- The multiclass.roc function now accepts multivariate decision values (code contributed by Matthias Döring).
- ggroc supports multiple aesthetics.
- Make ggplot2 dependency optional.
- Suggested packages can be installed interactively when required.
- Passing both cases and controls or response and predictor arguments is now an error.
- Many small bug fixes.
Xavier Robin
Published Wednesday, March 13, 2019 10:22 CET
Permalink: /blog/2019/03/13/proc-1.14.0
Tags:
pROC
Comments: 0
pROC 1.13.0
pROC 1.13.0 was just released with bug fixes and a new feature.
Infinite values in predictor
Following the release of pROC 1.12, it quickly became clear with issue #30 that infinite values were handled differently by the different algorithms of pROC. The problem with these values is that they cannot be thresholded. An Inf will always be greater than any value. This means that in some cases, it may not be possible to reach 0 or 100% specificity or sensitivity. This also revealed that threshold-agnostic algorithms such as algorithm="2" or the DeLong theta calculations would happily reach 0 or 100% specificity or sensitivity in those cases, although those values are unattainable.
Starting with 1.13.0, when pROC's roc function finds any infinite value in the predictor argument, or in controls or cases, it will return NaN (not a number).
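The thresholding problem can be seen with a few lines of base R. This is only an illustration with made-up predictor values, not pROC's internal code: when thresholds are taken as midpoints between consecutive values, the midpoint next to Inf is itself infinite, and the infinite observation stays above every finite threshold.

```r
# Made-up predictor values, one of them infinite
predictor <- c(0.1, 0.4, 0.7, Inf)

# Candidate thresholds: midpoints between consecutive sorted values
sorted <- sort(unique(predictor))
midpoints <- (head(sorted, -1) + tail(sorted, -1)) / 2
midpoints  # the last midpoint is Inf

# Only the finite midpoints are usable as thresholds
finite.thresholds <- midpoints[is.finite(midpoints)]

# The infinite observation is above each of them, so it can never be
# classified on the other side
above <- sapply(finite.thresholds, function(thr) predictor[4] > thr)
all(above)
```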
Numerical accuracy
The handling of near ties close to + or - Infinity or 0 has been improved by calculating the threshold (which is the mean between two consecutive values) differently depending on the mean value itself. This allows preserving as much precision close to 0 without maxing out large absolute values.
New argument for ggroc
ggroc can now take a new value for the aes argument, aes="group". Consistent with ggplot2, it allows curves with identical aesthetics to be split in different groups. This is especially useful for instance in facetted plots.
library(pROC)
library(ggplot2) # needed for facet_grid
data(aSAH)
roc.list <- roc(outcome ~ s100b + ndka + wfns, data = aSAH)
g.list <- ggroc(roc.list)
g.group <- ggroc(roc.list, aes="group")
g.group + facet_grid(.~name)
Facetting of 3 ROC curves with ggroc.
Getting the update
The update has just been accepted on CRAN and should be online soon. Once it is out, update your installation by simply typing:
install.packages("pROC")
The full changelog is:
- roc now returns NaN when predictor contains infinite values (issue #30).
- Better handling of near-ties near +-Infinity and 0.
- ggroc supports aes="group" to allow curves with identical aesthetics.
Xavier Robin
Published Monday, September 24, 2018 20:09 CEST
Permalink: /blog/2018/09/24/proc-1.13.0
Tags:
pROC
Comments: 0
pROC 1.12.1
A major regression slipped into yesterday's release 1.12.0 of pROC. The fix of issue #25 allocated a matrix which could quickly become very large for bigger datasets, and cause significant slowdowns as the machine allocated the memory, or even crash pROC with an error message such as cannot allocate vector of size ... for larger datasets. This bug (issue #29) is fixed (and now automatically tested for) in pROC 1.12.1.
Please update your installation by simply typing:
install.packages("pROC")
Xavier Robin
Published Sunday, May 6, 2018 15:44 CEST
Permalink: /blog/2018/05/06/proc-1.12.1
Tags:
Comments: 0
pROC 1.12.0
I just released pROC 1.12.0, which fixes several bugs and should significantly improve the performance on large datasets by selecting the best algorithm automatically.
Issue 25
GitHub issue #25 identified two distinct bugs causing ci.auc, var and other functions to fail when calculating DeLong placements with the following error message, in which the numbers vary:
Error in delongPlacements(roc) : A problem occured while calculating DeLong's theta: got 0.50057161522129678399 instead of 0.50032663726931247972. This is a bug in pROC, please report it to the maintainer.
pROC calculates the AUC with the trapezoidal method. This is the AUC obtained when calling auc(roc). When using the DeLong method (for roc.test, var etc.), the AUC is also calculated with another method (similar to the Wilcoxon/Mann-Whitney statistic). These two values should be identical, at least down to something close to the floating point precision of the hardware, typically below 10^-8. To be on the safe side, pROC checks this assumption after calculating the DeLong AUC.
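The agreement between the two calculations can be checked in base R. The sketch below is not pROC's implementation: it uses simulated data, a plain pairwise comparison for the Mann-Whitney style estimate, and midpoint thresholds for the trapezoidal one. All names are illustrative.

```r
set.seed(1)  # arbitrary seed, only to make this sketch reproducible
controls <- rnorm(30)
cases <- rnorm(20, mean = 1)

# Mann-Whitney style: proportion of (case, control) pairs ranked correctly,
# ties counting for one half
pairs <- outer(cases, controls, function(x, y) (x > y) + 0.5 * (x == y))
auc.mw <- mean(pairs)

# Trapezoidal method: sensitivity and false positive rate at each threshold,
# with thresholds taken as midpoints between consecutive observed values
values <- sort(unique(c(controls, cases)))
thresholds <- c(-Inf, (head(values, -1) + tail(values, -1)) / 2, Inf)
sens <- sapply(thresholds, function(thr) mean(cases > thr))
fpr <- sapply(thresholds, function(thr) mean(controls > thr))

# fpr decreases as the threshold increases, hence the minus sign
auc.trap <- sum(-diff(fpr) * (head(sens, -1) + tail(sens, -1)) / 2)

c(mann.whitney = auc.mw, trapezoid = auc.trap)
all.equal(auc.mw, auc.trap)
```

A check of this kind, comparing two independent calculations of the same quantity, is what exposed the bugs described in this section.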
The first sub-issue was caused by a broken conversion of the roc curve from percent to fractions for internal calculations, followed by a broken check that the DeLong code produced the correct AUC. In combination, these two bugs caused pROC to stop with an error message in the specific case where percent=TRUE and direction=">". The check was introduced in pROC version 1.9.1. The bug in the conversion from percent to fraction was present in earlier versions, however it never affected calculations, which is why it was left unnoticed until the check was added. Both bugs are now fixed.
The second sub-issue was impacting the calculation of the thresholds. When two predictor values were too close, their mean could not be represented exactly in the IEEE 754 arithmetic and the result would be rounded back to one or the other value, pretty much arbitrarily depending on the implementation.
> a <- c(0.65354946699793847742, 0.65354946699793858844)
> print(mean(a), digits = 20)
[1] 0.65354946699793847742
> mean(a) == a
[1]  TRUE FALSE
Because pROC calculates the ROC thresholds as the mean between consecutive predictor observations, this would cause some comparisons to be incorrect when calculating sensitivity and specificity. As a consequence, erroneous sensitivities, specificities and AUCs may have been reported in the past. The issue was fixed by carefully selecting the correct value for threshold in case the mean was identical to a predictor value.
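One way to picture such a selection is sketched below. This is a hypothetical helper (safe.threshold), not pROC's actual code, and it assumes that an observation is called positive when it is greater than or equal to the threshold. Under that assumption, falling back to the upper value whenever the mean collapses onto the lower one keeps the two observations on opposite sides.

```r
# Hypothetical helper: threshold between two consecutive predictor values.
# Assumes "predictor >= threshold" is the rule for a positive call.
safe.threshold <- function(lower, upper) {
  mid <- (lower + upper) / 2
  # If the mean was rounded back onto the lower value, it no longer
  # separates the two observations: use the upper value instead
  if (mid == lower) upper else mid
}

a <- c(0.65354946699793847742, 0.65354946699793858844)
thr <- safe.threshold(a[1], a[2])

# The lower value stays below the threshold, the upper one reaches it
a >= thr
```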
Other bug fixes
- GitHub issue #27 caused ci.auc to return NaN when cases or controls contained only a single observation. The function has been fixed and now returns NA as expected.
- power.roc.curve failed with ROC curves having percent=TRUE. This issue was identified when adding testthat unit tests for the function.
- ci(..., of="coords") returned the ci function instead of calculating the CI.
- C++ code now checks for user interrupts regularly with Rcpp::checkUserInterrupt() so that very long runs can be aborted from R.
- A better error message (instead of a useless internal garbage message error) is now displayed when attempting to return threshold with ci.coords. An empirical ROC curve, like one produced by pROC, is made of discrete points that concentrate all the possible thresholds. Lines are added to join the points visually on the plot, however they do not contain any actual threshold. Returning thresholds at an arbitrary sensitivity or specificity requires either being very lucky to have a point at the exact desired value, or interpolating thresholds between the points. Interpolating is more tricky than it sounds and is very sensitive to the method of calculation of the threshold (very different results will be returned by pROC, which uses the mean between consecutive predictor values, than by some other packages which use the values directly).
New algorithm
A new "meta" algorithm is introduced in the roc function. algorithm = 5 causes pROC to automatically select algorithm 2 or 3, based on the number of thresholds of the ROC curve. Algorithm 3 has a time complexity of O(N^2). It behaves very well when the number of thresholds remains low. However its square term can cause a catastrophic slowdown with very large datasets where the predictor takes nearly continuous values, and the ROC curve contains many thresholds (typically this will become very obvious above 10000 thresholds). Algorithm 2 has an algorithmic complexity of O(N), and shows a much better performance with a large number of data points. However it comes with a rather large pre-factor which makes it very inefficient in most small- to normal-sized datasets. The decisive factor is the number of thresholds, and pROC will select algorithm 2 in curves with more than about 1500 thresholds, 3 otherwise. Algorithm 5 is now the default algorithm in pROC, which should significantly improve the performance with large datasets, without impacting the speed in most cases.
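The difference between the two families of algorithms can be pictured with the base R sketch below. It is not pROC's code: the data are simulated, thresholds are taken at the observed values rather than at midpoints, and only sensitivity is computed. The first approach rescans all observations for every threshold, while the second sorts once and accumulates counts in a single pass.

```r
set.seed(2)  # arbitrary seed, only to make this sketch reproducible
response <- rbinom(200, 1, 0.4)        # 1 = case, 0 = control
predictor <- rnorm(200) + response     # cases tend to score higher
n.cases <- sum(response == 1)

# Per-threshold scan: every threshold looks at every observation,
# so the cost grows with (number of thresholds) x (number of observations)
thresholds <- sort(unique(predictor))
scan.sens <- sapply(thresholds, function(thr) {
  sum(predictor >= thr & response == 1)
}) / n.cases

# Single pass: sort once by decreasing predictor, then accumulate the
# number of cases seen so far
ord <- order(predictor, decreasing = TRUE)
cum.tp <- cumsum(response[ord] == 1)
# Reverse to align with the increasing thresholds used above
cum.sens <- rev(cum.tp) / n.cases

all.equal(scan.sens, cum.sens)
```

With few thresholds the scan is cheap and simple. With many thresholds its cost grows much faster than that of the single pass, which is the trade-off the meta algorithm resolves automatically.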
Getting the update
The update has just been submitted to the CRAN and should be online soon. Once it is out, update your installation by simply typing:
install.packages("pROC")
The full changelog is:
- Fix bug that crashed DeLong calculations when predictor had near-ties close to the floating point precision limit that were rounded back to a predictor value (issue #25).
- Fix bug that crashed ci.auc and var if direction was ">" and percent=TRUE (issue #25).
- Fix bug causing ci to return NaN values with method="delong" when cases or controls had a single observation (issue #27).
- Fix power.roc.curve failed with curves having percent=TRUE.
- Fix ci(..., of="coords") returned the ci function instead of the CI.
- C++ code now checks for user interrupts regularly with Rcpp::checkUserInterrupt().
- Better error message for ci.coords attempting to return threshold.
- New algorithm = 5 (used by default) chooses the algorithm based on the number of thresholds to avoid worst case with algorithm = 3.
Xavier Robin
Published Saturday, May 5, 2018 13:50 CEST
Permalink: /blog/2018/05/05/proc-1.12.0
Tags:
pROC
Comments: 0
pROC 1.11.0
pROC 1.11.0 is now on CRAN! This is a minor update that mostly fixes notes in CRAN checks. It also adds support for the legacy.axes argument to change the axis labeling in ggroc.
The full changelog is:
- Added argument legacy.axes to ggroc
- Fix NOTE about "apparent S3 methods exported but not registered" in R CMD check
Xavier Robin
Published Sunday, March 25, 2018 15:04 CEST
Permalink: /blog/2018/03/25/proc-1.11.0
Tags:
pROC
Comments: 0