<p>Xavier Robin – Tag – pROC</p>
<h1>pROC 1.18.5 (2023-11-02)</h1>
<p>pROC 1.18.5 is now available on CRAN. It's a minor bugfix release:</p>
<ul>
<li>Fixed formula input when given as variable and combined with <code>with</code> (<a href="https://github.com/xrobin/pROC/issues/111">issue #111</a>)</li>
<li>Fixed formula containing variables with spaces (<a href="https://github.com/xrobin/pROC/issues/120">issue #120</a>)</li>
<li>Fixed broken grouping when <code>colour</code> argument was given in <code>ggroc</code> (<a href="https://github.com/xrobin/pROC/issues/121">issue #121</a>)</li>
</ul>
<p>You can update your installation by simply typing:</p>
<pre>install.packages("pROC")</pre>
<h1>pROC 1.18.0 (2021-09-06)</h1>
<p>pROC version 1.18.0 is now available on CRAN. Only a few changes were implemented in this release:</p>
<ul>
<li>Add <abbr title="Confidence Interval">CI</abbr> of the estimate for <code>roc.test</code> (DeLong, paired only for now) (code contributed by <a href="https://wz-billings.rbind.io/">Zane Billings</a>) (<a href="https://github.com/xrobin/pROC/pull/95">issue #95</a>).</li>
<li>Fix documentation and alternative hypothesis for Venkatraman test (<a href="https://github.com/xrobin/pROC/issues/92">issue #92</a>).</li>
</ul>
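<p>Here is a minimal sketch of how the new confidence interval might be retrieved. It assumes the interval is stored in the standard <code>conf.int</code> element of the returned <code>htest</code> object; the variable names are only illustrative.</p>
<pre>
library(pROC)
data(aSAH)
# Two ROC curves built on the same patients, hence paired
roc.s100b <- roc(aSAH$outcome, aSAH$s100b, quiet = TRUE)
roc.wfns <- roc(aSAH$outcome, aSAH$wfns, quiet = TRUE)
# Paired DeLong test of the difference in AUC
test <- roc.test(roc.s100b, roc.wfns, method = "delong", paired = TRUE)
# Assumption: the CI of the estimate sits in the usual htest slot
test$conf.int
</pre>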
<p>You can update your installation by simply typing:</p>
<pre>install.packages("pROC")</pre>
<h1>pROC 1.17.0.1 (2021-01-13)</h1>
<p>pROC version 1.17.0.1 is available on CRAN now. Besides several bug fixes and small changes, it accepts more values in the <code>input</code> argument of <code>coords</code>.</p>
<p>Here is an example:</p>
<pre>
library(pROC)
data(aSAH)
rocobj <- roc(aSAH$outcome, aSAH$s100b)
coords(rocobj, x = seq(0, 1, .1), input="recall", ret="precision")
# precision
# 1 NaN
# 2 1.0000000
# 3 1.0000000
# 4 0.8601399
# 5 0.6721311
# 6 0.6307692
# 7 0.6373057
# 8 0.4803347
# 9 0.4517906
# 10 0.3997833
# 11 0.3628319
</pre>
<h2>Getting the update</h2>
<p>The update is available on CRAN now. You can update your installation by simply typing:</p>
<pre>install.packages("pROC")</pre>
<p>Here is the full changelog:</p>
<p>1.17.0.1 (2021-01-07):</p>
<ul>
<li>Fix CRAN incoming checks as requested by CRAN.</li>
</ul>
<p>1.17.0 (2020-12-29)</p>
<ul>
<li>Accept more values in <code>input</code> of <code>coords</code> (<a href="https://github.com/xrobin/pROC/issues/67">issue #67</a>).</li>
<li>Accept <code>kappa</code> for the <code>power.roc.test</code> of two ROC curves (<a href="https://github.com/xrobin/pROC/issues/82">issue #82</a>).</li>
<li>The <code>input</code> argument to <code>coords</code> for <code>smooth.roc</code> curves no longer has a default.</li>
<li>The <code>x</code> argument to <code>coords</code> for <code>smooth.roc</code> can now be set to <code>all</code> (also the default).</li>
<li>Fix bootstrap <code>roc.test</code> and <code>cov</code> with <code>smooth.roc</code> curves.</li>
<li>The <code>ggroc</code> function can now plot <code>smooth.roc</code> curves (<a href="https://github.com/xrobin/pROC/issues/86">issue #86</a>).</li>
<li>Remove warnings with <code>warnPartialMatchDollar</code> option (<a href="https://github.com/xrobin/pROC/issues/87">issue #87</a>).</li>
<li>Make tests depending on vdiffr conditional (<a href="https://github.com/xrobin/pROC/issues/88">issue #88</a>).</li>
</ul>
<h1>pROC 1.16.1 (2020-01-14)</h1>
<p>pROC version 1.16.1 is a minor release that disables a timing-dependent test based on the microbenchmark package that can sometimes cause random failures on CRAN. This version contains no user-visible changes. Users don't need to install this update.</p>
<h1>pROC 1.16.0 (2020-01-12)</h1>
<p>pROC version 1.16.0 is available on CRAN now. Besides several bug fixes, the main change is the switch of the default value of the <code>transpose</code> argument to the <code>coords</code> function from <code>TRUE</code> to <code>FALSE</code>. As announced earlier, <strong>this is a backward incompatible change that will break any script that did not previously set the <code>transpose</code> argument</strong> and for now comes with a warning to make debugging easier. Scripts that set <code>transpose</code> explicitly are unaffected.</p>
<h2>New return values of <code>coords</code> and <code>ci.coords</code></h2>
<p>With <code>transpose = FALSE</code>, <code>coords</code> returns a tidy <code>data.frame</code> suitable for use in pipelines:</p>
<pre>
data(aSAH)
rocobj <- roc(aSAH$outcome, aSAH$s100b)
coords(rocobj, c(0.05, 0.2, 0.5), transpose = FALSE)
# threshold specificity sensitivity
# 0.05 0.05 0.06944444 0.9756098
# 0.2 0.20 0.80555556 0.6341463
# 0.5 0.50 0.97222222 0.2926829
</pre>
<p>The function doesn't drop dimensions, so the result is always a <code>data.frame</code>, even if it has only one row and/or one column.</p>
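<p>As a quick illustration, the sketch below requests a single coordinate at the "best" threshold and inspects the shape of the result. The choice of <code>"best"</code> and <code>ret = "threshold"</code> is just an example.</p>
<pre>
library(pROC)
data(aSAH)
rocobj <- roc(aSAH$outcome, aSAH$s100b, quiet = TRUE)
# One column requested at the "best" threshold
best <- coords(rocobj, "best", ret = "threshold", transpose = FALSE)
class(best)     # still a data.frame, no dimension dropped
dim(best)
best$threshold  # columns can be accessed by name
</pre>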
<p>If speed is of utmost importance, you can get the results as a non-transposed matrix instead:</p>
<pre>
coords(rocobj, c(0.05, 0.2, 0.5), transpose = FALSE, as.matrix = TRUE)
# threshold specificity sensitivity
# [1,] 0.05 0.06944444 0.9756098
# [2,] 0.20 0.80555556 0.6341463
# [3,] 0.50 0.97222222 0.2926829
</pre>
<p>In some scenarios this can be a tiny bit faster, and is used internally in <code>ci.coords</code>.</p>
<p>Type <code>help(coords_transpose)</code> for additional information.</p>
<h3><code>ci.coords</code></h3>
<p>The <code>ci.coords</code> function now returns a list-like object:</p>
<pre>
ciobj <- ci.coords(rocobj, c(0.05, 0.2, 0.5))
ciobj$accuracy
# 2.5% 50% 97.5%
# 1 0.3628319 0.3982301 0.4424779
# 2 0.6637168 0.7433628 0.8141593
# 3 0.6725664 0.7256637 0.7787611
</pre>
<p>The <code>print</code> function prints a table with all the results. However, this table is generated on the fly and is not available directly.</p>
<pre>ciobj
# 95% CI (2000 stratified bootstrap replicates):
# threshold sensitivity.low sensitivity.median sensitivity.high
# 0.05 0.05 0.9268 0.9756 1.0000
# 0.2 0.20 0.4878 0.6341 0.7805
# 0.5 0.50 0.1707 0.2927 0.4390
# specificity.low specificity.median specificity.high accuracy.low
# 0.05 0.01389 0.06944 0.1250 0.3628
# 0.2 0.70830 0.80560 0.8889 0.6637
# 0.5 0.93060 0.97220 1.0000 0.6726
# accuracy.median accuracy.high
# 0.05 0.3982 0.4425
# 0.2 0.7434 0.8142
# 0.5 0.7257 0.7788
</pre>
<p>The following code snippet can be used to obtain all the information calculated by the function:</p>
<pre>
for (ret in attr(ciobj, "ret")) {
print(ciobj[[ret]])
}
# 2.5% 50% 97.5%
# 1 0.9268293 0.9756098 1.0000000
# 2 0.4878049 0.6341463 0.7804878
# 3 0.1707317 0.2926829 0.4390244
# 2.5% 50% 97.5%
# 1 0.01388889 0.06944444 0.1250000
# 2 0.70833333 0.80555556 0.8888889
# 3 0.93055556 0.97222222 1.0000000
# 2.5% 50% 97.5%
# 1 0.3628319 0.3982301 0.4424779
# 2 0.6637168 0.7433628 0.8141593
# 3 0.6725664 0.7256637 0.7787611
</pre>
<h2>Getting the update</h2>
<p>The update is available on CRAN now. You can update your installation by simply typing:</p>
<pre>install.packages("pROC")</pre>
<p>Here is the full changelog:</p>
<ul>
<li>BACKWARD INCOMPATIBLE CHANGE: <code>transpose</code> argument to <code>coords</code> switched to <code>FALSE</code> by default (<a href="https://github.com/xrobin/pROC/issues/54">issue #54</a>).</li>
<li>BACKWARD INCOMPATIBLE CHANGE: <code>ci.coords</code> return value is now of list type and easier to use.</li>
<li>Fix one-sided DeLong test for curves with <code>direction=">"</code> (<a href="https://github.com/xrobin/pROC/issues/64">issue #64</a>).</li>
<li>Fix an error in <code>ci.coords</code> due to expected <code>NA</code> values in some coords (like "precision") (<a href="https://github.com/xrobin/pROC/issues/65">issue #65</a>).</li>
<li>Ordered predictors are converted to numeric in a more robust way (<a href="https://github.com/xrobin/pROC/issues/63">issue #63</a>).</li>
<li>Cleaned up <code>power.roc.test</code> code (<a href="https://github.com/xrobin/pROC/issues/50">issue #50</a>).</li>
<li>Fix pairing with <code>roc.formula</code> and warn if <code>na.action</code> is not set to <code>"na.pass"</code> or <code>"na.fail"</code> (<a href="https://github.com/xrobin/pROC/issues/68">issue #68</a>).</li>
<li>Fix <code>ci.coords</code> not working with <code>smooth.roc</code> curves.</li>
</ul>
<h1>pROC 1.15.3 (2019-07-22)</h1>
<p>A new version of pROC, 1.15.3, has been released and is now available on CRAN. It is a minor bugfix release. Versions 1.15.1 and 1.15.2 were rejected from CRAN.</p>
<p>Here is the full changelog:</p>
<ul>
<li>Fix <code>-Inf</code> threshold in coords for curves with <code>direction = ">"</code> (<a href="https://github.com/xrobin/pROC/issues/60">issue 60</a>).</li>
<li>Keep list order in <code>ggroc</code> (<a href="https://github.com/xrobin/pROC/issues/58">issue 58</a>).</li>
<li>Fix erroneous error in <code>ci.coords</code> with <code>ret="threshold"</code> (<a href="https://github.com/xrobin/pROC/issues/57">issue 57</a>).</li>
<li>Restore lazy loading of the data and fix an <code>R CMD check</code> warning "Variables with usage in documentation object 'aSAH' not in code".</li>
<li>Fix vdiffr unit tests with ggplot2 3.2.0 (<a href="https://github.com/xrobin/pROC/issues/53">issue 53</a>).</li>
</ul>
<h1>pROC 1.15.0 (2019-06-01)</h1>
<p>The latest version of pROC, 1.15.0, has just been released. It features significant speed improvements, many bug fixes, new methods for use in dplyr pipelines, increased verbosity, and prepares the way for some backwards-incompatible changes upcoming in pROC 1.16.0.</p>
<h2>Verbosity</h2>
<p>Since its initial release, pROC has been detecting the <code>level</code>s of the positive and negative classes (cases and controls), as well as the <code>direction</code> of the comparison, that is whether values are higher in case or in control observations. Until now it has been doing so silently, but this has led to several issues and misunderstandings in the past. In particular, because of the detection of <code>direction</code>, ROC curves in pROC will nearly always have an AUC higher than 0.5, which can at times hide problems with certain classifiers, or cause bias in resampling operations such as bootstrapping or cross-validation.</p>
<p>In order to increase transparency, pROC 1.15.0 now prints a message on the command line when it auto-detects one of these two arguments.</p>
<pre>
> roc(aSAH$outcome, aSAH$ndka)
<span style="color: red">Setting levels: control = Good, case = Poor
Setting direction: controls < cases</span>
Call:
roc.default(response = aSAH$outcome, predictor = aSAH$ndka)
Data: aSAH$ndka in 72 controls (aSAH$outcome Good) < 41 cases (aSAH$outcome Poor).
Area under the curve: 0.612
</pre>
<p>If you run pROC repeatedly in loops, you may want to turn off these diagnostic messages. The recommended way is to specify both arguments explicitly:</p>
<pre>
roc(aSAH$outcome, aSAH$ndka, levels = c("Good", "Poor"), direction = "<")
</pre>
<p>Alternatively you can pass <code>quiet = TRUE</code> to the <code>roc</code> function to silently suppress them.</p>
<pre>
roc(aSAH$outcome, aSAH$ndka, quiet = TRUE)
</pre>
<p>As mentioned earlier this last option should be avoided when you are resampling, such as in bootstrap or cross-validation, as this could silently hide some biases due to changing directions.</p>
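<p>To make this concrete, here is a minimal sketch of a hand-written bootstrap loop in which <code>levels</code> and <code>direction</code> are fixed, so that every resample is scored the same way. The number of replicates, the seed and the variable names are arbitrary choices made for the example.</p>
<pre>
library(pROC)
data(aSAH)
set.seed(42)  # arbitrary seed, only for reproducibility
boot.aucs <- replicate(100, {
  # Resample the rows with replacement
  boot <- aSAH[sample(nrow(aSAH), replace = TRUE), ]
  # Fixed levels and direction: no auto-detection, no message,
  # and the AUC is free to fall below 0.5 in a given resample
  as.numeric(auc(roc(boot$outcome, boot$ndka,
                     levels = c("Good", "Poor"), direction = "<")))
})
summary(boot.aucs)
</pre>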
<h2>Speed</h2>
<p>Several bottlenecks have been removed, yielding significant speedups in the <code>roc</code> function with <code>algorithm = 2</code> (see <a href="https://github.com/xrobin/pROC/issues/44">issue 44</a>), as well as in the <code>coords</code> function which is now vectorized much more efficiently (see <a href="https://github.com/xrobin/pROC/issues/52">issue 52</a>) and scales much better with the number of coordinates to calculate. With these improvements pROC is now as fast as other ROC R packages such as ROCR.</p>
<p>With Big Data becoming more and more prevalent, every speed up matters and making pROC faster has very high priority. If you think that a particular computation is abnormally slow, for instance with a particular combination of arguments, feel free to <a href="https://github.com/xrobin/pROC/issues/new?template=Bug_report.md">submit a bug report</a>.</p>
<p>As a consequence, <code>algorithm = 2</code> is now used by default for numeric predictors, and is automatically selected by the new <code>algorithm = 6</code> meta algorithm. <code>algorithm = 3</code> remains slightly faster with very low numbers of thresholds (below 50) and is still the default with ordered factor predictors.</p>
<h2>Pipelines</h2>
<p>The <code>roc</code> function can be used in pipelines, for instance with <a href="https://dplyr.tidyverse.org/">dplyr</a> or <a href="https://magrittr.tidyverse.org/">magrittr</a>. This is still a highly experimental feature and will change significantly in future versions (see <a href="https://github.com/xrobin/pROC/issues/54">issue 54</a> for instance). Here is an example of usage:</p>
<pre>
library(dplyr)
aSAH %>%
filter(gender == "Female") %>%
roc(outcome, s100b)
</pre>
<p>The <code>roc.data.frame</code> method supports both standard and non-standard evaluation (NSE), and the <code>roc_</code> function supports standard evaluation only. By default it returns the <code>roc</code> object, which can then be piped to the <code>coords</code> function to extract coordinates that can be used in further pipelines:</p>
<pre>
aSAH %>%
filter(gender == "Female") %>%
roc(outcome, s100b) %>%
coords(transpose=FALSE) %>%
filter(sensitivity > 0.6,
specificity > 0.6)
</pre>
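<p>When the column names are only known at run time, the standard evaluation variant may be more convenient. The sketch below assumes that <code>roc_</code> takes the data, then the response and predictor column names as character strings, and forwards extra arguments such as <code>quiet</code>; the variable names are illustrative.</p>
<pre>
library(pROC)
data(aSAH)
# Column names held in variables, e.g. coming from a configuration
response.col <- "outcome"
predictor.col <- "s100b"
rocobj.se <- roc_(aSAH, response.col, predictor.col, quiet = TRUE)
# Same curve as the classic interface
rocobj.classic <- roc(aSAH$outcome, aSAH$s100b, quiet = TRUE)
all.equal(as.numeric(auc(rocobj.se)), as.numeric(auc(rocobj.classic)))
</pre>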
<p>More details and use cases are available in the <code>?roc</code> help page.</p>
<h2 id="tc">Transposing coordinates</h2>
<p>Since the initial release of pROC, the <code>coords</code> function has been returning a matrix with thresholds in columns, and the coordinate variables in rows.</p>
<pre>
data(aSAH)
rocobj <- roc(aSAH$outcome, aSAH$s100b)
coords(rocobj, c(0.05, 0.2, 0.5))
# 0.05 0.2 0.5
# threshold 0.05000000 0.2000000 0.5000000
# specificity 0.06944444 0.8055556 0.9722222
# sensitivity 0.97560976 0.6341463 0.2926829
</pre>
<p>This format doesn't conform to the grammar of the <a href="https://www.tidyverse.org">tidyverse</a>, outlined by Hadley Wickham in his <a href="http://dx.doi.org/10.18637/jss.v059.i10">Tidy Data</a> 2014 paper, which has become prevalent in modern R language. In addition, the dropping of dimensions by default makes it difficult to guess what type of data <code>coords</code> is going to return.</p>
<pre>
coords(rocobj, "best")
# threshold specificity sensitivity
# 0.2050000 0.8055556 0.6341463
# A numeric vector
</pre>
<p>Although it is possible to pass <code>drop = FALSE</code>, the fact that it is not the default makes the behaviour unintuitive. In an upcoming version of pROC, this will be changed and <code>coords</code> will return a <code>data.frame</code> with the thresholds in rows and measurements in columns by default.</p>
<h3>Changes in 1.15</h3>
<ul>
<li>Addition of the <code>transpose</code> argument.</li>
<li>Display a warning if <code>transpose</code> is missing. Pass <code>transpose</code> explicitly to silence the warning.</li>
<li>Deprecation of <code>as.list</code>.</li>
</ul>
<p>With <code>transpose = FALSE</code>, the output is a tidy <code>data.frame</code> suitable for use in pipelines:</p>
<pre>
coords(rocobj, c(0.05, 0.2, 0.5), transpose = FALSE)
# threshold specificity sensitivity
# 0.05 0.05 0.06944444 0.9756098
# 0.2 0.20 0.80555556 0.6341463
# 0.5 0.50 0.97222222 0.2926829
</pre>
<p>It is recommended that new developments set <code>transpose = FALSE</code> explicitly. Currently these changes are neutral to the API and do not affect functionality outside of a warning.</p>
<h3>Upcoming backwards incompatible changes in future version (1.16)</h3>
<p>The next version of pROC will change the default <code>transpose</code> to <code>FALSE</code>. <strong>This is a backward incompatible change that will break any script that did not previously set <code>transpose</code></strong> and will initially come with a warning to make debugging easier. Scripts that set <code>transpose</code> explicitly will be unaffected.</p>
<h3>Recommendations</h3>
<p>If you are writing a script calling the <code>coords</code> function, set <code>transpose = FALSE</code> to silence the warning and make sure your script keeps running smoothly once the default <code>transpose</code> is changed to <code>FALSE</code>. It is also possible to set <code>transpose = TRUE</code> to keep the current behavior, however this option is likely to be deprecated in the long term, and ultimately dropped.</p>
<h2>New <code>coords</code> return values</h2>
<p>The <code>coords</code> function can now return two new values, <code>"youden"</code> and <code>"closest.topleft"</code>. They can be returned regardless of whether <code>input = "best"</code> and of the value of the <code>best.method</code> argument, although they will not be re-calculated if possible. They follow the <code>best.weights</code> argument as expected. See <a href="https://github.com/xrobin/pROC/issues/48">issue 48</a> for more information.</p>
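<p>For instance, the two new values can be requested through the <code>ret</code> argument like any other coordinate. This is only a sketch; the <code>"best"</code> input and the explicit <code>transpose = FALSE</code> are choices made for the example.</p>
<pre>
library(pROC)
data(aSAH)
rocobj <- roc(aSAH$outcome, aSAH$s100b, quiet = TRUE)
# Ask for both optimality criteria alongside the threshold
crit <- coords(rocobj, "best",
               ret = c("threshold", "youden", "closest.topleft"),
               transpose = FALSE)
crit
</pre>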
<h2>Bug fixes</h2>
<p>Several small bugs have been fixed in this version of pROC. Most of them were identified thanks to an increased <a href="https://codecov.io/github/xrobin/pROC">unit test coverage</a>. 65% of the code is now unit tested, up from 46% a year ago. The main weak points remain the testing of all bootstrapping and resampling operations. If you notice any unexpected or wrong behavior in those, or in any other function, feel free to <a href="https://github.com/xrobin/pROC/issues/new?template=Bug_report.md">submit a bug report</a>.</p>
<h2>Getting the update</h2>
<p>The update is available on CRAN now. You can update your installation by simply typing:</p>
<pre>install.packages("pROC")</pre>
<p>Here is the full changelog:</p>
<ul>
<li><code>roc</code> now prints messages when autodetecting <code>levels</code> and <code>direction</code> by default. Turn off with <code>quiet = TRUE</code> or set these values explicitly.</li>
<li>Speedup with <code>algorithm = 2</code> (<a href="https://github.com/xrobin/pROC/issues/44">issue 44</a>) and in <code>coords</code> (<a href="https://github.com/xrobin/pROC/issues/52">issue 52</a>).</li>
<li>New <code>algorithm = 6</code> (used by default) uses <code>algorithm = 2</code> for numeric data, and <code>algorithm = 3</code> for ordered vectors.</li>
<li>New <code>roc.data.frame</code> method and <code>roc_</code> function for use in pipelines.</li>
<li><code>coords</code> can now return <code>"youden"</code> and <code>"closest.topleft"</code> values (<a href="https://github.com/xrobin/pROC/issues/48">issue 48</a>).</li>
<li>New <code>transpose</code> argument for <code>coords</code>, <code>TRUE</code> by default (<a href="https://github.com/xrobin/pROC/issues/54">issue 54</a>).</li>
<li>Use text instead of Tcl/Tk progress bar by default (<a href="https://github.com/xrobin/pROC/issues/51">issue 51</a>).</li>
<li>Fix <code>method = "density"</code> smoothing when called directly from <code>roc</code> (<a href="https://github.com/xrobin/pROC/issues/49">issue 49</a>).</li>
<li>Renamed <code>roc</code> argument <code>n</code> to <code>smooth.n</code>.</li>
<li>Fixed 'are.paired' ignoring smoothing arguments of <code>roc2</code> with <code>return.paired.rocs</code>.</li>
<li>New <code>ret</code> option <code>"all"</code> in <code>coords</code> (<a href="https://github.com/xrobin/pROC/issues/47">issue 47</a>)</li>
<li><code>drop</code> in <code>coords</code> now drops the dimension of <code>ret</code> too (<a href="https://github.com/xrobin/pROC/issues/43">issue 43</a>)</li>
</ul>
<h1>pROC 1.14.0 (2019-03-13)</h1>
<p>pROC 1.14.0 was released with many bug fixes and some new features.</p>
<h2>Multiclass ROC</h2>
<p>The <code>multiclass.roc</code> function can now take a multivariate input with columns corresponding to scores of the different classes. The columns must be named with the corresponding class labels. Thanks Matthias Döring for the contribution.</p>
<p>Let's see how to use it in practice with the iris dataset. Let's first split the dataset into training and test sets:</p>
<pre>
data(iris)
iris.sample <- sample(1:150)
iris.train <- iris[iris.sample[1:75],]
iris.test <- iris[iris.sample[76:150],]
</pre>
<p>We'll use the <code>nnet</code> package to generate some predictions. We pass <code>type="prob"</code> to the <code>predict</code> function to get class probabilities.</p>
<pre>library("nnet")
mn.net <- nnet::multinom(Species ~ ., iris.train)
iris.predictions <- predict(mn.net, newdata=iris.test, type="prob")
head(iris.predictions)
</pre>
<pre>
setosa versicolor virginica
63 2.877502e-21 1.000000e+00 6.647660e-19
134 1.726936e-27 9.999346e-01 6.543642e-05
150 1.074627e-28 7.914019e-03 9.920860e-01
120 6.687744e-34 9.986586e-01 1.341419e-03
6 1.000000e+00 1.845491e-24 6.590050e-72
129 4.094873e-45 1.779882e-15 1.000000e+00
</pre>
<p>Notice the column names, identical to the class labels. Now we can use the <code>multiclass.roc</code> function directly:</p>
<pre>multiclass.roc(iris.test$Species, iris.predictions)</pre>
<p>Many modelling functions have similar interfaces, where the output of <code>predict</code> can be changed with an extra argument. Check their documentation to find out how to get the required data.</p>
<h2>Multiple aesthetics for <code>ggroc</code></h2>
<p>It is now possible to pass several aesthetics to <code>ggroc</code>. So for instance you can map a curve to both <code>colour</code> and <code>linetype</code>:</p>
<pre>
roc.list <- roc(outcome ~ s100b + ndka + wfns, data = aSAH)
ggroc(roc.list, aes=c("linetype", "color"))
</pre>
<p class="imglegende center" style="max-width:768px"><img src="/files/blog/2019/03/12/ggroc_multiple_aes.png" alt="ROC curves mapped to several aesthetics"> <span>Mapping 3 ROC curves to 2 aesthetics with ggroc.</span></p>
<h2>Getting the update</h2>
<p>The update is available on CRAN now. You can update your installation by simply typing:</p>
<pre>install.packages("pROC")</pre>
<p>Here is the full changelog:</p>
<ul>
<li>The <code>multiclass.roc</code> function now accepts multivariate decision values (code contributed by Matthias Döring).</li>
<li><code>ggroc</code> supports multiple aesthetics.</li>
<li>Make <i>ggplot2</i> dependency optional.</li>
<li>Suggested packages can be installed interactively when required.</li>
<li>Passing both <code>cases</code> and <code>controls</code> or <code>response</code> and <code>predictor</code> arguments is now an error.</li>
<li>Many small bug fixes.</li>
</ul>
<h1>pROC 1.13.0 (2018-09-24)</h1>
<p>pROC 1.13.0 was just released with bug fixes and a new feature.</p>
<h2>Infinite values in predictor</h2>
<p>Following the release of pROC 1.12, it quickly became clear with <a href="https://github.com/xrobin/pROC/issues/30">issue #30</a> that infinite values were handled differently by the different algorithms of pROC. The problem with these values is that they cannot be thresholded. An <code>Inf</code> will always be greater than any value. This means that in some cases, it may not be possible to reach 0 or 100% specificity or sensitivity. This also revealed that threshold-agnostic algorithms such as <code>algorithm="2"</code> or the DeLong theta calculations would happily reach 0 or 100% specificity or sensitivity in those cases, although those values are unattainable.</p>
<p>Starting with 1.13.0, when pROC's <code>roc</code> function finds any infinite value in the <code>predictor</code> argument, or in <code>controls</code> or <code>cases</code>, it will return <code>NaN</code> (not a number).</p>
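<p>A small sketch of what this looks like in practice, with made-up data. It assumes the <code>NaN</code> comes with a warning, which is silenced here to keep the example quiet.</p>
<pre>
library(pROC)
# Toy data (invented for this example) with one infinite score
response <- c(0, 0, 0, 1, 1, 1)
predictor <- c(0.1, 0.4, 0.35, 0.8, 0.7, Inf)
res <- suppressWarnings(
  roc(response, predictor, levels = c(0, 1), direction = "<")
)
# No ROC curve is built: the result is NaN rather than a roc object
is.nan(res)
</pre>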
<h2>Numerical accuracy</h2>
<p>The handling of near ties close to + or - Infinity or 0 has been improved by calculating the threshold (which is the mean between two consecutive values) differently depending on the mean value itself. This allows preserving as much precision close to 0 without maxing out large absolute values.</p>
<h2>New argument for ggroc</h2>
<p><code>ggroc</code> can now take a new value for the <code>aes</code> argument, <code>aes="group"</code>. Consistent with ggplot2, it allows curves with identical aesthetics to be split in different groups. This is especially useful for instance in faceted plots.</p>
<pre>library(pROC)
library(ggplot2) # provides facet_grid
data(aSAH)
roc.list <- roc(outcome ~ s100b + ndka + wfns, data = aSAH)
g.list <- ggroc(roc.list)
g.group <- ggroc(roc.list, aes="group")
g.group + facet_grid(.~name)
</pre>
<p class="imglegende center" style="max-width:672px"><img src="/files/blog/2018/09/24/ggroc_facet.png" alt="3 ROC curves in a faceted ggplot2 panel"> <span>Faceting of 3 ROC curves with ggroc.</span></p>
<h2>Getting the update</h2>
<p>The update has just been accepted on CRAN and should be online soon. Once it is out, update your installation by simply typing:</p>
<pre>install.packages("pROC")</pre>
<p>The full changelog is:</p>
<ul>
<li><code>roc</code> now returns <code>NaN</code> when predictor contains infinite values ( <a href="https://github.com/xrobin/pROC/issues/30">issue #30</a>).</li>
<li>Better handling of near-ties near +-Infinity and 0.</li>
<li><code>ggroc</code> supports <code>aes="group"</code> to allow curves with identical aesthetics.</li>
</ul>
<h1>pROC 1.12.0 (2018-05-05)</h1>
<p>I just released pROC 1.12.0, which fixes several bugs and should significantly improve the performance on large datasets by selecting the best algorithm automatically.</p>
<h2>Issue 25</h2>
<p>GitHub <a href="https://github.com/xrobin/pROC/issues/25">issue #25</a> identified two distinct bugs causing <code>ci.auc</code>, <code>var</code> and other functions to fail when calculating DeLong placements with the following error message, with varying numbers:</p>
<pre>Error in delongPlacements(roc) :
A problem occured while calculating DeLong's theta: got 0.50057161522129678399 instead of 0.50032663726931247972. This is a bug in pROC, please report it to the maintainer.</pre>
<p>pROC calculates the AUC with the trapezoidal method. This is the AUC obtained when calling <code>auc(roc)</code>. When using the DeLong method (for <code>roc.test</code>, <code>var</code> etc.), the AUC is also calculated with another method (similar to the Wilcoxon/Mann-Whitney statistic). These two values should be identical, at least down to something close to the floating point precision of the hardware, typically below 10^-8. To be on the safe side, pROC checks this assumption after calculating the DeLong AUC.</p>
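<p>The equivalence between the two estimates can be checked by hand. The sketch below is not pROC's internal DeLong code: it is a brute-force Mann-Whitney estimate over all case/control pairs, with ties counted as one half, compared against the trapezoidal AUC.</p>
<pre>
library(pROC)
data(aSAH)
rocobj <- roc(aSAH$outcome, aSAH$s100b,
              levels = c("Good", "Poor"), direction = "<")
cases <- aSAH$s100b[aSAH$outcome == "Poor"]
controls <- aSAH$s100b[aSAH$outcome == "Good"]
# Score every (case, control) pair: 1 if case > control, 0.5 if tied
pair.scores <- outer(cases, controls, ">") +
  0.5 * outer(cases, controls, "==")
mw.auc <- mean(pair.scores)
# Should agree with the trapezoidal AUC up to floating point precision
all.equal(mw.auc, as.numeric(auc(rocobj)))
</pre>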
<p>The first sub-issue was caused by a broken conversion of the ROC curve from percent to fractions for internal calculations, followed by a broken check that the DeLong code produced the correct AUC. In combination, these two bugs caused pROC to stop with an error message in the specific case where <code>percent=TRUE</code> and <code>direction=">"</code>. The check was introduced in pROC version 1.9.1. The bug in the conversion from percent to fraction was present in earlier versions, however it never affected calculations, which is why it went unnoticed until the check was added. Both bugs are now fixed.</p>
<p>The second sub-issue was impacting the calculation of the thresholds. When two predictor values were too close, their mean could not be represented exactly in IEEE 754 arithmetic and the result would be rounded back to one or the other value, pretty much arbitrarily depending on the implementation.</p>
<pre>> a <- c(0.65354946699793847742, 0.65354946699793858844)
> print(mean(a), digits = 20)
[1] 0.65354946699793847742
> mean(a) == a
[1] TRUE FALSE</pre>
<p>Because pROC calculates the ROC thresholds as the mean between consecutive predictor observations, this would cause some comparisons to be incorrect when calculating sensitivity and specificity. As a consequence, erroneous sensitivities, specificities and AUCs may have been reported in the past. The issue was fixed by carefully selecting the correct value for the threshold in case the mean was identical to a predictor value.</p>
<h2>Other bug fixes</h2>
<ul>
<li>GitHub <a href="https://github.com/xrobin/pROC/issues/27">issue #27</a> caused <code>ci.auc</code> to return <code>NaN</code> when cases or controls contained only a single observation. The function has been fixed and now returns <code>NA</code> as expected.</li>
<li><code>power.roc.test</code> failed with ROC curves having <code>percent=TRUE</code>. This issue was identified when adding <a href="http://testthat.r-lib.org/">testthat</a> unit tests for the function.</li>
<li><code>ci(..., of="coords")</code> returned the <code>ci</code> function instead of calculating the CI.</li>
<li>C++ code now checks for user interrupts regularly with <code>Rcpp::checkUserInterrupt()</code> so that very long runs can be aborted from R.</li>
<li>A better error message (instead of a useless internal garbage error message) is now displayed when attempting to return <code>threshold</code> with <code>ci.coords</code>. An empirical ROC curve, like one produced by pROC, is made of discrete points that concentrate all the possible thresholds. Lines are added to join the points visually on the plot, however they do not contain any actual threshold. Returning thresholds at an arbitrary sensitivity or specificity requires either being very lucky and having a point at the exact desired value, or interpolating thresholds between the points. Interpolating is more tricky than it sounds and is very sensitive to the method of calculation of the threshold (pROC, which uses the mean between consecutive predictor values, will return very different results from some other packages which use the values directly).</li>
</ul>
<h2>New algorithm</h2>
<p>A new "meta" algorithm is introduced in the <code>roc</code> function. <code>algorithm = 5</code> causes pROC to automatically select algorithm 2 or 3, based on the number of thresholds of the ROC curve. Algorithm 3 has a time complexity of O(N^2). It behaves very well when the number of thresholds remains low. However its square term can cause a catastrophic slowdown with very large datasets where the predictor takes nearly continuous values, and the ROC curve contains many thresholds (typically this will become very obvious above 10000 thresholds). Algorithm 2 has an algorithmic complexity of O(N), and shows a much better performance with large numbers of data points. However it comes with a rather large pre-factor which makes it very inefficient in most small- to normal-sized datasets. The decisive factor is the number of thresholds, and pROC will select algorithm 2 in curves with more than about 1500 thresholds, and algorithm 3 otherwise. Algorithm 5 is now the default algorithm in pROC, which should significantly improve the performance with large datasets, without impacting the speed in most cases.</p>
<h2>Getting the update</h2>
<p>The update has just been submitted to CRAN and should be online soon. Once it is out, update your installation by simply typing:</p>
<pre>install.packages("pROC")</pre>
<p>The full changelog is:</p>
<ul>
<li>Fix bug that crashed DeLong calculations when predictor had near-ties close to the floating point precision limit that were rounded back to a predictor value (<a href="https://github.com/xrobin/pROC/issues/25">issue #25</a>).</li>
<li>Fix bug that crashed <code>ci.auc</code> and <code>var</code> if <code>direction</code> was <code>">"</code> and <code>percent=TRUE</code> (<a href="https://github.com/xrobin/pROC/issues/25">issue #25</a>).</li>
<li>Fix bug causing <code>ci</code> to return <code>NaN</code> values with <code>method="delong"</code> when cases or controls had a single observation (<a href="https://github.com/xrobin/pROC/issues/27">issue #27</a>).</li>
<li>Fix <code>power.roc.test</code> failing with curves having <code>percent=TRUE</code>.</li>
<li>Fix <code>ci(..., of="coords")</code> returned the <code>ci</code> function instead of the CI.</li>
<li>C++ code now checks for user interrupts regularly with <code>Rcpp::checkUserInterrupt()</code>.</li>
<li>Better error message for <code>ci.coords</code> attempting to return <code>threshold</code>.</li>
<li>New algorithm = 5 (used by default) chooses the algorithm based on the number of thresholds to avoid worst case with algorithm = 3.</li>
</ul>