pROC 1.13.0

pROC 1.13.0 was just released with bug fixes and a new feature.

Infinite values in predictor

Following the release of pROC 1.12, it quickly became clear with issue #30 that infinite values were handled differently by the different algorithms of pROC. The problem with these values is that they cannot be thresholded: an Inf will always be greater than any other value, so no threshold can be placed above it. This means that in some cases, it may not be possible to reach 0 or 100% specificity or sensitivity. The issue also revealed that threshold-agnostic algorithms such as algorithm="2" or the DeLong theta calculations would happily reach 0 or 100% specificity or sensitivity in those cases, although those values are unattainable.
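
The thresholding problem can be illustrated with a few lines of base R. This is only a sketch with made-up values, not pROC's internal code, and it assumes that a control counts as a true negative when it is strictly below the threshold:

```r
# Made-up control values, one of them infinite
controls <- c(0.1, 0.4, Inf)

# Candidate thresholds, from a small one up to the largest possible value
candidate_thresholds <- c(1, .Machine$double.xmax, Inf)

# Assumed rule: a control is a true negative when strictly below the threshold
true_negatives <- sapply(candidate_thresholds, function(t) sum(controls < t))
specificity <- true_negatives / length(controls)

# Inf < Inf is FALSE, so the infinite control is never counted as negative
# and the specificity stays below 1 for every candidate threshold
print(specificity)
```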

Starting with 1.13.0, when pROC's roc function finds any infinite value in the predictor argument, or in controls or cases, it will return NaN (not a number).

Numerical accuracy

The handling of near ties close to + or - Infinity or 0 has been improved by calculating the threshold (which is the mean between two consecutive values) differently depending on the mean value itself. This preserves as much precision as possible close to 0 without maxing out large absolute values.

New argument for ggroc

ggroc can now take a new value for the aes argument, aes="group". Consistent with ggplot2, it allows curves with identical aesthetics to be split into different groups. This is especially useful in faceted plots, for instance.

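library(pROC)
library(ggplot2)  # provides facet_grid
data(aSAH)
# Build one ROC curve per predictor
roc.list <- roc(outcome ~ s100b + ndka + wfns, data = aSAH)
# Default aesthetics: curves are distinguished by color
g.list <- ggroc(roc.list)
# aes="group": identical aesthetics, one group per curve
g.group <- ggroc(roc.list, aes="group")
# One facet per curve
g.group + facet_grid(.~name)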

Figure (3 ROC curves in a faceted ggplot2 panel): Faceting of 3 ROC curves with ggroc.

Getting the update

The update has just been accepted on CRAN and should be online soon. Once it is out, update your installation by simply typing:

install.packages("pROC")

The full changelog is:

Xavier Robin
Published Monday, September 24, 2018 20:09 CEST
Permalink: /blog/2018/09/24/proc-1.13.0
Tags: pROC
Comments: 0

pROC 1.12.1

A major regression slipped into yesterday's release 1.12.0 of pROC. The fix of issue #25 allocated a matrix which could quickly become very large with bigger datasets. This caused significant slowdowns as the machine allocated the memory, or even crashed pROC with an error message such as cannot allocate vector of size ... for larger datasets. This bug (issue #29) is fixed (and now automatically tested for) in pROC 1.12.1.

Please update your installation by simply typing:

install.packages("pROC")

Xavier Robin
Published Sunday, May 6, 2018 15:44 CEST
Permalink: /blog/2018/05/06/proc-1.12.1
Tags:
Comments: 0

pROC 1.12.0

I just released pROC 1.12.0, which fixes several bugs and should significantly improve the performance on large datasets by selecting the best algorithm automatically.

Issue 25

GitHub issue #25 identified two distinct bugs causing ci.auc, var and other functions to fail when calculating DeLong placements with the following error message (the numbers vary from case to case):

Error in delongPlacements(roc) :
 A problem occured while calculating DeLong's theta: got 0.50057161522129678399 instead of 0.50032663726931247972. This is a bug in pROC, please report it to the maintainer.

pROC calculates the AUC with the trapezoidal method. This is the AUC obtained when calling auc(roc). When using the DeLong method (for roc.test, var etc.), the AUC is also calculated with another method (similar to the Wilcoxon/Mann-Whitney statistic). These two values should be identical, at least down to something close to the floating point precision of the hardware, typically below 10^-8. To be on the safe side, pROC checks this assumption after calculating the DeLong AUC.
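
The agreement between the two estimates can be checked by hand in base R. The sketch below uses made-up scores and is not pROC's implementation; it assumes higher scores indicate cases and counts ties as one half, and the 10^-8 tolerance mirrors the figure quoted above:

```r
# Made-up scores for controls (negatives) and cases (positives)
controls <- c(0.1, 0.3, 0.4, 0.6)
cases    <- c(0.35, 0.6, 0.8, 0.9)

# Mann-Whitney style estimate: share of (case, control) pairs in which the
# case scores higher, with ties counted as one half
pairs <- outer(cases, controls, function(x, y) (x > y) + 0.5 * (x == y))
auc_mw <- mean(pairs)

# Trapezoidal estimate: sweep thresholds over the observed scores,
# collect the (FPR, TPR) points and integrate with the trapezoidal rule
thresholds <- c(sort(unique(c(controls, cases))), Inf)
tpr <- sapply(thresholds, function(t) mean(cases >= t))
fpr <- sapply(thresholds, function(t) mean(controls >= t))
auc_trap <- sum(-diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)

# Sanity check in the spirit of the one described above
stopifnot(abs(auc_trap - auc_mw) < 1e-8)
print(c(trapezoid = auc_trap, mann_whitney = auc_mw))
```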

The first sub-issue was caused by a broken conversion of the roc curve from percent to fractions for internal calculations, followed by a broken check that the DeLong code produced the correct AUC. In combination, these two bugs caused pROC to stop with an error message in the specific case where percent=TRUE and direction=">". The check was introduced in pROC version 1.9.1. The bug in the conversion from percent to fraction was present in earlier versions, but it never affected calculations, which is why it went unnoticed until the check was added. Both bugs are now fixed.

The second sub-issue affected the calculation of the thresholds. When two predictor values were too close, their mean could not be represented exactly in IEEE 754 arithmetic, and the result would be rounded back to one or the other value, pretty much arbitrarily depending on the implementation.

> a <- c(0.65354946699793847742, 0.65354946699793858844)
> print(mean(a), digits = 20)
[1] 0.65354946699793847742
> mean(a) == a
[1]  TRUE FALSE

Because pROC calculates the ROC thresholds as the mean between consecutive predictor observations, this would cause some comparisons to be incorrect when calculating sensitivity and specificity. As a consequence, erroneous sensitivities, specificities and AUCs may have been reported in the past. The issue was fixed by carefully selecting the correct value for the threshold in case the mean was identical to a predictor value.
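
One possible way to guard against a collapsed midpoint is sketched below in base R. The helper safe_midpoint is hypothetical and is not pROC's actual fix; it assumes that values strictly below the threshold fall on one side and values greater than or equal to it on the other, so the threshold must satisfy lower < threshold <= upper:

```r
# The two nearly identical predictor values from the example above
a <- c(0.65354946699793847742, 0.65354946699793858844)

# Hypothetical helper: fall back to the upper value whenever the midpoint
# cannot be told apart from the lower one
safe_midpoint <- function(lower, upper) {
  mid <- (lower + upper) / 2
  if (mid <= lower) upper else mid
}

threshold <- safe_midpoint(a[1], a[2])

# The threshold separates the two observations again
print(c(lower_is_below = a[1] < threshold, upper_is_at_or_above = a[2] >= threshold))
```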

Other bug fixes

New algorithm

A new "meta" algorithm is introduced in the roc function. algorithm = 5 causes pROC to automatically select algorithm 2 or 3, based on the number of thresholds of the ROC curve.

Algorithm 3 has a time complexity of O(N^2). It behaves very well when the number of thresholds remains low. However, its quadratic term can cause a catastrophic slowdown with very large datasets where the predictor takes nearly continuous values and the ROC curve contains many thresholds (typically this becomes very obvious above 10000 thresholds). Algorithm 2 has a time complexity of O(N) and performs much better with large numbers of data points. However, it comes with a rather large pre-factor which makes it very inefficient in most small- to normal-sized datasets.

The decisive factor is the number of thresholds: pROC will select algorithm 2 for curves with more than about 1500 thresholds, and algorithm 3 otherwise. Algorithm 5 is now the default algorithm in pROC, which should significantly improve performance with large datasets without impacting the speed in most cases.
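
The selection rule can be summarized in a few lines of R. The function select_algorithm and its cutoff argument are hypothetical names used for illustration only, and the exact cutoff used inside pROC may differ from the approximate value of 1500 quoted above:

```r
# Hypothetical helper mirroring the rule described in the text;
# this is not pROC's code and the cutoff is only approximate
select_algorithm <- function(n_thresholds, cutoff = 1500) {
  if (n_thresholds > cutoff) 2L else 3L
}

select_algorithm(200)    # few thresholds: algorithm 3
select_algorithm(50000)  # many thresholds: algorithm 2
```

The automatic choice can still be overridden by passing algorithm = 2 or algorithm = 3 to roc explicitly.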

Getting the update

The update has just been submitted to CRAN and should be online soon. Once it is out, update your installation by simply typing:

install.packages("pROC")

The full changelog is:

Xavier Robin
Published Saturday, May 5, 2018 13:50 CEST
Permalink: /blog/2018/05/05/proc-1.12.0
Tags: pROC
Comments: 0

pROC 1.11.0

pROC 1.11.0 is now on CRAN! This is a minor update that mostly fixes notes in CRAN checks. It also adds support for the legacy.axes argument to change the axis labeling in ggroc.

The full changelog is:

Xavier Robin
Published Sunday, March 25, 2018 15:04 CEST
Permalink: /blog/2018/03/25/proc-1.11.0
Tags: pROC
Comments: 0

pROC 1.10.0

A new update of pROC is now available on CRAN: version 1.10.0.

ggplot2 support (Experimental)

A new function was introduced: ggroc. Given a roc object, or an (optionally named) list of roc objects, it returns a ggplot object that can then be printed, with optional aesthetics, themes etc. Here is a basic example:

library(pROC)
# Create a basic roc object
data(aSAH)
rocobj <- roc(aSAH$outcome, aSAH$s100b)
rocobj2 <- roc(aSAH$outcome, aSAH$wfns)

library(ggplot2)
# Multiple curves:
gg2 <- ggroc(list(s100b=rocobj, wfns=rocobj2))
gg2

Figure (2 ROC curves with ggplot2): Basic ggplot with two ROC curves.

The usual ggplot syntax applies, so you can add themes, labels, etc. Note the aes argument, which controls the aesthetics for geom_line to map to the different ROC curves supplied. Here we use "linetype" instead of the default color:

# with additional aesthetics:
gg2b <- ggroc(list(s100b=rocobj, wfns=rocobj2), aes="linetype", color="red")
# You can then add your own theme, etc.
gg2b + theme_minimal() + ggtitle("My ROC curve")

Figure (2 ROC curves themed): Basic ggplot with two ROC curves.

This functionality is currently experimental and subject to change. Please report bugs and feedback on pROC's GitHub issue tracker.

Precision and recall in coords

The coords function supports two new ret values: "precision" and "recall":

library(pROC)
# Create a basic roc object
data(aSAH)
rocobj <- roc(aSAH$outcome, aSAH$s100b)
coords(rocobj, "best", ret = c("threshold", "sensitivity", "specificity", "precision", "recall"))
  threshold sensitivity specificity   precision      recall 
  0.2050000   0.6341463   0.8055556   0.6500000   0.6341463

It makes it very easy to get a Precision-Recall (PR) plot:

plot(precision ~ recall, t(coords(rocobj, "all", ret = c("recall", "precision"))), type="l", main = "PR plot of S100B")

Figure (PR plot of S100B): A simple PR plot.

Automated testing

Several functions are now covered with tests (powered by the testthat package) to ensure correct behavior. This allowed me to find and fix a few glitches. It will also make it easier to refactor code in the future.

The tests are automatically run by R CMD check. Additional tests that are too slow to be enabled by default can be activated with the RUN_SLOW_TESTS environment variable.

export RUN_SLOW_TESTS=true
R CMD check pROC
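
Inside a test file, such a switch could be read as in the following base R sketch. The helper run_slow_tests is a hypothetical name, and the comparison against the string "true" is an assumption based on the export line above, not necessarily how pROC's test suite reads the variable:

```r
# Hypothetical helper: TRUE only when RUN_SLOW_TESTS is set to "true"
run_slow_tests <- function() {
  identical(Sys.getenv("RUN_SLOW_TESTS"), "true")
}

# In a testthat test, a slow test could then start with something like:
#   testthat::skip_if_not(run_slow_tests(), "Slow test skipped")
if (run_slow_tests()) {
  message("Slow tests enabled")
} else {
  message("Slow tests skipped")
}
```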

Test results can be seen on Travis CI, and the coverage of the tests can be seen on Codecov. Currently 30% of the code is tested. This includes most functionality, with the exception of bootstrapping and smoothing, which I plan to cover in the future.

Obtaining the update

To update your installation, simply type:

install.packages("pROC")

Here is the full changelog:

Xavier Robin
Published Sunday, June 11, 2017 08:03 CEST
Permalink: /blog/2017/06/11/proc-1.10.0
Tags: pROC
Comments: 0

PHP's htmlspecialchars stupid behavior

Can PHP's htmlspecialchars delete your data? The answer, unfortunately, is yes.

I just updated a database server with a web interface from PHP 5.3 (in Ubuntu 12.04) to PHP 7 (Ubuntu 16.04.2). It went pretty smoothly, but after a couple of weeks, users started reporting missing data in some fields where they were expecting some. After some investigation, it turns out the culprit is the htmlspecialchars function, which changed behaviour with the update. Given the following script:

<?php 
$string = "An e acute character: \xE9\n";
echo htmlspecialchars($string);
?>

In PHP 5.3, it would output:

An e acute character: �

Now with PHP >= 5.4, here's the output:

 

Yep, that's correct: the output is empty. PHP just discarded the whole string. Without even a warning!

While this is documented in the manual, this is the most stupid and destructive design I have seen in a long while. Data loss guaranteed when the user saves the page without realizing some fields are accidentally empty! How can anyone be so brain dead and design and implement such a behaviour? Without even a warning!

It turns out one has to define the encoding for the function to work with non-UTF8 characters:

htmlspecialchars($string, ENT_COMPAT,'ISO-8859-1', true);

As this is a legacy application dating back more than 15 years, I fully expect some strings to be broken beyond repair. Thus I wrote the following function to replace all the calls to htmlspecialchars:

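function safe_htmlspecialchars($string) {
	// Convert with an explicit encoding so that non-UTF-8 bytes are not rejected
	$htmlstring = htmlspecialchars($string, ENT_COMPAT, 'ISO-8859-1', true);

	// A non-empty input that comes back empty means the conversion failed
	if (strlen($string) > 0 && strlen($htmlstring) == 0) {
		trigger_error("htmlspecialchars failed to convert data", E_USER_ERROR);
	}

	// Return the converted string so the function can replace htmlspecialchars
	return $htmlstring;
}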

Displaying an error in case of doubt is the only sensible behaviour here, and should be the default.

Moral of the story: I'm never using PHP in a new project again. And neither should you, if you value your data more than the PHP developers who clearly don't.

Xavier Robin
Published Sunday, February 26, 2017 14:25 CET
Permalink: /blog/2017/02/26/php-s-htmlspecialchars-stupid-behavior
Tags: Programming
Comments: 0

pROC 1.9.1

After nearly two years since the previous release, pROC 1.9.1 is finally available on CRAN. Here is a list of the main changes:

Obtaining the update

To update your installation, simply type:

install.packages("pROC")

References

Xu Sun and Weichao Xu (2014) "Fast Implementation of DeLong's Algorithm for Comparing the Areas Under Correlated Receiver Operating Characteristic Curves". IEEE Signal Processing Letters, 21, 1389-1393. DOI: 10.1109/LSP.2014.2337313.

Xavier Robin
Published Monday, February 6, 2017 09:08 CET
Permalink: /blog/2017/02/06/proc-1.9.1
Tags: pROC
Comments: 0

pROC 1.8 is coming with some potential backward-incompatible changes in the namespace

The last significant update of pROC, 1.7, was released a year ago, followed by some minor bug fix updates. In the meantime, the policies of the CRAN repository have evolved and now require a significant update of pROC.

Specifically, S3 methods in pROC have always been exported, which means that you could call auc.roc or roc.formula directly. This is not allowed any longer, and methods must now be registered as such with S3method() calls in the NAMESPACE file. The upcoming version of pROC (1.8) will therefore feature a major cleanup of the namespace.

In practice, this could potentially break some of your code. Specifically, direct calls to S3 methods will not work any longer. For instance, the following is incorrect:

rocobj <- roc(...)
smooth.roc(rocobj)

Although not documented, it used to work but that will no longer be the case. Instead, you should call the generic function that will dispatch to the proper method:

smooth(rocobj)

Other examples include for instance:

# Incorrect:
auc.roc(rocobj)
# Correct:
auc(rocobj)

# Incorrect:
var.roc(rocobj)
# Correct:
var(rocobj)

Please make sure you replace any call to a method with the generic. When in doubt, consult the Usage section of pROC's manual.
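
For readers less familiar with S3, the difference between a generic and its methods can be seen in a toy base R example. The names area and area.square are made up for illustration and are unrelated to pROC:

```r
# A toy generic: UseMethod picks the method matching the class of x
area <- function(x, ...) UseMethod("area")

# A toy method for objects of class "square"
area.square <- function(x, ...) x$side^2

sq <- structure(list(side = 3), class = "square")

# Calling the generic dispatches to area.square
area(sq)
```

In a script both area(sq) and area.square(sq) work, because the method is an ordinary function in the global environment. In a package that registers the method with S3method() without exporting it, only the call through the generic remains available to users.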

Xavier Robin
Published Monday, February 23, 2015 23:13 CET
Permalink: /blog/2015/02/23/proc-1.8-is-coming-with-some-potential-backward-incompatible-changes-in-the-namespace
Tags: pROC
Comments: 0

pROC 1.7.3 bugfix release

pROC 1.7.3 was pushed to CRAN a few minutes ago. It is a bugfix release that solves two issues with smoothing, the first of which is a significant numeric issue:

It should be available for update from CRAN in a few hours / days, depending on your operating system.

Xavier Robin
Published Thursday, June 12, 2014 20:34 CEST
Permalink: /blog/2014/06/12/proc-1.7.3
Tags: pROC
Comments: 0

pROC 1.7.2

pROC 1.7.2 was published this morning. It is a bugfix release that primarily solves various issues with coords and ci.coords. It also warns when computing confidence intervals or ROC tests of ROC curves with AUC == 1 (the CI will always be 1-1 and the p value 0), as this can potentially be misleading.

Xavier Robin
Published Sunday, April 6, 2014 08:49 CEST
Permalink: /blog/2014/04/06/proc-1.7.2
Tags: pROC
Comments: 0
