pROC 1.10.0

A new update of pROC is now available on CRAN: version 1.10.0.

ggplot2 support (Experimental)

A new function was introduced: ggroc. Given a roc object, or an (optionally named) list of roc objects, it returns a ggplot object that can then be printed, with optional aesthetics, themes, etc. Here is a basic example:

library(pROC)
# Create a basic roc object
data(aSAH)
rocobj <- roc(aSAH$outcome, aSAH$s100b)
rocobj2 <- roc(aSAH$outcome, aSAH$wfns)

library(ggplot2)
# Multiple curves:
gg2 <- ggroc(list(s100b=rocobj, wfns=rocobj2))
gg2

Figure: a basic ggplot with two ROC curves.
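A single ROC curve can be plotted the same way, by passing the roc object directly. A minimal sketch, reusing the rocobj created above:

# A single curve:
ggroc(rocobj)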

The usual ggplot syntax applies, so you can add themes, labels, etc. Note the aes argument, which controls the aesthetic that geom_line maps to the different ROC curves supplied. Here we use "linetype" instead of the default color:

# with additional aesthetics:
gg2b <- ggroc(list(s100b=rocobj, wfns=rocobj2), aes="linetype", color="red")
# You can then add your own theme, etc.
gg2b + theme_minimal() + ggtitle("My ROC curve")

Figure: the same two ROC curves with the linetype aesthetic, a minimal theme and a title.

This functionality is currently experimental and subject to change. Please report bugs and feedback on pROC's GitHub issue tracker.

Precision and recall in coords

The coords function supports two new ret values: "precision" and "recall". Recall is identical to sensitivity, and precision is the fraction of true positives among positive predictions, TP / (TP + FP):

library(pROC)
# Create a basic roc object
data(aSAH)
rocobj <- roc(aSAH$outcome, aSAH$s100b)
coords(rocobj, "best", ret = c("threshold", "sensitivity", "specificity", "precision", "recall"))
  threshold sensitivity specificity   precision      recall 
  0.2050000   0.6341463   0.8055556   0.6500000   0.6341463

It makes it very easy to get a Precision-Recall (PR) plot:

plot(precision ~ recall, t(coords(rocobj, "all", ret = c("recall", "precision"))),
     type = "l", main = "PR plot of S100B")

Figure: a simple PR plot of S100B.

Automated testing

Several functions are now covered by tests (powered by the testthat package) to ensure correct behaviour. This allowed me to find and fix a few glitches. It will also make it easier to refactor code in the future.
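For illustration, a test in the testthat style might look like this (a minimal sketch with invented expectations, not taken from pROC's actual test suite):

library(testthat)
library(pROC)
data(aSAH)

test_that("roc builds a valid ROC curve for s100b", {
  rocobj <- roc(aSAH$outcome, aSAH$s100b)
  expect_true(inherits(rocobj, "roc"))
  # With the default automatic direction, the AUC lies between 0.5 and 1
  expect_true(auc(rocobj) >= 0.5)
  expect_true(auc(rocobj) <= 1)
})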

The tests are automatically run by R CMD check. Additional tests that are too slow to be enabled by default can be activated with the RUN_SLOW_TESTS environment variable:

export RUN_SLOW_TESTS=true
R CMD check pROC

Test results can be seen on Travis CI, and the coverage of the tests can be seen on Codecov. Currently 30% of the code is tested. This includes most functionality, with the exception of bootstrapping and smoothing, for which I plan to add tests in the future.

Obtaining the update

To update your installation, simply type:

install.packages("pROC")

Here is the full changelog:

Xavier Robin
Published Sunday, June 11, 2017 08:03 CEST
Permalink: /blog/2017/06/11/proc-1.10.0
Tags: pROC
Comments: 0

Php's htmlspecialchars stupid behavior

Can php's htmlspecialchars delete your data? The answer, unfortunately, is yes.

I just updated a database server with a web interface from PHP 5.3 (on Ubuntu 12.04) to PHP 7 (Ubuntu 16.04.2). It went pretty smoothly, but after a couple of weeks users started reporting missing data in fields where they expected some. After some investigation, it turned out the culprit was the htmlspecialchars function, which changed behaviour with the update. Given the following script:

<?php 
$string = "An e acute character: \xE9\n";
echo htmlspecialchars($string);
?>

In PHP 5.3, it would output:

An e acute character: �

Now with PHP >= 5.4, here's the output:

 

Yep, that's correct: the output is empty. PHP just discarded the whole string. Without even a warning!

While this is documented in the manual, it is the most stupid and destructive design I have seen in a long while. Data loss is guaranteed when a user saves the page without realizing some fields are accidentally empty! How can anyone be so brain-dead as to design and implement such a behaviour, without even a warning?

It turns out one has to specify the encoding explicitly for the function to work with non-UTF-8 characters:

htmlspecialchars($string, ENT_COMPAT, 'ISO-8859-1', true);

As this is a legacy application dating back more than 15 years, I fully expect some strings to be broken beyond repair. Thus I wrote the following function to replace all the calls to htmlspecialchars:

function safe_htmlspecialchars($string) {
    $htmlstring = htmlspecialchars($string, ENT_COMPAT, 'ISO-8859-1', true);

    // htmlspecialchars returns an empty string on invalid input:
    // catch that case instead of silently losing data.
    if (strlen($string) > 0 && strlen($htmlstring) == 0) {
        trigger_error("htmlspecialchars failed to convert data", E_USER_ERROR);
    }

    return $htmlstring;
}

Displaying an error in case of doubt is the only sensible behaviour here, and should be the default.

Moral of the story: I'm never using PHP in a new project again. And neither should you, if you value your data more than the PHP developers clearly do.

Xavier Robin
Published Sunday, February 26, 2017 14:25 CET
Permalink: /blog/2017/02/26/php-s-htmlspecialchars-stupid-behavior
Tags: Programming
Comments: 0

pROC 1.9.1

Nearly two years after the previous release, pROC 1.9.1 is finally available on CRAN. Here is a list of the main changes:

Obtaining the update

To update your installation, simply type:

install.packages("pROC")

References

Xu Sun and Weichao Xu (2014) "Fast Implementation of DeLong's Algorithm for Comparing the Areas Under Correlated Receiver Operating Characteristic Curves". IEEE Signal Processing Letters, 21, 1389–1393. DOI: 10.1109/LSP.2014.2337313.

Xavier Robin
Published Monday, February 6, 2017 09:08 CET
Permalink: /blog/2017/02/06/proc-1.9.1
Tags: pROC
Comments: 0

pROC 1.8 is coming with some potential backward-incompatible changes in the namespace

The last significant update of pROC, 1.7, was released a year ago, followed by some minor bug-fix updates. In the meantime, the policies of the CRAN repository have evolved, and they now require a significant update of pROC.

Specifically, S3 methods in pROC have always been exported, which means that you could call auc.roc or roc.formula directly. This is no longer allowed: methods must now be registered as such with S3method() calls in the NAMESPACE file. The upcoming version of pROC (1.8) will therefore feature a major cleanup of the namespace.
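For illustration, such a registration looks like this in a NAMESPACE file (a minimal sketch, not pROC's actual NAMESPACE):

# Export the generic, and register the method so that dispatch works:
export(auc)
S3method(auc, roc)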

In practice, this could break some of your code: direct calls to S3 methods will no longer work. For instance, the following is incorrect:

rocobj <- roc(...)
smooth.roc(rocobj)

Although not documented, this used to work; it will no longer be the case. Instead, you should call the generic function, which will dispatch to the proper method:

smooth(rocobj)

Other examples include:

# Incorrect:
auc.roc(rocobj)
# Correct:
auc(rocobj)

# Incorrect:
var.roc(rocobj)
# Correct:
var(rocobj)

Please make sure you replace any call to a method with the generic. If in doubt, consult the Usage section of pROC's manual.

Xavier Robin
Published Monday, February 23, 2015 23:13 CET
Permalink: /blog/2015/02/23/proc-1.8-is-coming-with-some-potential-backward-incompatible-changes-in-the-namespace
Tags: pROC
Comments: 0

pROC 1.7.3 bugfix release

pROC 1.7.3 was pushed to CRAN a few minutes ago. It is a bugfix release that solves two issues with smoothing, the first of which was a significant numeric issue.

It should be available for update from CRAN in a few hours or days, depending on your operating system.

Xavier Robin
Published Thursday, June 12, 2014 20:34 CEST
Permalink: /blog/2014/06/12/proc-1.7.3
Tags: pROC
Comments: 0

pROC 1.7.2

pROC 1.7.2 was published this morning. It is a bugfix release that primarily solves various issues with coords and ci.coords. It also warns when computing confidence intervals or ROC tests on a ROC curve with AUC == 1 (the CI will always be 1–1 and the p-value 0), as this can potentially be misleading.
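For illustration, here is how such a warning can be triggered (a minimal sketch with made-up data; the exact message may differ):

library(pROC)
# A perfectly separable predictor gives AUC == 1:
perfect <- roc(c(0, 0, 1, 1), c(0.1, 0.2, 0.8, 0.9))
# Computing a confidence interval on it now warns, as the CI is always 1-1:
ci(perfect)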

Xavier Robin
Published Sunday, April 6, 2014 08:49 CEST
Permalink: /blog/2014/04/06/proc-1.7.2
Tags: pROC
Comments: 0

pROC 1.7 released

pROC 1.7 was released. It provides additional speed improvements, with the DeLong calculations now implemented with Rcpp, improved behaviour with math operations, and various bug fixes. It is now possible to pass multiple predictors in a formula, in which case a list of ROC curves is returned (see the sketch below). In detail:
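For illustration, the multiple-predictor formula could be used like this (a minimal sketch with the aSAH example data; see ?roc for the exact interface):

library(pROC)
data(aSAH)

# One ROC curve per predictor on the right-hand side, returned as a list:
roclist <- roc(outcome ~ s100b + ndka + wfns, data = aSAH)
roclist$s100b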

pROC 1.7.1 is a quick-fix release to get the package on CRAN.

Xavier Robin
Published Thursday, February 20, 2014 21:48 CET
Permalink: /blog/2014/02/20/proc-1.7-released
Tags: pROC
Comments: 0

pROC 1.6.0.1 bugfix release

I just pushed pROC 1.6.0.1 to CRAN, as the sanity checks introduced in version 1.6 were breaking the vignette of the Causata package (thanks to Kurt Hornik for the report). Those checks turned out to be too stringent in some cases (matrix inputs to roc() work fine), and yet failed to catch all possible errors, as only testing for vector predictors and responses can let some mistakes pass (for instance list inputs).

The erroneous checks were removed. Please keep in mind that pROC is designed to take atomic vectors as predictor and response inputs, as in the sketch below. Future versions of pROC may not accept other inputs as they currently do; however, this will be announced in advance.
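For illustration, the intended input types (a minimal sketch with the aSAH example data):

library(pROC)
data(aSAH)

# The response is a vector (here a factor) with two levels,
# and the predictor is an atomic numeric vector:
response <- aSAH$outcome
predictor <- aSAH$s100b
roc(response, predictor)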

The new version is already available on CRAN. To update, type update.packages(), or install.packages("pROC") if you want to update pROC only.

Xavier Robin
Published Saturday, December 28, 2013 18:23 CET
Permalink: /blog/2013/12/28/proc-1.6.0.1-released
Tags: pROC
Comments: 0

pROC 1.6 released

Two years after the last major release, 1.5, pROC 1.6 is finally available. It comes with several major enhancements:

Power ROC tests

This is probably the main feature of this version: power tests for ROC curves. It is now possible to compute the sample size, power, significance level or minimum AUC with pROC.

library(pROC)
data(aSAH)

roc1 <- roc(aSAH$outcome, aSAH$ndka)
roc2 <- roc(aSAH$outcome, aSAH$wfns)

power.roc.test(roc1, roc2, power=0.9)

It is implemented with the methods proposed by Obuchowski and colleagues [1, 2], with the added possibility to use the bootstrap or the DeLong [3] method to compute variances and covariances. For more details and examples, see ?power.roc.test.

As a side effect, a new method="obuchowski" has been implemented in the cov and var functions, as sketched below. More details in ?var.roc and ?cov.roc.
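For illustration, with the roc1 and roc2 objects created above (a minimal sketch; see ?var.roc and ?cov.roc for the exact arguments):

# Variance of one AUC, and covariance between two paired AUCs:
var(roc1, method = "obuchowski")
cov(roc1, roc2, method = "obuchowski")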

Confidence intervals for arbitrary coordinates

It is now possible to compute confidence intervals of arbitrary coordinates, with a syntax very similar to that of the coords function.

library(pROC)
data(aSAH)

ci.coords(aSAH$outcome, aSAH$s100b, x="best")

# Or for much more information:
rets <- c("threshold", "specificity", "sensitivity", "accuracy", "tn", "tp", "fn", "fp", "npv", 
          "ppv", "1-specificity", "1-sensitivity", "1-accuracy", "1-npv", "1-ppv")
ci.coords(aSAH$outcome, aSAH$wfns, x=0.9, input = "sensitivity", ret=rets)

Speed enhancements

NOTE: because of this change, roc objects created with an earlier version will have to be re-created before they can be used in any bootstrap operation.

Dropped S+ support

S+ support was dropped, due to diverging code bases and TIBCO's apparent discontinuation of S+. A version 1.5.9 will be released in the next few days on ExPASy with initial work on ROC tests. It will work only on 32-bit versions of S+ 8.2 for Windows.

Other changes

As usual, you will find the new version on ExPASy (please allow a few days for the update to propagate there) and on CRAN. To update, type update.packages(), or install.packages("pROC") if you want to update pROC only.

Xavier Robin
Published Thursday, December 26, 2013 18:10 CET
Permalink: /blog/2013/12/26/proc-1.6-released
Tags: pROC
Comments: 0

Transcend class 10 vs. SanDisk Extreme Pro: a real-case scenario

I own a Pentax K-5 with Transcend Class 10 cards (2 × 16 GB + 1 × 64 GB) and I am mostly satisfied with it. However, I have been wondering whether a better SD card (such as a SanDisk Extreme Pro 95 MB/s) would make a noticeable difference. It does on paper, in controlled tests. But is it really any better inside the camera? Isn't the limiting factor the camera itself? So I bought an 8 GB Extreme Pro and measured the time necessary to write 10 pictures to the card in the camera and display the photo on the screen. This is a rather typical scenario: you shoot and wait to see the result. I also tested a slower SanDisk Class 4 card, just for fun.

Test setup

The Pentax K-5 was fixed on a tripod, aiming at a white paper sheet, and manually focused. The mode was set to M, ISO 80, 1/80 s and f/2.8. These all-manual settings should eliminate most variation coming from refocusing or different exposure times. The drive mode was set to continuous shooting (Lo), taking about 1.5 images per second.

With each card, I repeated the following procedure 5 times. I put the card in the camera, turned it on, formatted the card, and pressed the trigger until 10 shots were taken. I then started the chronometer as soon as the last shot was taken (with the Android app Chronometer by REmaxer), waited until the last image appeared on the screen, and stopped the chronometer as soon as possible. The resulting images (DNG + JPG) for each shot are about 25 MB, summing up to a total of about 250 MB for each test. I should note that the cards were put in the camera in more or less random order, and changed each time (so I didn't run all the tests for a card in a row).

The following 5 cards were tested:

- Transcend Class 10, 16 GB (first card)
- Transcend Class 10, 16 GB (second card)
- Transcend Class 10, 64 GB
- SanDisk Ultra Class 4
- SanDisk Extreme Pro 95 MB/s, 8 GB

Results

Figure: time to display the last photo after the last of 10 shots, in seconds (lower is better).

Unsurprisingly, the Class 4 card is the worst. I had to wait more than 13 seconds after the last shot before I could see anything displayed on the screen, which is more than 10 seconds longer than with the best cards. Clearly not a good choice.

Next, and still quite unsurprisingly, the SanDisk Extreme Pro was the fastest card, with an average of 3 seconds between the last shot and its display on screen. This is slightly better than the Transcend Class 10 cards.

The real surprise came from the Transcend cards. First, the 64 GB card was significantly slower than the 16 GB ones, taking about 7 seconds to display the last photo versus only 3.5–4. Perhaps the controller cannot cope with all the space it has to allocate? Second, the two 16 GB cards performed quite differently, one being noticeably slower than the other. More precisely, it had a few "outlier" points where it would take up to 5 seconds to display the last shot. I repeated the test several more times and came to the same conclusion: only one of the cards displayed this behaviour. All the other cards had much more stable results.

Conclusion

So, are the more expensive cards really better?

Well, the first thing we can conclude is that cheap Class 4 cards, or at least the SanDisk Ultra, are clearly slower. Since Transcend Class 10 cards cost about the same but are much, much faster (even in the worst-case outlier scenario), the latter should be preferred.

Then, is it worth buying a SanDisk Extreme Pro that is 2.5–3 times more expensive (at the same capacity)? Well, it depends whether the 0.5–1 second gain really means something to you. It may be significant in the field, when the action is taking place right now and you quickly need to check that your photos are OK before continuing.

Several questions remain.

As you can see, my tests raise more questions than they answer. For now, I will keep the Extreme Pro in my K-5, with the spare Transcends in the bag for when the 8 GB are full (at more than 150 photos per card, it shouldn't happen that often). I think this kind of setup is quite efficient: a small, fast card for everyday photos, and big, cheap ones available when more space is needed, at the cost of slightly slower shooting.

Xavier Robin
Published Friday, December 28, 2012 18:34 CET
Permalink: /blog/2012/12/28/transcend-class-10-vs-sandisk-extreme-pro-a-real-case-scenario
Tags: Photo
Comments: 2
