Php's htmlspecialchars stupid behavior

Can php's htmlspecialchars delete your data? The answer, unfortunately, is yes.

I just updated a database server with a web interface from PHP 5.3 (in Ubuntu 12.04) to PHP 7 (Ubuntu 16.04.2). It went pretty smoothly, but after a couple of weeks, users started reporting missing data in some fields where they were expecting some. After some investigation, it turns out the curlpit is the htmlspecialchars function, which changed behaviour with the update. Given the following script:

<?php 
$string = "An e acute character: \xE9\n";
echo htmlspecialchars($string);
?>

In PHP 5.3, it would output:

An e acute character: �

Now with PHP >= 5.4, here's the output:

 

Yep, that's correct: the output is empty. PHP just discarded the whole string. Without even a warning!

While this is documented in the manual, this is the most stupid and destructive design I have seen in a long while. Data loss guaranteed when the user saves the page without realizing some fields are accidentally empty! How can anyone be so brain dead and design and implement such a behaviour? Without even a warning!

It turns out one has to define the encoding for the function to work with non-UTF8 characters:

htmlspecialchars($string, ENT_COMPAT,'ISO-8859-1', true);

As this is a legacy application dating back more than 15 years, I fully expect some strings to be broken beyond repair. Thus I wrote the following function to replace all the calls to htmlspecialchars:

function safe_htmlspecialchars($string) {
	$htmlstring = htmlspecialchars($string, ENT_COMPAT,'ISO-8859-1', true);
  
        if (strlen($string) > 0 && strlen($htmlstring) == 0) {
                trigger_error ("htmlspecialchars failed to convert data", E_USER_ERROR);
        }
}

Displaying an error in case of doubt is the only sensible behaviour here, and should be the default.

Moral of the story: I'm never using PHP in a new project again. And neither should you, if you value your data more than the PHP developers who clearly don't.

Xavier Robin
Publié le dimanche 26 février 2017 à 14:25 CET
Lien permanent : /blog/2017/02/26/php-s-htmlspecialchars-stupid-behavior
Tags : Programmation
Commentaires : 0

Commentaires

Aucun commentaire

Nouveau commentaire

* L'astérisque dénote un champ obligatoire.

En soumettant votre message, vous acceptez qu' il soit publié sous licence CC BY-SA 3.0.

Quelques balises HTML sont autorisées : a[href, hreflang, title], br, em, i, strong, b, tt, samp, kbd, var, abbr[title], acronym[title], code, q[cite], sub, sup.

Switch to English

Chercher

Tags

Bruit de fond Hobbys Humour Informatique Internet Livres Logiciels Moi Mon site web Mozilla Photo Politique Programmation Scolaire Ubuntu pROC

Billets récents

Calendrier

lun.mar.mer.jeu.ven.sam.dim.
12345
6789101112
13141516171819
20212223242526
2728

Syndication

Recommender