Php's htmlspecialchars stupid behavior

Can php's htmlspecialchars delete your data? The answer, unfortunately, is yes.

I just updated a database server with a web interface from PHP 5.3 (in Ubuntu 12.04) to PHP 7 (Ubuntu 16.04.2). It went pretty smoothly, but after a couple of weeks, users started reporting missing data in some fields where they were expecting some. After some investigation, it turns out the curlpit is the htmlspecialchars function, which changed behaviour with the update. Given the following script:

<?php 
$string = "An e acute character: \xE9\n";
echo htmlspecialchars($string);
?>

In PHP 5.3, it would output:

An e acute character: �

Now with PHP >= 5.4, here's the output:

 

Yep, that's correct: the output is empty. PHP just discarded the whole string. Without even a warning!

While this is documented in the manual, this is the most stupid and destructive design I have seen in a long while. Data loss guaranteed when the user saves the page without realizing some fields are accidentally empty! How can anyone be so brain dead and design and implement such a behaviour? Without even a warning!

It turns out one has to define the encoding for the function to work with non-UTF8 characters:

htmlspecialchars($string, ENT_COMPAT,'ISO-8859-1', true);

As this is a legacy application dating back more than 15 years, I fully expect some strings to be broken beyond repair. Thus I wrote the following function to replace all the calls to htmlspecialchars:

function safe_htmlspecialchars($string) {
	$htmlstring = htmlspecialchars($string, ENT_COMPAT,'ISO-8859-1', true);
  
        if (strlen($string) > 0 && strlen($htmlstring) == 0) {
                trigger_error ("htmlspecialchars failed to convert data", E_USER_ERROR);
        }
}

Displaying an error in case of doubt is the only sensible behaviour here, and should be the default.

Moral of the story: I'm never using PHP in a new project again. And neither should you, if you value your data more than the PHP developers who clearly don't.

Xavier Robin
Published Sunday, February 26, 2017 14:25 CET
Permalink: /blog/2017/02/26/php-s-htmlspecialchars-stupid-behavior
Tags: Programming
Comments: 0

Comments

No comment

New comment

* denotes a mandatory field.

By submitting your message, you accept to publish it under a CC BY-SA 3.0 license.

Some HTML tags are allowed: a[href, hreflang, title], br, em, i, strong, b, tt, samp, kbd, var, abbr[title], acronym[title], code, q[cite], sub, sup.

Passer en français

Search

Tags

Background noise Books Computers Fun Hobbies Internet Me Mozilla My website Photo Politics Programming School Software Ubuntu pROC

Recent posts

Calendar

MonTueWedThuFriSatSun
12345
6789101112
13141516171819
20212223242526
2728

Syndication

Recommend