<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/">

	<title>Xavier Robin – Tag – Programming</title>
	<id>tag:xavier.robin.name,2010-05-28:/en/feed/tag/programming</id>
	<link rel="self" href="https://xavier.robin.name/en/feed/tag/programming" />
	<link rel="alternate" href="https://xavier.robin.name/en/tag/programming"/>
	<updated>2025-10-08T16:38:52.316850000+02:00</updated>
	<sy:updatePeriod>weekly</sy:updatePeriod>
	<sy:updateFrequency>2</sy:updateFrequency>
	<link rel="license" type="application/rdf+xml" href="http://creativecommons.org/licenses/by-sa/3.0/rdf" />

	<icon>https://xavier.robin.name/en/img/favicon.ico</icon>

	<author>
		<name>Xavier Robin</name>
		<uri>https://xavier.robin.name/en/contact</uri>
	</author>
	

	<entry xml:lang="en" xml:base="https://xavier.robin.name/en/">
		<title type="html">Deep Learning of MNIST handwritten digits</title>
		
			<category term="programming" label="Programming" scheme="https://xavier.robin.name/en/tag/programming" />
		
		<link href="https://xavier.robin.name/en/blog/2022/06/11/deep-learning-of-mnist-handwritten-digits"/>
		<id>tag:xavier.robin.name,2022-06-11:/blog/2022/06/11/deep-learning-of-mnist-handwritten-digits</id>
		<published>2022-06-11T16:39:25+02:00</published>
		<updated>2022-06-11T16:39:25+02:00</updated>
		<content type="html">&lt;p&gt;In this document I am going create a video showing the training of the inner-most layer of Deep Belief Network (DBN) using the MNIST dataset of handwritten digits. I will use our &lt;code&gt;DeepLearning&lt;/code&gt; R package that implements flexible DBN architectures with an object-oriented interface.&lt;/p&gt;

&lt;h2&gt;MNIST&lt;/h2&gt;
&lt;p&gt;The MNIST dataset is a database of handwritten digits with 60,000 training images and 10,000 testing images. &lt;a href=&quot;https://en.wikipedia.org/wiki/MNIST_database&quot; title=&quot;Wikipedia: MNIST database&quot;&gt;You can learn everything about it on Wikipedia&lt;/a&gt;. In short, it is the go-to dataset to train and test handwritten digit recognition machine learning algorithms.&lt;/p&gt;


&lt;p&gt;I made an R package for easy access, named &lt;code&gt;mnist&lt;/code&gt;. The easiest way to install it is with &lt;code&gt;devtools&lt;/code&gt;. If you don't have it already, let's first install &lt;code&gt;devtools&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;
if (!require(&quot;devtools&quot;)) {install.packages(&quot;devtools&quot;)}
&lt;/pre&gt;

&lt;p&gt;Now we can install &lt;code&gt;mnist&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;
devtools::install_github(&quot;xrobin/mnist&quot;)
&lt;/pre&gt;

&lt;h2&gt;PCA&lt;/h2&gt;
&lt;p&gt;In order to see what the dataset looks like, let's use PCA to reduce it to two dimensions.&lt;/p&gt;

&lt;pre&gt;
pca &amp;lt;- prcomp(mnist$train$x)
plot.mnist(
	prediction = predict(pca, mnist$test$x),
	reconstruction = tcrossprod(
		predict(pca, mnist$test$x)[,1:2], pca$rotation[,1:2]),
	highlight.digits = c(72, 3, 83, 91, 6688, 7860, 92, 1, 180, 13))
&lt;/pre&gt;

&lt;p&gt;&lt;img src=&quot;/files/blog/2022/06/11/pca.png&quot; style=&quot;max-width: 100%&quot; alt=&quot;PCA Scatterplot&quot;&gt;&lt;/p&gt;
	
&lt;p&gt;Let's take a minute to describe this plot.
The central scatterplot shows the first two components of the PCA of all the digits in the test set.
On the left-hand side, I show 10 representative digits picked from the test set, which are also highlighted as larger circles in the central scatterplot.
Next to them are the &quot;reconstructed digits&quot;, which were reconstructed from the first two dimensions of the PCA. While we can see some digit-like structures, it is basically impossible to recognize them.
We can see some separation of the digits in the 2D space as well, but it is pretty weak and some pairs cannot be distinguished at all (like 4 and 9).
Of course the reconstructions would look much better had we kept all the PCA dimensions, but then there would be no dimensionality reduction.&lt;/p&gt;
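&lt;p&gt;The two-component reconstruction used above is plain linear algebra: project the centered data onto the first two principal axes, then map back. Here is an illustrative sketch (in Python/numpy rather than the R code of this post; all names are mine):&lt;/p&gt;

```python
import numpy as np

# Illustrative sketch of PCA reconstruction from k components,
# mirroring predict(pca, x)[, 1:2] and tcrossprod(scores, rotation[, 1:2]) in R.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))

# Center, then get the principal axes via SVD
# (the rows of Vt play the role of the columns of pca$rotation in R).
mu = X.mean(axis=0)
Xc = X - mu
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 2
scores = Xc @ Vt[:k].T          # the 2D coordinates shown in the scatterplot
recon = scores @ Vt[:k] + mu    # the "reconstructed digits" (here, reconstructed rows)

# Keeping all components reproduces the data exactly, which is why
# reconstructions from only 2 of 784 dimensions look so degraded.
full = (Xc @ Vt.T) @ Vt + mu
```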


&lt;h2&gt;Deep Learning&lt;/h2&gt;
&lt;p&gt;Now let's see if we can do better with Deep Learning. We'll use a classical Deep Belief Network (DBN), based on Restricted Boltzmann Machines (RBM) similar to what Hinton described back in 2006 (Hinton &amp;amp; Salakhutdinov, 2006). The training happens in two steps: a pre-training step with contrastive divergence stochastic gradient descent brings the network to a reasonable starting point for a more conventional conjugate gradient optimization (hereafter referred to as fine-tuning).&lt;/p&gt;
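&lt;p&gt;To make the pre-training step concrete, here is a minimal CD-1 (one-step contrastive divergence) sketch for a single binary RBM. This is illustrative Python/numpy with toy sizes of my own choosing, not the package's C++ implementation:&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy data: 20 binary vectors with 6 visible units
V = (rng.random((20, 6)) > 0.5).astype(float)

n_vis, n_hid = 6, 4
W = 0.01 * rng.normal(size=(n_vis, n_hid))
b = np.zeros(n_vis)   # visible biases
c = np.zeros(n_hid)   # hidden biases
eps = 0.1             # learning rate

for _ in range(100):
    # Positive phase: hidden probabilities and a stochastic sample
    ph = sigmoid(V @ W + c)
    h = (rng.random(ph.shape) < ph).astype(float)
    # Negative phase: one Gibbs step (this is what makes it CD-1)
    pv = sigmoid(h @ W.T + b)
    ph2 = sigmoid(pv @ W + c)
    # Approximate gradient: data statistics minus model statistics
    W += eps * (V.T @ ph - pv.T @ ph2) / len(V)
    b += eps * (V - pv).mean(axis=0)
    c += eps * (ph - ph2).mean(axis=0)
```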

&lt;p&gt;I implemented this algorithm with a few modifications in an R package which is available on GitHub. The core of the processing is done in C++ with &lt;a href=&quot;https://cran.r-project.org/web/packages/RcppEigen/index.html&quot;&gt;RcppEigen&lt;/a&gt; (Bates &amp;amp; Eddelbuettel, 2013) for higher speed. Using &lt;code&gt;devtools&lt;/code&gt; again:&lt;/p&gt;
&lt;pre&gt;
devtools::install_github(&quot;xrobin/DeepLearning&quot;)
&lt;/pre&gt;

&lt;p&gt;We will use this code to train a 5-layer deep network that reduces the digits to an abstract, 2D representation. By looking at this last layer throughout the training process, we can start to understand how the network learns to recognize digits. Let's start by loading the required packages and the MNIST dataset, and creating the DBN.&lt;/p&gt;

&lt;pre&gt;library(DeepLearning)
library(mnist)
data(mnist)

dbn &amp;lt;- DeepBeliefNet(Layers(c(784, 1000, 500, 250, 2),
	input = &quot;continuous&quot;, output = &quot;gaussian&quot;),
	initialize = &quot;0&quot;)
&lt;/pre&gt;
&lt;p&gt;We just created the 5-layer DBN, with a continuous, 784-node input (the digit image pixels) and a 2-node, gaussian output. It is initialized with 0, but we could have left out &lt;code&gt;initialize&lt;/code&gt; to start from a random initialization (Bengio &lt;i&gt;et al.&lt;/i&gt;, 2007). Before we go on, let's define a few useful variables:&lt;/p&gt;

&lt;pre&gt;
output.folder &amp;lt;- &quot;video&quot; # Where to save the output
maxiters.pretrain &amp;lt;- 1e6  # Number of pre-training iterations
maxiters.train &amp;lt;- 10000 # Number of fine-tuning iterations
run.training &amp;lt;- run.images &amp;lt;- TRUE # Turn either of these off to skip that step
# Which digits to highlight and reconstruct
highlight.digits &amp;lt;- c(72, 3, 83, 91, 6688, 7860, 92, 1, 180, 13)
&lt;/pre&gt;

&lt;p&gt;We'll also need the following function to show the elapsed time:&lt;/p&gt;
&lt;pre&gt;
format.timediff &amp;lt;- function(start.time) {
	diff &amp;lt;- as.numeric(difftime(Sys.time(), start.time, units = &quot;mins&quot;))
	hr &amp;lt;- diff %/% 60
	min &amp;lt;- floor(diff - hr * 60)
	sec &amp;lt;- round(diff %% 1 * 60, digits = 2)
	return(paste(hr, min, sec, sep = &quot;:&quot;))
}
&lt;/pre&gt;

&lt;h2&gt;Pre-training&lt;/h2&gt;
&lt;p&gt;Initially, the network is a stack of RBMs that we need to &lt;em&gt;pre-train&lt;/em&gt; one by one. Hinton &amp;amp; Salakhutdinov (2006) showed that this step is critical to train deep networks. We will use 1000000 iterations (&lt;code&gt;maxiters.pretrain&lt;/code&gt;) of contrastive divergence, which takes a couple of days on a modern CPU. Let's start with the first three RBMs:&lt;/p&gt;

&lt;h3&gt;First three RBMs&lt;/h3&gt;
&lt;pre&gt;
if (run.training) {
	sprintf.fmt.iter &amp;lt;- sprintf(&quot;%%0%dd&quot;, nchar(sprintf(&quot;%d&quot;, maxiters.pretrain)))
	
	mnist.data.layer &amp;lt;- mnist
	for (i in 1:3) {
&lt;/pre&gt;
&lt;p&gt;We define a &lt;code&gt;diag&lt;/code&gt; function that will simply print where we are in the training. Because this function will be called a million times (&lt;code&gt;maxiters.pretrain&lt;/code&gt;), we can use &lt;code&gt;rate = &quot;accelerate&quot;&lt;/code&gt; to slow down the printing over time and save a few CPU cycles.&lt;/p&gt;
&lt;pre&gt;
		diag &amp;lt;- list(rate = &quot;accelerate&quot;, data = NULL, f = function(rbm, batch, data, iter, batchsize, maxiters, layer) {
			print(sprintf(&quot;%s[%s/%s] in %s&quot;, layer, iter, maxiters, format.timediff(start.time)))
		})
&lt;/pre&gt;
&lt;p&gt;We can get the current RBM, and we will work on it directly. Let's save it for good measure, as well as the current time for the progress function:&lt;/p&gt;
&lt;pre&gt;
		rbm &amp;lt;- dbn[[i]]
		save(rbm, file = file.path(output.folder, sprintf(&quot;rbm-%s-%s.RData&quot;, i, &quot;initial&quot;)))
		start.time &amp;lt;- Sys.time()
&lt;/pre&gt;
&lt;p&gt;Now we can start the actual pre-training:&lt;/p&gt;
&lt;pre&gt;
		rbm &amp;lt;- pretrain(rbm, mnist.data.layer$train$x,
			penalization = &quot;l2&quot;, lambda=0.0002, momentum = c(0.5, 0.9),
			epsilon=c(.1, .1, .1, .001)[i], batchsize = 100, maxiters=maxiters.pretrain,
			continue.function = continue.function.always, diag = diag)
&lt;/pre&gt;
&lt;p&gt;This can take some time, especially for the first layers which are larger. Once it is done, we predict the data through this RBM for the next layer and save the results:&lt;/p&gt;
&lt;pre&gt;
		mnist.data.layer$train$x &amp;lt;- predict(rbm, mnist.data.layer$train$x)
		mnist.data.layer$test$x &amp;lt;- predict(rbm, mnist.data.layer$test$x)
		save(rbm, file = file.path(output.folder, sprintf(&quot;rbm-%s-%s.RData&quot;, i, &quot;final&quot;)))
		dbn[[i]] &amp;lt;- rbm
	}
&lt;/pre&gt;

&lt;h3&gt;Last RBM&lt;/h3&gt;
&lt;p&gt;This is very similar to the previous three RBMs, but note that we save the RBM within the &lt;code&gt;diag&lt;/code&gt; function. We could generate the plot directly, but it is easier to do it later, once we have some idea of the final axes we will need. Note the &lt;code&gt;rate = &quot;accelerate&quot;&lt;/code&gt; here: you probably don't want to save a million RBM objects to your hard drive, for both speed and space reasons.&lt;/p&gt;

&lt;pre&gt;
	rbm &amp;lt;- dbn[[4]]
	print(head(rbm$b))
	diag &amp;lt;- list(rate = &quot;accelerate&quot;, data = NULL, f = function(rbm, batch, data, iter, batchsize, maxiters, layer) {
		save(rbm, file = file.path(output.folder, sprintf(&quot;rbm-4-%s.RData&quot;, sprintf(sprintf.fmt.iter, iter))))
		print(sprintf(&quot;%s[%s/%s] in %s&quot;, layer, iter, maxiters, format.timediff(start.time)))
	})
	save(rbm, file = file.path(output.folder, sprintf(&quot;rbm-%s-%s.RData&quot;, 4, &quot;initial&quot;)))
	start.time &amp;lt;- Sys.time()
	rbm &amp;lt;- pretrain(rbm, mnist.data.layer$train$x,  penalization = &quot;l2&quot;, lambda=0.0002,
		epsilon=.001, batchsize = 100, maxiters=maxiters.pretrain,
		continue.function = continue.function.always, diag = diag)
	save(rbm, file = file.path(output.folder, sprintf(&quot;rbm-4-%s.RData&quot;, &quot;final&quot;)))
	dbn[[4]] &amp;lt;- rbm
&lt;/pre&gt;

&lt;iframe src=&quot;https://www.youtube.com/embed/3EapaWpDqGQ&quot; title=&quot;YouTube video player&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen style=&quot;	width: 100%; aspect-ratio: 16/9; border: none;&quot;&gt;&lt;/iframe&gt;

&lt;p&gt;If we were not querying the last layer, we could have pre-trained the entire network at once with the following call:&lt;/p&gt;

&lt;pre&gt;
	dbn &amp;lt;- pretrain(dbn, mnist.data.layer$train$x, 
		penalization = &quot;l2&quot;, lambda=0.0002, momentum = c(0.5, 0.9),
		epsilon=c(.1, .1, .1, .001), batchsize = 100, 
		maxiters=maxiters.pretrain,
		continue.function = continue.function.always)
&lt;/pre&gt;	 

&lt;h3&gt;Pre-training parameters&lt;/h3&gt;
&lt;p&gt;Pre-training RBMs is quite sensitive to the choice of parameters.
With improper parameters, the network can quickly go crazy and start to generate infinite values. If that happens to you, try tuning one of the following parameters:&lt;/p&gt;
&lt;ul&gt;
 &lt;li&gt;&lt;code&gt;penalization&lt;/code&gt;: this is the penalty of introducing or increasing the value of a weight. We used L2 regularization, but &lt;code&gt;&quot;l1&quot;&lt;/code&gt; is available if a sparser weight matrix is needed.&lt;/li&gt;
 &lt;li&gt;&lt;code&gt;lambda&lt;/code&gt;: the regularization rate. In our experience 0.0002 works fine with the MNIST and other datasets of similar sizes such as cellular imaging data. Too small or large values will result in over- or under-fitted networks, respectively.&lt;/li&gt;
 &lt;li&gt;&lt;code&gt;momentum&lt;/code&gt;: helps avoid oscillatory behavior, where the network oscillates between iterations. Allowed values range from 0 (no momentum) to 1 (full momentum = no training). Here we used an increasing momentum that starts at 0.5 and grows linearly to 0.9, in order to stabilize the final network without compromising the early training steps.&lt;/li&gt;
 &lt;li&gt;&lt;code&gt;epsilon&lt;/code&gt;: the learning rate. Typically, 0.1 works well with binary and continuous output layers, and must be decreased to around 0.001 for gaussian outputs. Too large values will drive the network to generate infinities, while too small ones will slow down the training.&lt;/li&gt;
 &lt;li&gt;&lt;code&gt;batchsize&lt;/code&gt;: larger batch sizes will result in smoother but slower training. Small batch sizes will make the training &quot;jumpy&quot;, which can be compensated by lower learning rates (epsilon) or increased momentum.&lt;/li&gt;
&lt;/ul&gt;
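&lt;p&gt;The increasing momentum schedule can be sketched as follows. This is illustrative Python on a toy quadratic objective; the linear interpolation is an assumption about how &lt;code&gt;momentum = c(0.5, 0.9)&lt;/code&gt; is applied, and all names are mine:&lt;/p&gt;

```python
# Momentum SGD with a coefficient interpolated linearly from 0.5 to 0.9
# over the course of training, minimizing f(theta) = theta^2 / 2.
maxiters = 1000

def momentum_at(it, lo=0.5, hi=0.9, maxiters=maxiters):
    # Linear ramp from lo at iteration 0 to hi at the last iteration
    return lo + (hi - lo) * it / (maxiters - 1)

theta = 5.0
velocity = 0.0
eps = 0.1  # learning rate

for it in range(maxiters):
    grad = theta  # gradient of theta^2 / 2
    velocity = momentum_at(it) * velocity - eps * grad
    theta += velocity
```

The early, low momentum lets the first steps move freely, while the high momentum at the end smooths out the remaining oscillations.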

&lt;h2&gt;Fine-tuning&lt;/h2&gt;
&lt;p&gt;This is where the real training happens. We use conjugate gradients to find the optimal solution. Again, the &lt;code&gt;diag&lt;/code&gt; function saves the DBN. This time we use &lt;code&gt;rate = &quot;each&quot;&lt;/code&gt; to save every step of the training: not only are there far fewer steps, but the training also proceeds at a much more stable speed than the pre-training, where things slow down dramatically.
&lt;/p&gt;

&lt;pre&gt;
	sprintf.fmt.iter &amp;lt;- sprintf(&quot;%%0%dd&quot;, nchar(sprintf(&quot;%d&quot;, maxiters.train)))
	diag &amp;lt;- list(rate = &quot;each&quot;, data = NULL, f = function(dbn, batch, data, iter, batchsize, maxiters) {
		save(dbn, file = file.path(output.folder, sprintf(&quot;dbn-finetune-%s.RData&quot;, sprintf(sprintf.fmt.iter, iter))))
		print(sprintf(&quot;[%s/%s] in %s&quot;, iter, maxiters, format.timediff(start.time)))
	})
	save(dbn, file = file.path(output.folder, sprintf(&quot;dbn-finetune-%s.RData&quot;, &quot;initial&quot;)))
	start.time &amp;lt;- Sys.time()
	dbn &amp;lt;- train(unroll(dbn), mnist$train$x, batchsize = 100, maxiters=maxiters.train,
		continue.function = continue.function.always, diag = diag)
	save(dbn, file = file.path(output.folder, sprintf(&quot;dbn-finetune-%s.RData&quot;, &quot;final&quot;)))
}
&lt;/pre&gt;
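&lt;p&gt;The &lt;code&gt;unroll&lt;/code&gt; step above follows Hinton &amp;amp; Salakhutdinov (2006): the stack of RBMs becomes an autoencoder whose decoder reuses the transposed encoder weights in reverse order. A rough sketch of the idea, in illustrative Python/numpy with biases omitted and activation choices assumed by me:&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Layer sizes from the post's 784-1000-500-250-2 DBN
sizes = [784, 1000, 500, 250, 2]
weights = [0.01 * rng.normal(size=(a, b)) for a, b in zip(sizes[:-1], sizes[1:])]

def encode(x):
    for i, W in enumerate(weights):
        x = x @ W
        if i < len(weights) - 1:  # the final 2-unit (gaussian) code stays linear
            x = sigmoid(x)
    return x

def decode(code):
    # "Unrolled" decoder: the same weights, transposed, in reverse order
    x = code
    for W in reversed(weights):
        x = sigmoid(x @ W.T)
    return x

x = rng.random((5, 784))
code = encode(x)      # 2D representation, as in the scatterplots
recon = decode(code)  # reconstructed "digits"
```

Fine-tuning then optimizes all the unrolled weights jointly to minimize the reconstruction error.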

&lt;iframe src=&quot;https://www.youtube.com/embed/wSfoZ_kMMTc&quot; title=&quot;YouTube video player&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen style=&quot;	width: 100%; aspect-ratio: 16/9; border: none;&quot;&gt;&lt;/iframe&gt;

&lt;p&gt;And that's it, our DBN is now fully trained!&lt;/p&gt;

&lt;h2&gt;Generating the images&lt;/h2&gt;
&lt;p&gt;Now we need to read the saved network states back in, pass the data through the network (&lt;code&gt;predict&lt;/code&gt;), and save the result as HD-sized PNG files.&lt;/p&gt;

&lt;p&gt;The first three RBMs are simply loaded into the DBN:&lt;/p&gt;

&lt;pre&gt;
if (run.images) {
	for (i in 1:3) {
		load(file.path(output.folder, sprintf(&quot;rbm-%d-final.RData&quot;, i)))
		dbn[[i]] &amp;lt;- rbm
	}
&lt;/pre&gt;

&lt;p&gt;The last RBM is where interesting things happen.&lt;/p&gt;
&lt;pre&gt;
	for (file in list.files(output.folder, pattern = &quot;rbm-4-.+\\.RData&quot;, full.names = TRUE)) {
		print(file)
		load(file)
		dbn[[4]] &amp;lt;- rbm
		iter &amp;lt;- stringr::str_match(file, &quot;rbm-4-(.+)\\.RData&quot;)[,2]
&lt;/pre&gt;
&lt;p&gt;We now predict and reconstruct the data, and calculate the mean reconstruction error:&lt;/p&gt;
&lt;pre&gt;
		predictions &amp;lt;- predict(dbn, mnist$test$x)
		reconstructions &amp;lt;- reconstruct(dbn, mnist$test$x)
		iteration.error &amp;lt;- errorSum(dbn, mnist$test$x) / nrow(mnist$test$x)
&lt;/pre&gt;
&lt;p&gt;Now the actual plotting. Here I selected &lt;code&gt;xlim&lt;/code&gt; and &lt;code&gt;ylim&lt;/code&gt; values that worked well for my training run, but your mileage may vary.&lt;/p&gt;
&lt;pre&gt;
		png(sub(&quot;\\.RData$&quot;, &quot;.png&quot;, file), width = 1280, height = 720) # HD output
		plot.mnist(model = dbn, x = mnist$test$x, label = mnist$test$y+1, predictions = predictions, reconstructions = reconstructions,
				   ncol = 16, highlight.digits = highlight.digits,
				   xlim = c(-12.625948, 8.329168), ylim = c(-10.50657, 13.12654))
		par(family=&quot;mono&quot;)
		legend(&quot;bottomleft&quot;, legend = sprintf(&quot;Mean error = %.3f&quot;, iteration.error), bty=&quot;n&quot;, cex=3)
		legend(&quot;bottomright&quot;, legend = sprintf(&quot;Iteration = %s&quot;, iter), bty=&quot;n&quot;, cex=3)
		dev.off()
	}
&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;/files/blog/2022/06/11/rbm-4-final.png&quot; style=&quot;max-width: 100%&quot; alt=&quot;Scatterplot after pretraining&quot;&gt;&lt;/p&gt;

&lt;p&gt;We do the same with the fine-tuning:&lt;/p&gt;
&lt;pre&gt;
	for (file in list.files(output.folder, pattern = &quot;dbn-finetune-.+\\.RData&quot;, full.names = TRUE)) {
		print(file)
		load(file)
		iter &amp;lt;- stringr::str_match(file, &quot;dbn-finetune-(.+)\\.RData&quot;)[,2]
		predictions &amp;lt;- predict(dbn, mnist$test$x)
		reconstructions &amp;lt;- reconstruct(dbn, mnist$test$x)
		iteration.error &amp;lt;- errorSum(dbn, mnist$test$x) / nrow(mnist$test$x)
		png(sub(&quot;\\.RData$&quot;, &quot;.png&quot;, file), width = 1280, height = 720) # HD output
		plot.mnist(model = dbn, x = mnist$test$x, label = mnist$test$y+1, predictions = predictions, reconstructions = reconstructions,
				   ncol = 16, highlight.digits = highlight.digits,
				   xlim = c(-22.81098,  27.94829), ylim = c(-17.49874,  33.34688))
		par(family=&quot;mono&quot;)
		legend(&quot;bottomleft&quot;, legend = sprintf(&quot;Mean error = %.3f&quot;, iteration.error), bty=&quot;n&quot;, cex=3)
		legend(&quot;bottomright&quot;, legend = sprintf(&quot;Iteration = %s&quot;, iter), bty=&quot;n&quot;, cex=3)
		dev.off()
	}
}
&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;/files/blog/2022/06/11/dbn-finetune-final.png&quot; style=&quot;max-width: 100%&quot; alt=&quot;Scatterplot after fine-tuning&quot;&gt;&lt;/p&gt;

&lt;h2&gt;The video&lt;/h2&gt;

&lt;p&gt;I simply used &lt;a href=&quot;https://ffmpeg.org/&quot;&gt;ffmpeg&lt;/a&gt; to convert the PNG files to a video:&lt;/p&gt;
&lt;pre&gt;
cd video
ffmpeg -pattern_type glob -i &quot;rbm-4-*.png&quot; -b:v 10000000 -y ../rbm-4.mp4
ffmpeg -pattern_type glob -i &quot;dbn-finetune-*.png&quot; -b:v 10000000 -y ../dbn-finetune.mp4
&lt;/pre&gt;

&lt;p&gt;And that's it! Notice how the pre-training only brings the network to a state similar to that of a PCA, while the fine-tuning does the actual separation and makes the reconstructions accurate.&lt;/p&gt;

&lt;h2&gt;Application&lt;/h2&gt;
&lt;p&gt;We used this code to analyze changes in cell morphology upon drug resistance in cancer. With a 27-dimensional space, we could describe all of the observed cell morphologies and predict whether a cell was resistant to ErbB-family drugs with an accuracy of 74%. The paper is available in Open Access in Cell Reports, DOI &lt;a href=&quot;https://doi.org/10.1016/j.celrep.2020.108657&quot; title=&quot;Deep neural networks identify signaling mechanisms of ErbB-family drug resistance from a continuous cell morphology space&quot;&gt;10.1016/j.celrep.2020.108657&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;Concluding remarks&lt;/h2&gt;
&lt;p&gt;In this document I described how to build and train a DBN with the &lt;code&gt;DeepLearning&lt;/code&gt; package. I also showed how to query the internal layer, and use the generative properties to follow the training of the network on handwritten digits.&lt;/p&gt;
&lt;p&gt;DBNs have the advantage over Convolutional Networks (CNs) of being fully generative, at least during the pre-training. They are therefore easier to query and interpret, as we have demonstrated here.
 However, keep in mind that CNs have demonstrated higher accuracies on computer vision tasks such as the MNIST dataset.&lt;/p&gt;
&lt;p&gt;Additional algorithmic details are available in the &lt;code&gt;doc&lt;/code&gt; folder of the DeepLearning package.&lt;/p&gt;


&lt;h2&gt;References&lt;/h2&gt;
&lt;dl&gt;
	&lt;dt&gt;Our paper, 2021&lt;/dt&gt;
	&lt;dd&gt;Longden J., Robin X., Engel M., &lt;i&gt;et al.&lt;/i&gt;
 &lt;a href=&quot;https://doi.org/10.1016/j.celrep.2020.108657&quot;&gt;Deep neural networks identify signaling mechanisms of ErbB-family drug resistance from a continuous cell morphology space&lt;/a&gt;. &lt;i&gt;Cell Reports&lt;/i&gt;, 2021;34(3):108657.&lt;/dd&gt;

	&lt;dt&gt;Bates &amp;amp; Eddelbuettel, 2013&lt;/dt&gt;
	&lt;dd&gt;Bates D, Eddelbuettel D. &lt;a href=&quot;http://www.jstatsoft.org/v52/i05/&quot;&gt;Fast and Elegant Numerical Linear Algebra Using the RcppEigen Package&lt;/a&gt;. &lt;i&gt;Journal of Statistical Software&lt;/i&gt;, 2013;52(5):1&amp;ndash;24.&lt;/dd&gt;

	&lt;dt&gt;Bengio &lt;i&gt;et al.&lt;/i&gt;, 2007&lt;/dt&gt;
	&lt;dd&gt;Bengio Y, Lamblin P, Popovici D, Larochelle H. &lt;a href=&quot;https://papers.nips.cc/paper/3048-greedy-layer-wise-training-of-deep-networks.pdf&quot;&gt;Greedy layer-wise training of deep networks&lt;/a&gt;. &lt;i&gt;Advances in neural information processing systems&lt;/i&gt;. 2007;19:153&amp;ndash;60.&lt;/dd&gt;

	&lt;dt&gt;Hinton &amp;amp; Salakhutdinov, 2006&lt;/dt&gt;
	&lt;dd&gt;Hinton GE, Salakhutdinov RR. &lt;a href=&quot;http://dx.doi.org/10.1126/science.1127647&quot;&gt;Reducing the Dimensionality of Data with Neural Networks&lt;/a&gt;. &lt;i&gt;Science&lt;/i&gt;. 2006;313(5786):504&amp;ndash;7.&lt;/dd&gt;
&lt;/dl&gt;

&lt;h2&gt;Downloads&lt;/h2&gt;
&lt;ol&gt;
	&lt;li&gt;&lt;a href=&quot;/files/blog/2022/06/11/MNIST_video.tar.gz&quot;&gt;Code to generate the video&lt;/a&gt;&lt;/li&gt;
	&lt;li&gt;&lt;a href=&quot;https://github.com/xrobin/DeepLearning&quot;&gt;DeepLearning package source code&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
</content>
		</entry>

	<entry xml:lang="en" xml:base="https://xavier.robin.name/en/">
		<title type="html">Php's htmlspecialchars stupid behavior</title>
		
			<category term="programming" label="Programming" scheme="https://xavier.robin.name/en/tag/programming" />
		
		<link href="https://xavier.robin.name/en/blog/2017/02/26/php-s-htmlspecialchars-stupid-behavior"/>
		<id>tag:xavier.robin.name,2017-02-26:/blog/2017/02/26/php-s-htmlspecialchars-stupid-behavior</id>
		<published>2017-02-26T14:25:02+01:00</published>
		<updated>2017-02-26T14:25:02+01:00</updated>
		<content type="html">&lt;p&gt;Can php's htmlspecialchars delete your data? The answer, unfortunately, is yes.&lt;/p&gt;

&lt;p&gt;I just updated a database server with a web interface from PHP 5.3 (in Ubuntu 12.04) to PHP 7 (Ubuntu 16.04.2). It went pretty smoothly, but after a couple of weeks, users started reporting missing data in fields where they were expecting some. After some investigation, it turned out the culprit was the &lt;code&gt;htmlspecialchars&lt;/code&gt; function, which changed behaviour with the update. Given the following script:&lt;/p&gt;

&lt;pre&gt;
&amp;lt;?php 
$string = &quot;An e acute character: \xE9\n&quot;;
echo htmlspecialchars($string);
?&amp;gt;&lt;/pre&gt;

&lt;p&gt;In PHP 5.3, it would output:&lt;/p&gt; 
&lt;pre&gt;An e acute character: �&lt;/pre&gt;

&lt;p&gt;Now with PHP &gt;= 5.4, here's the output:&lt;/p&gt;
&lt;pre&gt;&amp;nbsp;&lt;/pre&gt;

&lt;p&gt;Yep, that's correct: the output is empty. PHP just discarded the whole string. Without even a warning! &lt;/p&gt;

&lt;p&gt;While this is &lt;a href=&quot;http://php.net/manual/en/function.htmlspecialchars.php&quot;&gt;documented in the manual&lt;/a&gt;, this is the most stupid and destructive design I have seen in a long while. Data loss guaranteed when the user saves the page without realizing some fields are accidentally empty! How can anyone be so brain dead and design and implement such a behaviour? Without even a warning!&lt;/p&gt;

&lt;p&gt;It turns out one has to define the encoding for the function to work with non-UTF8 characters:&lt;/p&gt;
&lt;pre&gt;htmlspecialchars($string, ENT_COMPAT,'ISO-8859-1', true);&lt;/pre&gt;

&lt;p&gt;As this is a legacy application dating back more than 15 years, I fully expect some strings to be broken beyond repair. Thus I wrote the following function to replace all the calls to &lt;code&gt;htmlspecialchars&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;
function safe_htmlspecialchars($string) {
	$htmlstring = htmlspecialchars($string, ENT_COMPAT, 'ISO-8859-1', true);

	if (strlen($string) &gt; 0 &amp;&amp; strlen($htmlstring) == 0) {
		trigger_error(&quot;htmlspecialchars failed to convert data&quot;, E_USER_ERROR);
	}

	return $htmlstring;
}
&lt;/pre&gt;

&lt;p&gt;Displaying an error in case of doubt is the only sensible behaviour here, and should be the default.&lt;/p&gt;

&lt;p&gt;Moral of the story: I'm never using PHP in a new project again. And neither should you, if you value your data more than the PHP developers who clearly don't.&lt;/p&gt;</content>
		</entry>

	<entry xml:lang="fr" xml:base="https://xavier.robin.name/en/">
		<title type="html"></title>
		
			<category term="hobbies" label="Hobbies" scheme="https://xavier.robin.name/en/tag/hobbies" />
		
			<category term="programming" label="Programming" scheme="https://xavier.robin.name/en/tag/programming" />
		
		<link href="https://xavier.robin.name/en/blog/2011/06/02/sécheresse-de-printemps-2011-à-genève"/>
		<id>tag:xavier.robin.name,2011-06-02:/blog/2011/06/02/s%C3%A9cheresse-de-printemps-2011-%C3%A0-gen%C3%A8ve</id>
		<published>2011-06-02T15:15:39+02:00</published>
		<updated>2011-06-02T15:15:39+02:00</updated>
		<content type="html"></content>
		</entry>

	<entry xml:lang="fr" xml:base="https://xavier.robin.name/en/">
		<title type="html">Extraction du Ruby</title>
		
			<category term="programming" label="Programming" scheme="https://xavier.robin.name/en/tag/programming" />
		
		<link href="https://xavier.robin.name/en/blog/2011/02/09/extraction-du-ruby"/>
		<id>tag:xavier.robin.name,2011-02-09:/blog/2011/02/09/extraction-du-ruby</id>
		<published>2011-02-09T18:20:02+01:00</published>
		<updated>2011-02-09T18:20:02+01:00</updated>
		<content type="html">&lt;p&gt;Il y a pas mal de temps, j'étais tombé sur le tutoriel intitulé &lt;em&gt;Extraction du Ruby&lt;/em&gt; sur le  &lt;a href=&quot;http://www.siteduzero.com/&quot;&gt;SiteDuZero&lt;/a&gt;. Il abordait les points clés du langage de manière didactique et très agréable. Malheureusement, il a été supprimé par la suite.&lt;/p&gt;

&lt;p&gt;Mais tout n'est pas perdu&amp;nbsp;! Grâce à l'&lt;a href=&quot;http://web.archive.org/&quot;&gt;archive web&lt;/a&gt;, j'ai pu récupérer ce tutoriel (dans sa &lt;a href=&quot;http://web.archive.org/web/20071013011732/http://www.siteduzero.com/tuto-3-2771-0-extraction-du-ruby.html&quot;&gt;version du 13 octobre 2007&lt;/a&gt;). Comme il est sous licence &lt;a href=&quot;http://creativecommons.org/licenses/by-nc-sa/2.0/fr/&quot;&gt;licence
Creative Commons NC-SA&lt;/a&gt;, j'ai pu en faire une copie. Vous la trouverez naturellement dans le fichier &lt;a href=&quot;https://xavier.robin.name/files/ruby/extraction-du-ruby.htm&quot;&gt;Extraction du Ruby&lt;/a&gt;. Bonne lecture&amp;nbsp;!&lt;/p&gt;</content>
		</entry>

	<entry xml:lang="en" xml:base="https://xavier.robin.name/en/">
		<title type="html">CPAN: Terminal does not support GetHistory / AddHistory in Ubuntu</title>
		
			<category term="programming" label="Programming" scheme="https://xavier.robin.name/en/tag/programming" />
		
		<link href="https://xavier.robin.name/en/blog/2010/08/28/cpan-terminal-does-not-support-gethistory-addhistory-in-ubuntu"/>
		<id>tag:xavier.robin.name,2010-08-28:/blog/2010/08/28/cpan-terminal-does-not-support-gethistory-addhistory-in-ubuntu</id>
		<published>2010-08-28T17:10:18+02:00</published>
		<updated>2011-03-16T10:04:10+01:00</updated>
		<content type="html">&lt;p&gt;Using CPAN in Ubuntu Lucid Lynx (10.04), the command line history was broken and the up/down keys only triggered escape codes such as &lt;code&gt;^[[A&lt;/code&gt;. Additionally, the following error messages showed up on CPAN startup (only one of them at a time):&lt;/p&gt;

&lt;pre&gt;Terminal does not support GetHistory
Terminal does not support AddHistory&lt;/pre&gt;

&lt;p&gt;&lt;code&gt;Term::ReadLine&lt;/code&gt; was correctly installed, but when I tried to install &lt;code&gt;Bundle::CPAN&lt;/code&gt;, I got failed tests for &lt;code&gt;Term::ReadLine::Perl&lt;/code&gt;. So I did:&lt;/p&gt;

&lt;pre&gt;look Term::ReadLine::Perl&lt;/pre&gt;

&lt;p&gt;to inspect the build folder and ran the tests, which actually seemed to work. Installing the module from the shell with:&lt;/p&gt;

&lt;pre&gt;make install&lt;/pre&gt;

&lt;p&gt;was successful, and after restarting CPAN I had my terminal working properly. Still, it seems &lt;code&gt;Term::ReadLine&lt;/code&gt; isn't functioning perfectly:&lt;/p&gt;

&lt;pre&gt;perl -MTerm::ReadLine::Perl
Can't locate object method &quot;Features&quot; via package &quot;Term::ReadLine::Stub&quot; at /opt/perl-5.12.1/lib/site_perl/5.12.1/Term/ReadLine/Perl.pm line 101.
Compilation failed in require.
BEGIN failed--compilation aborted.&lt;/pre&gt;</content>
		</entry>

	<entry xml:lang="fr" xml:base="https://xavier.robin.name/en/">
		<title type="html"></title>
		
			<category term="programming" label="Programming" scheme="https://xavier.robin.name/en/tag/programming" />
		
			<category term="politics" label="Politics" scheme="https://xavier.robin.name/en/tag/politics" />
		
		<link href="https://xavier.robin.name/en/blog/2010/07/03/évolution-du-climat-à-genève"/>
		<id>tag:xavier.robin.name,2010-07-03:/blog/2010/07/03/%C3%A9volution-du-climat-%C3%A0-gen%C3%A8ve</id>
		<published>2010-07-03T10:22:01+02:00</published>
		<updated>2010-08-14T14:37:02+02:00</updated>
		<content type="html"></content>
		</entry>

	<entry xml:lang="en" xml:base="https://xavier.robin.name/en/">
		<title type="html">ePerl syntax highlighting in Kate</title>
		
			<category term="programming" label="Programming" scheme="https://xavier.robin.name/en/tag/programming" />
		
		<link href="https://xavier.robin.name/en/blog/2010/06/20/eperl-syntax-highlighting-in-kate"/>
		<id>tag:xavier.robin.name,2010-06-20:/blog/2010/06/20/eperl-syntax-highlighting-in-kate</id>
		<published>2010-06-20T10:53:29+02:00</published>
		<updated>2010-06-20T10:53:29+02:00</updated>
		<content type="html">&lt;p&gt;Kate is a great editor. It has an extensible syntax highlighting facility built around &lt;abbr title=&quot;eXtended Markup Language&quot;&gt;XML&lt;/abbr&gt; files.&lt;/p&gt;
&lt;p&gt;As ePerl is very similar to PHP in essence, I took the existing PHP highlighting scheme and adapted it to ePerl. I also had to modify the Perl syntax file to catch the &lt;code&gt;:&amp;gt;&lt;/code&gt; and &lt;code&gt;!&amp;gt;&lt;/code&gt; terminations. And that's it!&lt;/p&gt;

&lt;h2&gt;Installation&lt;/h2&gt;
&lt;p&gt;Copy the two following files in your &lt;code class=&quot;folder&quot;&gt;~/.kde/share/apps/katepart/syntax/&lt;/code&gt; directory:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;/files/eperl/perl-eperl.xml&quot;&gt;perl-eperl.xml&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;/files/eperl/html-eperl.xml&quot;&gt;html-eperl.xml&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Restart Kate. All your &lt;code class=&quot;ext&quot;&gt;.epl&lt;/code&gt;, &lt;code class=&quot;ext&quot;&gt;.eperl&lt;/code&gt;, &lt;code class=&quot;ext&quot;&gt;.phtml&lt;/code&gt; and &lt;code class=&quot;ext&quot;&gt;.phtm&lt;/code&gt; files should be highlighted. If not, open the &lt;code class=&quot;menu&quot;&gt;Tools&lt;/code&gt; menu, and select &lt;code class=&quot;menu&quot;&gt;Syntax highlighting&lt;/code&gt;, &lt;code class=&quot;menu&quot;&gt;Script&lt;/code&gt;, &lt;code class=&quot;menu&quot;&gt;ePerl (HTML)&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Tested in Kate 3.3.2 (KDE 4.3.2 / Ubuntu 9.10 Karmic Koala).&lt;/p&gt;

&lt;h2&gt;Known bugs&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;You can mix &lt;code&gt;&amp;lt;?&lt;/code&gt; with &lt;code&gt;:&amp;gt;&lt;/code&gt; (for example), and this and all the other invalid delimiter combinations will go undetected. There should really be separate highlighting for the CGI and normal modes, but there is not. This should not be a big issue, though.&lt;/li&gt;
&lt;/ul&gt;</content>
		</entry>

	<entry xml:lang="en" xml:base="https://xavier.robin.name/en/">
		<title type="html">Debugging ePerl</title>
		
			<category term="programming" label="Programming" scheme="https://xavier.robin.name/en/tag/programming" />
		
		<link href="https://xavier.robin.name/en/blog/2010/06/19/debugging-eperl"/>
		<id>tag:xavier.robin.name,2010-06-19:/blog/2010/06/19/debugging-eperl</id>
		<published>2010-06-19T15:56:06+02:00</published>
		<updated>2010-06-19T17:34:41+02:00</updated>
		<content type="html">&lt;p&gt;Last time I reported &lt;a href=&quot;/blog/2010/06/13/eperl&quot;&gt;my wish to have true 500 errors upon script errors&lt;/a&gt;. I reported &lt;a href=&quot;http://marginalhacks.com/Hacks/ePerl/&quot;&gt;this rewrite by MarginalHacks&lt;/a&gt;, but also that &lt;a href=&quot;http://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol#Request_methods&quot;&gt;POST&lt;/a&gt; data was lost.&lt;/p&gt;

&lt;p&gt;So, I dug into the code, and here are my results.&lt;/p&gt;

&lt;h2&gt;Setting the current script&lt;/h2&gt;
&lt;p&gt;ePerl reads the name of the script to run from its arguments. In CGI mode, however, the script to read is given in an environment variable, not as an argument.&lt;/p&gt;

&lt;p&gt;What is weird is that sometimes this is unnecessary, as the filename is already captured in the &lt;code&gt;$ENV{'PATH_TRANSLATED'}&lt;/code&gt; block just before the argument parsing. Indeed, it started working just as I first published this post. However, argument parsing is then skipped entirely, which is a problem, and &lt;code&gt;PATH_TRANSLATED&lt;/code&gt; does not give the filename, but a path which is &lt;em&gt;not&lt;/em&gt; the script if you have &lt;a href=&quot;http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html&quot;&gt;mod_rewrite&lt;/a&gt; or use &lt;a href=&quot;http://httpd.apache.org/docs/2.2/mod/core.html#acceptpathinfo&quot;&gt;PATH_INFO&lt;/a&gt; (as I do both, you can imagine how wrong it goes). So:&lt;/p&gt;

&lt;pre&gt;@@ -161,13 +164,6 @@
   $opt{'perl'} = $^X;
   $opt{'CaseDelimiters'} = 1;
 
-  if ($ENV{'PATH_TRANSLATED'}) {
-    # We're being called in a CGI environment, so @ARGV contains
-    # the search keywords, not the files or options to process
-    @files = ($ENV{'PATH_TRANSLATED'});
-    # Check for &quot;nph-&quot;
-    $opt{'mode'} = basename($ENV{'PATH_TRANSLATED'}) =~ /^nph-/ ? &quot;n&quot; : &quot;c&quot;;
-  } else {
-
     while ($#ARGV&gt;=0) {
       my $arg=shift(@ARGV);
       if ($arg =~ /^-(h|-help)$/) { usage(\%opt); }
@@ -209,18 +206,17 @@
     # Mode if not specified
     $opt{'mode'} = &quot;f&quot; unless ($opt{'mode'});
     $opt{'mode'} = &quot;f&quot; if ($opt{'mode'} =~ /^filter$/i);
-    $opt{'mode'} = &quot;c&quot; if ($opt{'mode'} =~ /^cgi$/i);
-    $opt{'mode'} = &quot;n&quot; if ($opt{'mode'} =~ /^nph-cgi$/i);
+    $opt{'mode'} = &quot;c&quot; if ($opt{'mode'} =~ /^cgi$/i || $ENV{&quot;SCRIPT_FILENAME&quot;});
+    $opt{'mode'} = &quot;n&quot; if ($opt{'mode'} =~ /^nph-cgi$/i || $ENV{&quot;SCRIPT_SRC_PATH_FILE&quot;} =~ /^nph/i);
     # And check for it based on PROGNAME
     $opt{'mode'} = &quot;n&quot; if ($PROGNAME =~ /^nph-/i);
 
-  }
-
   usage(\%opt,&quot;Unsupported mode: $opt{'mode'}&quot;) unless ($opt{'mode'} =~ /^[fcn]$/);
   if ($opt{'mode'} ne &quot;f&quot;) {
     CGI::Carp-&gt;import('fatalsToBrowser');	# Output HTML for errors
     $opt{'convert-entity'} = 1;
     $opt{'preprocess'} = 1;
+    @files = $ENV{&quot;SCRIPT_FILENAME&quot;} if $opt{'mode'} ne &quot;f&quot;;
     if ($CGI_NEEDS_ALLOWED_FILE_EXT) {
       foreach my $file (@files) {
         usage(\%opt,&quot;File `$file' is not allowed to be interpreted by ePerl (wrong extension!)&quot;,1)
&lt;/pre&gt;

&lt;p&gt;Just remove that test! I also tried some other defaults; it seems to work, but it should be tested more extensively.&lt;/p&gt;
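&lt;p&gt;To make the problem concrete, this is the kind of CGI environment you can get with mod_rewrite and PATH_INFO in play (hypothetical values):&lt;/p&gt;
&lt;pre&gt;SCRIPT_FILENAME=/var/www/cgi-bin/index.epl    # the script to interpret
PATH_INFO=/blog/2010/06/19/debugging-eperl
PATH_TRANSLATED=/var/www/blog/2010/06/19/debugging-eperl    # not a script at all!&lt;/pre&gt;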

&lt;h2&gt;Not losing &lt;abbr title=&quot;Standard input&quot;&gt;STDIN&lt;/abbr&gt;&lt;/h2&gt;
&lt;p&gt;Actually, there is already an option for this: &lt;code&gt;--tmpfile&lt;/code&gt;. Which leads us to the next point:&lt;/p&gt;

&lt;h2&gt;Multiple shebang options&lt;/h2&gt;
&lt;p&gt;After adding &lt;code&gt;--tmpfile&lt;/code&gt; to the shebang already containing &lt;code&gt;--mode=CGI&lt;/code&gt;, I got a true 500 error from the server, and a usage report in the logs. It looks like arguments are not split on whitespace in the shebang line. It seems to be &lt;a href=&quot;http://unix.derkeiler.com/Mailing-Lists/FreeBSD/arch/2005-02/0039.html&quot;&gt;a known bug&lt;/a&gt; (or even a feature?). So we have to split the arguments directly in the script. Here is the diff:&lt;/p&gt;

&lt;pre&gt;@@ -169,6 +172,7 @@
     $opt{'mode'} = basename($ENV{'PATH_TRANSLATED'}) =~ /^nph-/ ? &quot;n&quot; : &quot;c&quot;;
   } else {
 
+    @ARGV = split(&quot; &quot;, @ARGV[0]) if $#ARGV == 1; # weird behaviour in shebang: all args passed as only one, so split them here!
     while ($#ARGV&gt;=0) {
       my $arg=shift(@ARGV);
       if ($arg =~ /^-(h|-help)$/) { usage(\%opt); }
&lt;/pre&gt;

&lt;h2&gt;Returning 500 code upon error&lt;/h2&gt;
&lt;p&gt;As mentioned previously, the script returns a &lt;code&gt;200 OK&lt;/code&gt; when an error happens in the script. This is annoying, because the page ends up indexed in Google (&lt;a href=&quot;http://www.google.com/search?q=&amp;quot;ePerl%3A+ERROR&amp;quot;&quot;&gt;sample query&lt;/a&gt;). This can be a security issue, as the information disclosed could be used by an attacker to find a flaw in the code! (Strictly speaking, it does not create any flaw; it just discloses information that makes it slightly easier to discover an existing one.) The HTTP spec defines the &lt;code&gt;500 Internal Server Error&lt;/code&gt; code for exactly this case of a server error. Fortunately, it is an easy change:&lt;/p&gt;

&lt;pre&gt;@@ -1139,12 +1143,15 @@
 
   if ($opt_H-&gt;{'mode'} eq &quot;n&quot;) {
     my $proto = $ENV{'SERVER_PROTOCOL'} || &quot;HTTP/1.0&quot;;
-    print SEND_OUT &quot;$proto 200 OK\n&quot;;
+    print SEND_OUT &quot;$proto 500 Internal Server Error\n&quot;;
     my $server = $ENV{'SERVER_SOFTWARE'} || &quot;unknown-server/0.0&quot;;
     print SEND_OUT &quot;Server: $server ePerl/$VERSION Perl/$]\n&quot;;
     print SEND_OUT &quot;Date: &quot;.localtime(time).&quot;\n&quot;;
     print SEND_OUT &quot;Connection: close\n&quot;;
   }
+  else {
+    print SEND_OUT &quot;Status: 500\n&quot;;
+  }
   print SEND_OUT &amp;lt;&amp;lt;HTML_START;
 Content-Type: text/html

&lt;/pre&gt;

&lt;p&gt;Note that the 500 code is now returned in both &lt;a href=&quot;http://en.wikipedia.org/wiki/Common_Gateway_Interface&quot;&gt;CGI&lt;/a&gt; and &lt;abbr title=&quot;Non-parsed headers&quot;&gt;NPH&lt;/abbr&gt;-CGI modes.&lt;/p&gt;
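&lt;p&gt;The difference between the two modes is only in how the status reaches the client: an NPH script writes the HTTP status line itself, while a plain CGI script emits a &lt;code&gt;Status&lt;/code&gt; pseudo-header that the web server turns into the real status line. Schematically (illustrative output, not copied from ePerl):&lt;/p&gt;
&lt;pre&gt;# NPH-CGI: the script speaks raw HTTP
HTTP/1.0 500 Internal Server Error
Content-Type: text/html

# plain CGI: the server builds the status line from this pseudo-header
Status: 500
Content-Type: text/html&lt;/pre&gt;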

&lt;h2&gt;Custom error page&lt;/h2&gt;
&lt;p&gt;The last step was to customize the error page. Granted, the page provides useful information, but it is especially ugly!&lt;/p&gt;
&lt;p&gt;I introduced a new &lt;code&gt;--errorscript&lt;/code&gt; argument. The script given as its argument is called, and its output is displayed in place of the built-in error message. This involves several changes:&lt;/p&gt;

&lt;pre&gt;@@ -129,6 +129,9 @@
                             (If script needs to read stdin, like a post .cgi)
   -1, --eval                Run in a single process using `eval'
                             (default for MSWin32 as can't fork)
+  -S, --errorscript=PATH    use a custom error script rather than the buit-in default
+                            message. Warning: do not use this argument in the error
+                            page itself, or you could end up with an infinite loop!
   --                        Following options are args to the ePerl script
 
 );
@@ -196,6 +200,7 @@
       if ($arg =~ /^-(s|-strict)$/) { $opt{'strict'}=1; next; }
       if ($arg =~ /^-(t|-tmpfile)$/) { $opt{'tmpfile'}=1; next; }
       if ($arg =~ /^-(1|-eval)$/) { $opt{'eval'}=1; next; }
+      if ($arg =~ /^-(S|-errorscript=)(.+)?$/) { $opt{'errorscript'}=arg(); next; }
    
       if ($arg =~ /^-(r|-readme)$/) { readme(); exit(0); }
       if ($arg =~ /^-(l|-license)$/) { license(); exit(0); }
@@ -1145,12 +1190,23 @@
     print SEND_OUT &quot;Date: &quot;.localtime(time).&quot;\n&quot;;
     print SEND_OUT &quot;Connection: close\n&quot;;
   }
+  else {
+    print SEND_OUT &quot;Status: 500\n&quot;;
+  }
+  if ($opt_H-&gt;{'errorscript'}) {
+    $ENV{'REDIRECT_STATUS'} = 500;
+    $ENV{'REDIRECT_ERROR_NOTES'} = &quot;&amp;lt;pre&gt;$error&amp;lt;/pre&gt;&quot;;
+    $ENV{'REDIRECT_ERROR_NOTES'} = &quot;&amp;lt;pre&gt;$error: @err&amp;lt;/pre&gt;&quot; if @err;
+    my $errorscript = $opt_H-&gt;{'errorscript'};
+    print SEND_OUT `$errorscript`;
+  }
+  else {
   print SEND_OUT &amp;lt;&amp;lt;HTML_START;
 Content-Type: text/html
 
@@ -1186,6 +1242,7 @@
 &amp;lt;/body&gt;
 &amp;lt;/html&gt;
 HTML_END
+  }
 }
 
 sub readme {

&lt;/pre&gt;

&lt;p&gt;Warning: do not define &lt;code&gt;--errorscript&lt;/code&gt; to point to the script itself. If something goes wrong, you would end up in an infinite loop; I haven't included code to check for that condition!&lt;/p&gt;

&lt;h2&gt;Custom error page again&lt;/h2&gt;
&lt;p&gt;OK, let's see what happens when we &lt;code&gt;die&lt;/code&gt; in the ePerl script… well, the output stops at the point where we died, but no error message is displayed! This actually seems linked to the way the &lt;code&gt;--tmpfile&lt;/code&gt; argument is processed: error output is not captured. I added a dirty hack to capture it, with another temp file to hold STDERR.&lt;/p&gt;

&lt;pre&gt;@@ -503,17 +509,24 @@
 
 # Write to a tmpfile, execute that
 my $TMPFILE;
+my $TMPERRFILE;
 sub start_perl_tmpfile {
   my ($opt_H) = @_;
 
   my $file = &quot;$TMPDIR/$PROGNAME.$$&quot;;
   usage($opt_H,&quot;Tmpfile already exists?? [$file]&quot;,1) if (-f $file);
+  my $errfile = &quot;$TMPDIR/$PROGNAME.err.$$&quot;;
+  usage($opt_H,&quot;Tmperrfile already exists?? [$errfile]&quot;,1) if (-f $errfile);
 
   my $save = umask 077;		# Some added safety
   $opt_H-&gt;{'ph'} = new IO::File;
+  $opt_H-&gt;{'pherr'} = new IO::File;
   usage($opt_H,&quot;Couldn't create tmpfile [$file]&quot;,1)
     unless $opt_H-&gt;{'ph'}-&gt;open(&quot;&gt;$file&quot;);
+  usage($opt_H,&quot;Couldn't create tmperrfile [$errfile]&quot;,1)
+    unless $opt_H-&gt;{'pherr'}-&gt;open(&quot;&gt;$errfile&quot;);
   $TMPFILE = $file;
+  $TMPERRFILE = $errfile;
   umask $save;
   $SIG{'INT'}='interrupt';
   $SIG{'TERM'}='interrupt';
@@ -521,7 +534,10 @@
   $SIG{'SUSP'}='interrupt';
   $SIG{'QUIT'}='interrupt';
 }
-sub clean_tmpfile { unlink $TMPFILE if $TMPFILE &amp;amp;&amp;amp; -f $TMPFILE; }
+sub clean_tmpfile {
+  unlink $TMPFILE if $TMPFILE &amp;amp;&amp;amp; -f $TMPFILE;
+  unlink $TMPERRFILE if $TMPERRFILE &amp;amp;&amp;amp; -f $TMPERRFILE;
+}
 sub interrupt { print STDERR &quot;[$PROGNAME] **INTERRUPT**&quot;; clean_tmpfile(); exit; }
 
 # Just open a normal pipe to a perl process, redirect STDOUT
@@ -797,9 +813,38 @@
     # Dangerous race condition here!
     usage($opt_H,&quot;Tmpfile disappeared?? [$TMPFILE]&quot;,1)
       unless $TMPFILE &amp;amp;&amp;amp; -r $TMPFILE;
-    system(&quot;$opt_H-&gt;{'perl'} $opt_H-&gt;{'perl_opts'} $TMPFILE @ARGV&quot;);
+    my $output = `$opt_H-&gt;{'perl'} $opt_H-&gt;{'perl_opts'} $TMPFILE @ARGV  2&gt;$TMPERRFILE`;
     $ret = $?;
+    $opt_H-&gt;{'pherr'}-&gt;close;
+    my $exit = $ret &gt;&gt; 8;
+    my $int  = $ret &amp;amp; 127;
+    my $core = $ret &amp;amp; 128;
+    $exit|=0xffffff00 if $exit&gt;&gt;7;
+    $exit = sprintf(&quot;%d&quot;,$exit);
+	if ($exit || $int || $core) { # Ok, there was an error!
+		# read-open the error file
+		my $errfile = &quot;$TMPDIR/$PROGNAME.err.$$&quot;;
+			usage($opt_H,&quot;Tmperrfile already removed?? [$errfile]&quot;,1) unless (-f $errfile);
+		$opt_H-&gt;{'pherr'} = new IO::File;
+		$opt_H-&gt;{'pherr'}-&gt;open(&quot;&amp;lt;$errfile&quot;);
+		my $error = &quot;&quot;;
+		$error .= &quot;[$PROGNAME] Interpretor returned error [$exit]\n&quot; if ($exit);
+		$error .= &quot;[$PROGNAME] **INTERRUPT**\n&quot; if $int;
+		$error .= &quot;[$PROGNAME] (Core dump)\n&quot; if $core;
+		$error .= &quot;$opt_H-&gt;{'start_file'} syntax OK\n&quot; if ($opt_H-&gt;{'syntax_check'} &amp;amp;&amp;amp; !$ret);
+		if ($error &amp;amp;&amp;amp; $opt_H-&gt;{'mode'} eq &quot;f&quot;) {
+			print STDERR $error;
+		} elsif ($error) {
+			redirect_output($opt_H);
+			chomp $error;
+			html_error($opt_H,$error,$opt_H-&gt;{'pherr'}-&gt;getlines);
+			$opt_H-&gt;{'pherr'}-&gt;close;
+		}
+	} else {
+		print $output;
+	}
     clean_tmpfile();
+    exit($exit); # Exit here directly
 
   } elsif ($opt_H-&gt;{'eval'}) {
     # eval method
&lt;/pre&gt;

&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;This new ePerl works (it's already used to generate the pages you see!). I think I've fixed the main issues I had, but I'm not fully satisfied with it. There is too much code that is useless for my needs, and too many dirty hacks. I might come up with an ePerlLite some time in the future, but for now this will do.&lt;/p&gt;
&lt;p&gt;You can &lt;a href=&quot;/files/eperl/eperl.diff&quot;&gt;download the diff&lt;/a&gt; if you're interested. Just patch the original &lt;code class=&quot;file&quot;&gt;eperl.pl&lt;/code&gt; with &lt;code&gt;patch eperl.pl eperl.diff&lt;/code&gt; and that's it, or &lt;a href=&quot;/files/eperl/eperl.pl&quot;&gt;get the script&lt;/a&gt; directly (this one is additionally converted to UTF-8).&lt;/p&gt;</content>
		</entry>

	<entry xml:lang="en" xml:base="https://xavier.robin.name/en/">
		<title type="html">Tags cloud</title>
		
			<category term="my_site" label="My website" scheme="https://xavier.robin.name/en/tag/my_site" />
		
			<category term="programming" label="Programming" scheme="https://xavier.robin.name/en/tag/programming" />
		
		<link href="https://xavier.robin.name/en/blog/2010/06/13/tags-cloud"/>
		<id>tag:xavier.robin.name,2010-06-13:/blog/2010/06/13/tags-cloud</id>
		<published>2010-06-13T19:34:58+02:00</published>
		<updated>2010-06-13T19:34:58+02:00</updated>
		<content type="html">&lt;p&gt;There are several &lt;a href=&quot;http://www.cpan.org/&quot;&gt;&lt;abbr title=&quot;Comprehensive Perl Archive Network&quot;&gt;CPAN&lt;/abbr&gt;&lt;/a&gt; modules to generate tag clouds. To cite only 3 of them:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;http://search.cpan.org/perldoc?HTML::TagCloud&quot;&gt;HTML::TagCloud&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://search.cpan.org/perldoc?HTML::TagClouder&quot;&gt;HTML::TagClouder&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://search.cpan.org/perldoc?Data::CloudWeights&quot;&gt;Data::CloudWeights&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I just added a tag cloud to this site (&lt;a href=&quot;/&quot;&gt;look at the new home page!&lt;/a&gt;) with &lt;code&gt;Data::CloudWeights&lt;/code&gt;. &lt;code&gt;HTML::TagCloud&lt;/code&gt; generates ugly HTML and CSS that cannot be modified. &lt;code&gt;HTML::TagClouder&lt;/code&gt; is marked as &lt;q&gt;*WARNING* Alpha software! I mean it!&lt;/q&gt; Not for me, thanks!&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Data::CloudWeights&lt;/code&gt; is by far the most flexible. It generates an arrayref of hashes from which you can pick size, occurrences, colors and so on. I use the following subroutine to generate the cloud with very simple and standard code:&lt;/p&gt;

&lt;pre&gt;use Data::CloudWeights;
sub tag_cloud {
	my @tags = @_;
	my $cloud = Data::CloudWeights-&gt;new;
	for my $tag (@tags) {
		# The following line is a bit more complicated than shown here
		# $tag is a &lt;a href=&quot;http://search.cpan.org/perldoc?DBIx::Class&quot;&gt;DBIx::Class&lt;/a&gt; entry with more columns than displayed here
		$cloud-&gt;add($tag-&gt;tag, scalar  $tag-&gt;links, &quot;/tag/&quot; . $tag-&gt;tag);
	}
	my $cloud_html = &quot;&quot;;
	foreach my $tag (@{$cloud-&gt;formation}) {
		$cloud_html .= '&amp;lt;a href=&quot;' . $tag-&gt;{'value'} .
			'&quot; style=&quot;font-size: ' . $tag-&gt;{'size'} . 'em&quot;&amp;gt;';
		$cloud_html .= $tag-&gt;{'tag'};
		$cloud_html .= '&amp;lt;/a&amp;gt;';
	}
	return $cloud_html;
}&lt;/pre&gt;</content>
		</entry>

	<entry xml:lang="en" xml:base="https://xavier.robin.name/en/">
		<title type="html">ePerl</title>
		
			<category term="my_site" label="My website" scheme="https://xavier.robin.name/en/tag/my_site" />
		
			<category term="programming" label="Programming" scheme="https://xavier.robin.name/en/tag/programming" />
		
		<link href="https://xavier.robin.name/en/blog/2010/06/13/eperl"/>
		<id>tag:xavier.robin.name,2010-06-13:/blog/2010/06/13/eperl</id>
		<published>2010-06-13T18:32:35+02:00</published>
		<updated>2010-06-13T18:32:35+02:00</updated>
		<content type="html">&lt;p&gt;There are two versions of ePerl:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;&lt;a href=&quot;http://www.ossp.org/pkg/tool/eperl/&quot; hreflang=&quot;en&quot;&gt;The original version of Ralf S. Engelschall&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://marginalhacks.com/Hacks/ePerl/&quot; hreflang=&quot;en&quot;&gt;A forked version by David Ljung Madison&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;The former is a &lt;a href=&quot;http://en.wikipedia.org/wiki/C_%28programming_language%29&quot; title=&quot;C programming language on Wikipedia&quot;&gt;C&lt;/a&gt; wrapper around a perl module. The latter is written entirely in perl.&lt;/p&gt;
&lt;p&gt;I was using Ralf S. Engelschall's version until now, because it is in Debian repositories. However, I wanted to return an &lt;a href=&quot;http://en.wikipedia.org/wiki/List_of_HTTP_status_codes#5xx_Server_Error&quot; title=&quot;List of HTTP status codes on Wikipedia&quot;&gt;HTTP 500 code&lt;/a&gt; upon error, so that Google wouldn't index errors (as it currently does). It is hard-coded in the C code. Rather than recompiling, I tried the newer version from MarginalHacks. The error code is hard-coded as well, but it is in Perl, so easier to rewrite and to link with my standard error code.&lt;/p&gt;

&lt;p&gt;I had to modify my shebangs (&lt;code&gt;#!/usr/bin/eperl.pl --mode=CGI&lt;/code&gt; is fine), and also to correct an error in &lt;code class=&quot;file&quot;&gt;eperl.pl&lt;/code&gt; by adding &lt;/p&gt;
&lt;pre&gt;@files = $ENV{&quot;SCRIPT_FILENAME&quot;};&lt;/pre&gt;
&lt;p&gt;inside the&lt;/p&gt;&lt;pre&gt;if ($opt{'mode'} ne &quot;f&quot;) {&lt;/pre&gt;
&lt;p&gt;block (that is, between lines 220 and 224), because we don't want to use the HTTP path passed as argument in CGI mode, but rather the script file name.&lt;/p&gt;

&lt;p&gt;Unfortunately, I quickly noted that this new ePerl loses &lt;a href=&quot;http://en.wikipedia.org/wiki/Standard_streams#Standard_input_.28stdin.29&quot; title=&quot;Standard streams on Wikipedia&quot;&gt;STDIN&lt;/a&gt; data. That means no &lt;a href=&quot;http://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol#Request_methods&quot; title=&quot;Hypertext Transfer Protocol on Wikipedia&quot;&gt;POST&lt;/a&gt; data is available to the script, and I use that for comments and contact. There is a &lt;code&gt;-t&lt;/code&gt; argument, but it had no effect for me. I'll stick with the old ePerl for the moment and try to make sure my code runs fine, but I'll need to have a closer look into what exactly happens with input sooner or later.&lt;/p&gt;</content>
		</entry>


</feed>