
Sun, 24 Jun 2007

Blosxom and utf8


This is related to my previous post. I sent this to support@ today:

From: David 
Subject: my virtual host (is04607.com) is forcing charset=iso-8859-1                                         
To: support@

My virtual host (is04607.com) is configured in such a way that sends this                                    
header to the browsers:                                                                                      
                                                                                                             
Content-Type: text/html; charset=iso-8859-1                                                                  
                                                                                                             
By doing that, browsers cannot render utf8 characters.                                                       
                                                                                                             
Could you please fix it?                                                                                     
                                                                                                             
I just need apache not to force the character set, or if it does, to use                                     
utf8.             

Support replied that the other virtual hosts were not having the same problem (BTW, these guys rock. Thanks Matt!). They even pointed me to a utf8 blosxom plugin. Matt's reply is what pointed me in the right direction:

From: Matt
Subject: Re: charset=iso-8859-1 .. found it!!                                                                
To: David 

It's the perl module your blog uses :)
                                                                                                             
Reading the source code:                                                                                     
$ more /usr/local/lib/perl5/5.6.1/CGI.pm                                                                     
                                                                                                             
The B<-charset> parameter can be used to control the character set                                           
sent to the browser.  If not provided, defaults to ISO-8859-1.  As a                                         
side effect, this sets the charset() method as well.                                                         
                                                                                                             
I did find a plugin for blosxom that forces utf8                                                             
http://www.vrtprj.com/misc/output_utf8                                                                       
                                                                                                             
I dunno if that helps.

And finally my answer with the solution:

From drio  Sun Jun 24 19:48:13 2007                                                                          
Subject: Re: charset=iso-8859-1 .. found it!!                                                                
To: Matt
                                                                                                             
I read about that plugin you sent me. I tried it out but it failed because
it required the Encode.pm module.
                                                                                                             
I read the code of the plugin and I think that plugin was not going to fix                                   
the problem. My output, my blog entries, they use utf8 characters already.                                   
That plugin basically encodes your blog entries to utf8.
                                                                                                             
I read the blosxom code and I found this line:                                                               
                                                                                                             
$header = {-type=>$content_type};                                                                            
                                                                                                             
That was the one in charge of setting the character set in the http headers.                                 
I changed it to:                                                                                             
                                                                                                             
$header = {-charset => 'UTF-8'};                                                                             
                                                                                                             
and ...... success. My browser now renders the utf8 characters properly.                                     
                                                                                                             
Thanks for your help!                                                                                        
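
One caveat with the change above: replacing the whole hash drops the -type entry, so CGI.pm falls back to its default content type (text/html) for every flavour. A minimal sketch of a variant that keeps the content type and only adds the charset could look like this. It is not the actual blosxom code: $content_type here is a hypothetical stand-in for the value blosxom takes from the flavour, and it assumes the CGI module is installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CGI qw(header);

# Hypothetical stand-in for blosxom's $content_type (set per flavour).
my $content_type = 'text/html';

# Keep the flavour's content type and set the charset explicitly,
# instead of relying on CGI.pm's ISO-8859-1 default.
my $header = { -type => $content_type, -charset => 'UTF-8' };

# Build the HTTP header from the named parameters and print it.
my $out = header(%$header);
print $out;
```

Running it from the command line should print a Content-Type header that carries charset=UTF-8 rather than iso-8859-1.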


posted at: 18:48 | path: /blosxom | permanent link to this entry