Sun, 31 Aug 2008

Para_srf: Convert your SoLiD data to srf, fast.


As you know, I released para_srf version 0.2 on GitHub some weeks ago. I have made some changes since then and have made a new version available. It doesn't bring huge changes; mainly I added some integration tests, which have proved very valuable, by the way.

I am going to release another version soon that fixes a small issue Jingwei from ABi found. The current version splits the input data into smaller chunks. By doing that, we may end up losing some pairing information, since not all the reads have a pair. To ensure the srf converter has the information for all the pairs, we have to perform the split based on the panels. For example: split 1 covers panels 1 to 20, split 2 covers panels 21 to 40, and so on.
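To make the panel-based split concrete, here is a minimal Python sketch. It is not the para_srf code: it assumes read IDs of the form panel_x_y_tag (for example 21_7_9_F3), and the names panel_of and split_by_panels are made up for this illustration. Because both reads of a pair share the same panel, grouping by panel range keeps them in the same chunk.

```python
from collections import defaultdict


def panel_of(read_id: str) -> int:
    """Return the panel number of a read.

    Assumption: IDs look like 'panel_x_y_tag', optionally prefixed with '>'.
    """
    return int(read_id.lstrip(">").split("_", 1)[0])


def split_by_panels(read_ids, panels_per_chunk=20):
    """Group reads into chunks covering panels 1-20, 21-40, and so on."""
    chunks = defaultdict(list)
    for read_id in read_ids:
        # Panels are 1-based, so panels 1..20 map to chunk 0, 21..40 to chunk 1.
        chunk_index = (panel_of(read_id) - 1) // panels_per_chunk
        chunks[chunk_index].append(read_id)
    return [chunks[i] for i in sorted(chunks)]


# Hypothetical read IDs; the read 20_5_5_F3 has no mate.
reads = [
    "1_10_20_F3", "1_10_20_R3",
    "20_5_5_F3",
    "21_7_9_F3", "21_7_9_R3",
    "40_1_1_R3",
]
chunks = split_by_panels(reads)
for number, chunk in enumerate(chunks, start=1):
    print(f"split {number}: {chunk}")
```

Splitting on a fixed number of reads instead could cut between the F3 and R3 entries of the same bead, which is the pairing loss described above.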

posted at: 21:17 | path: /programming | permanent link to this entry

Sat, 09 Aug 2008

Para_srf: Convert your SoLiD data to srf, fast.


I have just created a git repo for my para_srf project. This software parallelizes the SRF conversion of SoLiD data.

The amount of sequence we get out of one ABi sequencer is extremely high, and performing the conversion serially can take a long time. This software parallelizes the tasks so the whole process finishes much faster. It currently works only with LSF clusters, but adding other alternatives is very simple.

This is the git url:

 git://github.com/drio/para_srf.git 


posted at: 15:09 | path: /programming | permanent link to this entry

Sun, 24 Jun 2007

Rendering unicode


Some friends of mine were working on a site and were using utf-8 to write their html/js. The main page had a drop-down where you could switch between different languages:

English
Français
Español
Deutsch
日本語
中文
한국어

NOTE: I am assuming that your browser will render these last utf8 characters properly. At the time I was writing this, the http server that was sending this content to your browser was forcing this character set:

drio@simba:~/wwwroot $ curl -I http://blog.is04607.com
HTTP/1.1 302 Found
Date: Sun, 24 Jun 2007 18:45:10 GMT
Server: Apache/1.3.37 (Unix) mod_perl/1.29 PHP/4.3.11 mod_gzip/1.3.26.1a
Location: http://www.is04607.com/blog/blosxom.cgi
Content-Type: text/html; charset=iso-8859-1

I have to shoot an email to the sysadmin where I am hosting this so he can force utf8 on my virtual host.

That was exactly the same problem my friends had. Just by telling apache to use utf8 (or at least not to force iso-8859-1), things get fixed.
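Here is a small Python sketch of what goes wrong when the bytes are utf8 but the server declares iso-8859-1. It only imitates the decoding step a browser performs; the variable names are mine.

```python
text = "日本語"

# The page is stored and sent as utf8 bytes.
sent_bytes = text.encode("utf-8")

# The browser trusts the forced charset and decodes every byte as one
# iso-8859-1 character, producing garbage instead of three characters.
mojibake = sent_bytes.decode("iso-8859-1")
print(len(text), len(sent_bytes), len(mojibake))

# Nothing was lost in transit: reinterpreting the same bytes as utf8
# recovers the original text, which is why fixing the header is enough.
repaired = mojibake.encode("iso-8859-1").decode("utf-8")
print(repaired == text)
```

The bytes on the wire never change; only the label the server attaches to them does.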

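By the way, do you know how many bits utf8 uses to encode the Japanese characters? For 日, 24 bits, 3 bytes:

drio@simba:~/wwwroot $ cat test5.html
----日-----
drio@simba:~/wwwroot $ hexdump test5.html
0000000 2d2d 2d2d 97e6 2da5 2d2d 2d2d 000a
000000d

Yes, 日 is 0xe697a5. hexdump prints 16-bit words in little-endian order here, so the bytes e6 97 show up as 97e6, and a5 is paired with the dash that follows it (2d) as 2da5.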
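The byte count is easy to check without hexdump. This Python sketch encodes 日 directly and then regroups the file contents into 16-bit little-endian words, which is my assumption about how the hexdump output above was produced on that machine.

```python
import struct

# The utf8 encoding of the single character.
encoded = "日".encode("utf-8")
print(encoded.hex(), len(encoded))

# Same contents as test5.html: 4 dashes, the character, 5 dashes, newline.
data = "----日-----\n".encode("utf-8")

# Pad odd-length input with a zero byte, then read 16-bit little-endian
# words, so each pair of bytes is printed in swapped order.
padded = data + b"\x00" * (len(data) % 2)
words = [f"{word:04x}" for (word,) in struct.iter_unpack("<H", padded)]
print(" ".join(words))
```

The 2d inside the word 2da5 is the dash that follows the character in the file, not a fourth byte of 日.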

I found this document highly useful for understanding what unicode is. I think it has already become a classic.

posted at: 13:11 | path: /programming | permanent link to this entry