Look at your html page as xml data for the sake of SEO

I’ve been working on a script that goes to a URL and scrapes parts of the page's data, which makes it pretty much a crawler or spider.

If every page the crawler landed on were valid, my job would be easy. In reality, however, many pages are not valid, and the script has to fall back on regular expressions.

For site owners this cuts both ways: invalid markup makes a page harder to scrape, but it also makes the page harder to index.

Exposure matters for marketing a site, and a valid HTML page has a greater chance of being picked up by search engines such as google.com, because it gives the search engine's crawler what it wants more efficiently.

I believe the engineers who work on those crawlers have already overcome many difficulties caused by invalid markup. Still, whenever the HTML in a page is not valid (when treated as XML), those smart engineers have to come up with logic to work around it, perhaps with regular expressions. That logic is prone to mistakes, so the crawler may end up scraping only a little from a page with invalid HTML. After all, engineers are human, and humans make mistakes.

For the same reason, a page built with well-formed, semantic HTML has a higher chance of showing up for the keywords users type.

That’s just my idea of how an HTML page should be constructed with SEO and future use in mind.

So my recommendation is this:

1. Treat the markup in a page as data. Forget about presentation and such. Just make sure the data is valid.
2. Use CSS to visualize the data (= HTML markup) so that it appeals to users.

It’s quite simple after all.
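
To make "valid" a little more concrete, here is a rough sketch of checking whether the tags in a page are balanced. The function name findUnbalancedTag and the short list of void elements are my own assumptions for illustration. It is only a quick sanity check; a real crawler would hand the page to a proper XML or HTML parser.

```javascript
// Sketch only: findUnbalancedTag is a hypothetical helper, not a real parser.
// Elements that never take a closing tag (a deliberately short, assumed list).
const VOID_ELEMENTS = new Set(['br', 'hr', 'img', 'input', 'link', 'meta']);

function findUnbalancedTag(markup) {
  const openTags = [];
  // Matches <name ...>, </name> and <name ... />. Comments and the doctype
  // are skipped because they do not start with a letter after "<".
  const tagPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*?(\/?)>/g;
  let match;
  while ((match = tagPattern.exec(markup)) !== null) {
    const isClosing = match[1] === '/';
    const name = match[2].toLowerCase();
    const isSelfClosing = match[3] === '/';
    if (isSelfClosing || VOID_ELEMENTS.has(name)) {
      continue;
    }
    if (!isClosing) {
      openTags.push(name);
    } else if (openTags.pop() !== name) {
      return name; // closing tag does not match the innermost open tag
    }
  }
  // Anything still open was never closed; null means the tags balance.
  return openTags.length > 0 ? openTags[openTags.length - 1] : null;
}

console.log(findUnbalancedTag('<ul><li>one</li><li>two</li></ul>')); // null
console.log(findUnbalancedTag('<ul><li>one<li>two</ul>'));           // ul
```

The second call reports "ul" because the closing tag arrives while an "li" is still open, which is exactly the kind of markup that forces a crawler to guess.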

PHP PSR-[0-3]

At my workplace, we use PHP and CodeSniffer hooked up in Jenkins.
I liked the fact that PSR-1 and PSR-2 are approved by the committee.
The PSR-0 standard is found at:

The PSR-1 standard is found at:

The PSR-2 standard is found at:

The PSR-3 standard is found at:

p.s. Kohana 3.3 adopts PSR-0, and the migration from 3.2 to 3.3 is quite tough for me. It requires a lot of refactoring on my side…

phantomjs on centos 6.2

Right now I have Jenkins doing automated builds and static code analysis for PHP projects. However, I need to look into automated unit test suites for client-side JavaScript and a Selenium server for client-side functional tests.

This post will be a long history of what I am going through (it will include the steps that failed…)

1. I downloaded the phantomjs-1.7.0-linux-x86_64.tar.bz2 binary from this page.

2. When I executed it, I got this error:
phantomjs: error while loading shared libraries: libfreetype.so.6: cannot open shared object file: No such file or directory

3. So I installed libfreetype.so.6:
sudo yum install libfreetype.so.6

4. I still got the same phantomjs error, which says libfreetype.so.6 is not found.

5. It looks like libfreetype.so.6 was installed in /usr/lib, which is for 32-bit software, and phantomjs needs the 64-bit one.

6. So I created symlinks to it in /usr/lib64/

7. I got a different error message (progress made!):
phantomjs: error while loading shared libraries: libfreetype.so.6: wrong ELF class: ELFCLASS32

8. It turned out that I just needed to install freetype:
yum install freetype

9. Now another dependency issue:
libfontconfig.so.1: cannot open shared object file: No such file or directory

10. Ran “yum install fontconfig”, which installed the correct lib files.

11. Got it installed. 🙂

[xxxxx@localhost ~]$ phantomjs --version
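
Looking back, the missing libraries could have been listed in one go instead of one error at a time. Running ldd against the phantomjs binary prints every shared library it needs, and an unresolved one is shown as "=> not found". Below is a small sketch that pulls those names out of the ldd text. The helper name findMissingLibraries and the sample text are made up for illustration and were not captured from my machine.

```javascript
// Sketch only: findMissingLibraries is a hypothetical helper.
// It scans text in the format printed by `ldd <binary>`, where an
// unresolved dependency appears as "libname.so.N => not found".
function findMissingLibraries(lddOutput) {
  return lddOutput
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.endsWith('=> not found'))
    .map((line) => line.split(' => ')[0]);
}

// Sample text is invented for illustration, not real output.
const sampleLddOutput = [
  '\tlibfreetype.so.6 => not found',
  '\tlibfontconfig.so.1 => not found',
  '\tlibc.so.6 => /lib64/libc.so.6 (0x0000003a1b200000)',
].join('\n');

console.log(findMissingLibraries(sampleLddOutput));
// [ 'libfreetype.so.6', 'libfontconfig.so.1' ]
```

Each name in the result can then be matched to the package that ships it (freetype and fontconfig in my case) and installed with yum.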

UPDATE Dec 6, 2012
Found this blog post for Phantomjs and qunit.

Another very useful set of answers on Stack Overflow regarding phantomjs, ant, and jenkins.


Jenkins for php project on centos 6.2

After setting up Jenkins for use at my workplace, I spent the past couple of days documenting the setup. I thought I would share it here.

OS: CentOS release 6.2 (Final)
[—— ~]$ cat /proc/version
Linux version 2.6.32-220.23.1.el6.x86_64 (mockbuild@c6b5.bsys.dev.centos.org) (gcc version 4.4.6 20110731 (Red Hat 4.4.6-3) (GCC) ) #1 SMP Mon Jun 18 18:58:52 BST 2012
[—— ~]$ uname -a
Linux jenkins.sometrics 2.6.32-220.23.1.el6.x86_64 #1 SMP Mon Jun 18 18:58:52 BST 2012 x86_64 x86_64 x86_64 GNU/Linux

Prerequisites prior to Jenkins

java-1.6.0-openjdk.x86_64 (64bits) - as of Jun 27, 2012
1. sudo yum install java-1.6.0-openjdk (this will pull these dependencies)
- giflib
- jline
- jpackage-utils
- rhino
- tzdata-java
2. actual version is 1:
Apache Ant 1.7.1 - as of Jun 27, 2012
1. sudo yum install ant (this will install these dependencies)
- java-1.5.0-gcj
- java-1.6.0-openjdk-devel
- java_cup
- libgcj
- sinjdoc
- xerces-j2
- xml-commons-api
- xml-commons-resolver
2. actual version is 1.7.1-13.el6

1. sudo rpm -Uvh http://repo.webtatic.com/yum/el6/latest.rpm
2. sudo yum install php54w php54w-cli php54w-common php54w-dba php54w-devel php54w-gd php54w-intl php54w-mbstring php54w-mysql php54w-odbc php54w-pdo php54w-pear php54w-process php54w-xml (this will install these dependencies)
- autoconf
- automake
- libXpm
- libicu
- unixODBC

1. sudo pear upgrade PEAR
2. sudo pear channel-discover pear.phpunit.de
3. sudo pear channel-discover pear.symfony-project.com
4. sudo pear channel-discover components.ez.no
5. sudo pear channel-discover pear.netpirates.net
6. sudo pear channel-discover pear.pdepend.org
7. sudo pear install --alldeps phpunit/PHPUnit
8. sudo pear install phpunit/DbUnit
9. sudo pear install phpunit/phpcpd
10. sudo yum install ImageMagick-devel (This will install these dependencies)
- bzip2-devel
- ghostscript-devel
- jasper-devel
- lcms-devel
- libICE-devel
- libSM-devel
- libX11-devel
- libXau-devel
- libXdmcp-devel
- libXext-devel
- libXt-devel
- libgomp
- libjpeg-devel
- libtiff-devel
- libwmf-lite
- libxcb-devel
- xorg-x11-proto-devel
- zlib-devel
11. sudo pear install --alldeps pdepend/PHP_Depend
12. sudo pear install theseer/phpDox-0.4.0 (note: this can take a while. Be patient)
13. sudo pear install PHP_CodeSniffer
14. sudo pear install --alldeps phpunit/PHP_CodeBrowser
15. sudo pear channel-discover pear.phpmd.org (this is not working currently...)
16. sudo pear install phpmd/PHP_PMD (this will fail while the channel is unreachable. Maybe the server is down for now and will be back up tomorrow? Skip it for now.)

To install Jenkins through yum
1. sudo wget -O /etc/yum.repos.d/jenkins.repo http://pkg.jenkins-ci.org/redhat/jenkins.repo
2. sudo rpm --import http://pkg.jenkins-ci.org/redhat/jenkins-ci.org.key
3. sudo yum install jenkins
4. sudo service jenkins start/stop/restart

Jenkins should now be accessible on port 8080.
Make sure iptables allows inbound connections to port 8080 from outside.

Install Jenkins Plugins
1. go to http://{jenkins.host}:8080
2. Click on “Manage Jenkins” link from side menu
3. Click on “Manage Plugins” link from the page
4. Click on “Available” Tab
5. Check these plugins (Some might be missing because they could be already installed)
- External Monitor Job Type Plugin
- ant
- Static Code Analysis Plug-ins
- Subversion Plugin
- HTML Publisher plugin
- Jenkins Translation Assistance plugin
- Jenkins SSH Slaves plugin
- PMD Plugin
- Jenkins Clover PHP plugin
- Publish Over SSH
- Green Balls
- Jenkins Violations plugin
- Plot plugin
- Measurement Plots
- xUnit plugin
- Hudson SCP publisher plugin
- Checkstyle Plugin
- DRY Plugin
- Jenkins JDepend Plugin
6. Click on “Download now and install after restart”

There must be something I missed, but these instructions should be a good start for setting up Jenkins on CentOS 6.2 for any PHP development work.

Happy CI'ing! :)

mysqldump tip

mysqldump is a command-line tool for exporting MySQL data.
It can also be used to generate seed XML files for PHP DbUnit.
This is how you can do that:

mysqldump -u {user} -t --xml --databases {dbname} --tables {tablename} > filename.xml

The above command will generate an XML file that PHP DbUnit understands.

Then, inside the PHPUnit test class, you would use it like this:

protected function getDataSet()
{
    // Load the seed data generated by mysqldump --xml.
    $seed = $this->createMySQLXMLDataSet(dirname(__FILE__).'/filename.xml');
    $datasets = array($seed);

    // The composite data set takes an array of data sets.
    $compositeDs = new PHPUnit_Extensions_Database_DataSet_CompositeDataSet($datasets);

    return $compositeDs;
}


note to myself on mongoDB procedure

For PHP MongoDB Driver
sudo apt-get update
sudo apt-get install php5-dev php5-cli php-pear
sudo pecl install mongo
(if using apache2)
cd /etc/php5/apache2
sudo vim php.ini
add "extension=mongo.so" without double quotes
:wq! and enter
sudo /etc/init.d/apache2 restart

For MongoDB Server
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv 7F0CEB10
sudo vim /etc/apt/sources.list
add "deb http://downloads-distro.mongodb.org/repo/debian-sysvinit dist 10gen" without double quotes
:wq! and enter
sudo apt-get update
sudo apt-get install mongodb-10gen

To launch mongodb commandline

pulling query string in node.js

In web development, pulling data from the query string is a basic and fundamental task. In node.js, the syntax is like this:

You can actually test my test app here:
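
A minimal sketch of one way to do it, assuming Node's built-in URL class (available as a global in current Node versions). The helper name getQueryParams, the placeholder host, and the sample path are mine, for illustration only.

```javascript
// Sketch only: getQueryParams is a hypothetical helper, not part of Node.
// It relies on the WHATWG URL class that Node exposes as a global.
function getQueryParams(requestUrl) {
  // In an http handler, req.url holds only the path and query string, so a
  // base is needed to build an absolute URL; this host is a placeholder.
  const parsed = new URL(requestUrl, 'http://localhost');
  const params = {};
  for (const [key, value] of parsed.searchParams) {
    params[key] = value; // a repeated key keeps its last value
  }
  return params;
}

const params = getQueryParams('/search?q=node.js&page=2');
console.log(params.q);    // node.js
console.log(params.page); // 2 (a string, not a number)
```

Inside an http server callback, the same helper would be called with req.url. Values always come back as strings, so numbers need converting before use.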