PhantomJS – site scraping and page captures

The official site describes PhantomJS as “a headless WebKit with JavaScript API. It has fast and native support for various web standards: DOM handling, CSS selector, JSON, Canvas, and SVG.”
This can be used, for example, to capture pages that run a lot of JavaScript, then grab the resulting source and parse the data. All of that happens on the command line, with no output to the screen.

Let's set up PhantomJS on a Debian-based system.

1. Compile phantomjs from source.

sudo apt-get install libqt4-dev qt4-qmake
wget http://phantomjs.googlecode.com/files/phantomjs-1.4.1-source.tar.gz
tar zxvf phantomjs-1.4.1-source.tar.gz
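cd phantomjs-1.4.1
# Build step: not in the original listing; assumed to be the usual Qt4 qmake build for the 1.x sources
qmake-qt4 && make
sudo cp bin/phantomjs /usr/bin/phantomjs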

2. Prepare the X virtual framebuffer (Xvfb) for page rendering.
Xvfb is an X11 server that performs all graphical operations in memory and shows no screen output. PhantomJS 1.4 still needs an X server to render pages, and Xvfb provides one on a machine without a display.

sudo apt-get install xvfb xserver-xorg

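3. Create a start script for Xvfb. Save it as /etc/init.d/xvfb and make it executable.

#! /bin/sh

### BEGIN INIT INFO
# Provides: Xvfb
# Required-Start: $local_fs $remote_fs
# Required-Stop:
# X-Start-Before:
# Default-Start: 2 3 4 5
# Default-Stop:
### END INIT INFO

# Script path shown in the usage message (assumed; $N was not defined in the listing)
N=/etc/init.d/xvfb
set -e

case "$1" in
  start)
    Xvfb :0 -screen 0 1024x768x24 &
    ;;
  stop)
    # Assumed implementation: the stop branch is missing from the listing
    pkill -x Xvfb || true
    ;;
  restart|force-reload)
    "$0" stop
    "$0" start
    ;;
  *)
    echo "Usage: $N {start|stop|restart|force-reload}" >&2
    exit 1
    ;;
esac
exit 0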

4. Start xvfb and test phantomjs

/etc/init.d/xvfb start
DISPLAY=:0 phantomjs --version
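
Typing DISPLAY=:0 in front of every call gets repetitive. Below is a minimal sketch of a wrapper, assuming Xvfb listens on :0 as in the start script. The function name with_display and the variable XVFB_DISPLAY are made up for this example and are not part of phantomjs or Xvfb.

```shell
#!/bin/sh
# with_display: hypothetical helper, not part of phantomjs or Xvfb.
# Runs the given command with DISPLAY pointing at the virtual framebuffer.
# XVFB_DISPLAY is an assumed variable name; when unset, the helper falls
# back to :0, the display used by the start script.
with_display() {
    DISPLAY="${XVFB_DISPLAY:-:0}" "$@"
}

# Example call (needs Xvfb running and phantomjs installed):
#   with_display phantomjs --version
```

With this in a shell profile, with_display phantomjs --version should behave like the test command above, and setting XVFB_DISPLAY points it at a different display.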

If phantomjs prints its version number, everything is OK. From here we can do almost anything with web pages; the PhantomJS API Reference and Quick Start pages cover the details.
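
As a first experiment, the sketch below renders a page to a PNG and saves its source after the page's JavaScript has run. It assumes the 1.x-era script API (new WebPage(), phantom.args, page.render, page.content), so check the API Reference for the version you built. The file name capture.js, the helper name capture_page and the output file names are made up for this example.

```shell
#!/bin/sh
# Write a small PhantomJS script. capture.js is a hypothetical file name.
cat > capture.js <<'EOF'
// Usage: phantomjs capture.js URL OUTPUT.png
// Assumes the PhantomJS 1.x script API.
var page = new WebPage();
var url = phantom.args[0];
var out = phantom.args[1];

page.viewportSize = { width: 1024, height: 768 };
page.open(url, function (status) {
    if (status !== 'success') {
        console.log('Failed to load ' + url);
        phantom.exit(1);
    } else {
        page.render(out);          // screenshot of the rendered page
        console.log(page.content); // page source after JavaScript ran
        phantom.exit(0);
    }
});
EOF

# capture_page URL NAME: hypothetical helper that writes NAME.png and NAME.html.
# Assumes Xvfb is running on display :0 and phantomjs is on the PATH.
capture_page() {
    DISPLAY=:0 phantomjs capture.js "$1" "$2.png" > "$2.html"
}

# Example call:
#   capture_page http://example.com/ example
```

The saved HTML file can then be parsed with ordinary command-line tools, which is the scraping workflow described at the top of the post.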
