Friday, February 08, 2013

Display web page data in Conky

A friend, who writes a small blog, asked me for some help taking the page view data from StatCounter for his blog and displaying it in Conky.

I haven't seen much on this topic other than this CrunchBang post.

Anyway, I decided to try my own approach and this is what's described here.
It's fairly tricky and involved so pay attention, please.

1. Get the data into a Conky-readable format (basically a text-file).

A LibreOffice or OpenOffice spreadsheet has a Link to External Data command which can be used for this purpose.
Open a new spreadsheet file, place cursor on cell A1 (of course, it can be anywhere, but let's assume you want it to look tidy) and open Insert > Link to External Data.

Now, in the External Data dialog that opens, place the url of your webpage in the url box (well, where else would you put it?)
In the same dialog, check the 'Update every' box in the bottom right corner and select how often you want to update it (I chose the default 60 seconds).
I found this particular dialog a little temperamental: you have to place the cursor at the end of the url in the url box and hit Return.
Then wait for the Import Options dialog to open. I selected the default Automatic option and hit Return again.
Now, the External Data dialog should reappear with entries in the 'Available tables/ranges' box.
In my case, as I want the information displayed in the tables, I selected 'HTML_tables'.
Hit OK and the information you want should show up in the spreadsheet.

Save this file with an appropriate name in the .ods format (mine was called stat.ods and I saved it to my ~ directory).

Now, Conky cannot read the .ods file.
So, we save the same file as a .csv (comma-separated values) file:
File > Save As > File Type > .csv

2. Create script to generate .csv file at selected frequencies to display current blog hit count

For this script, I made extensive use of xdotool, a command-line tool which is very powerful with an enormous 'vocabulary'.
Essentially, it allows a script to mimic a series of keystrokes (and mouse movements too but I haven't used these).

Here's the bash script I made to update the .csv file I needed every two minutes (you can choose whatever you want):

#!/bin/bash 
## Open the file stat.ods in LibreOffice
localc --minimized /home/paul/stat.ods "$@" &
## The 'sleep' commands are to ensure the keystrokes have enough time to  
## activate before the next command is called 
sleep 3
## When the file opens, it asks if you want to update the live links
## (links to webpage). You need to hit Return three times for this
xdotool key Return Return Return 
sleep 3
## The following series of keystrokes save the .ods file as a .csv file of the same name 
## and in the same directory 
xdotool key alt+f 
sleep 2 
xdotool key alt+a 
sleep 2 
xdotool key Right 
sleep 2 
xdotool type ".csv" 
sleep 2 
xdotool key Return 
sleep 5 
xdotool key alt+s alt+y 
sleep 2 
xdotool key Return 
sleep 2
## Start an endless loop to periodically save the stat.ods file as stat.csv 
## Remember that stat.ods updates from the webpage because of the External Data Link 
while true; do
## For the scripted keystrokes to enter ONLY the stat.csv file, you need to 'steal' focus 
## from whatever other window you might have open at the time. 
## First, we need to identify what window is currently focussed so we can refocus it 
## when the updated stat.csv has been saved 
window_id=$(xdotool getwindowfocus)
## Now focus the stat.csv window to receive the keystrokes 
xdotool search --name stat.csv windowactivate 

## Open the File menu and save the file
xdotool key alt+f ctrl+s 
xdotool key Escape 
sleep 0.25
## Return focus to the window from which it was stolen at the start of this loop 
xdotool windowactivate "$window_id"
## Shade the stat.csv spreadsheet to 'get it out of the way' until needed again
wmctrl -r "stat.csv" -b toggle,shaded
## Repeat this loop every 120 seconds 
sleep 120
## go back to the start of the 'while' loop
done

Save the script (I called mine losave, an abbreviation for LibreOffice Save) and make it executable. Although not strictly necessary, I also copied my losave file to /usr/bin/ so it can be run from anywhere.
Now, assuming you have LibreOffice Calc installed, launching losave (either from a terminal or by typing Alt-F2 and entering losave in the box) should open stat.ods in LibreOffice Calc, activate the link to your selected webpage and save the file as stat.csv at whatever frequency you have selected.
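The fixed sleep delays in the script are guesses at how long LibreOffice takes to respond. One possible refinement is to poll until a window actually exists. The sketch below is only an illustration: wait_for is a made-up helper name, and the commented usage assumes that xdotool search exits with a non-zero status when no window matches.

```shell
#!/bin/bash
# wait_for TIMEOUT CMD [ARGS...] (hypothetical helper): run CMD once a second
# until it succeeds or TIMEOUT seconds have passed.
# Returns 0 on success, 1 on timeout.
wait_for() {
    local timeout=$1
    shift
    local waited=0
    until "$@" >/dev/null 2>&1; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Possible use in place of 'sleep 3' after launching localc (assumption,
# not part of the original script):
#   wait_for 15 xdotool search --name "stat.ods" || exit 1
```

The keystroke delays between menu actions would still need their own sleep commands, as there is no window change to wait for there.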


3. Create similar script to display location of last hit

Not surprisingly, what's needed here is almost exactly the same as in the previous script.
The major difference is that a different .ods file is required.
StatCounter displays hit location in the Recent Pageload Activity page available for each blog in your account.
I called this spreadsheet loc.ods (loc=Location).
The script is shown below without any explanatory notes as there is no significant difference from the earlier script.


#!/bin/bash
localc --minimized /home/paul/loc.ods "$@" &
sleep 2 
xdotool key Return Return Return 
sleep 2 
xdotool key alt+f 
sleep 2 
xdotool key alt+a 
sleep 2 
xdotool key Right 
sleep 2 
xdotool type ".csv" 
sleep 5 
xdotool key Return 
sleep 5 
xdotool key alt+s alt+y 
sleep 2 
xdotool key Return 
sleep 5 
while true; do 
window_id=$(xdotool getwindowfocus) 
xdotool search --name loc.csv windowactivate
xdotool key alt+f ctrl+s 
xdotool key Escape 
sleep 0.25 
xdotool windowactivate "$window_id"
wmctrl -r "loc.csv" -b toggle,shaded 
sleep 115 
done

4. Now, we should have two periodically updated .csv files in our home directory (or wherever you placed them).

Here are some examples of what they look like:

stat.csv

,Today?,Yesterday?,This Month?,Total?,Settings, 
MyBlog1,0,0,0,0,Config,                       
MyBlog2,68,106,652,85371,Config,                       
MyBlog3,0,3,5,1689,Config,                       
MyBlog4,0,0,20,15849,Config,                       
MyBlog5,0,0,0,282,Config,                       
5 projects,68,109,677,103191,,,,,,,,,Real-Time Visitor Stats,,Project Settings,,, 
,User & Public Access,,Email Reports,,,

Here, information for a number of blogs is included, most of which are not active.
The only one of interest is MyBlog2.

loc.csv looks like this in part (I've taken out details of what pages were viewed).


,,,WinXP,,"Arizona,",,,,1920x1080,,United States,  
,,,Win7,,"Washington,",,,,1600x1200,,United States,  
,,,MacOSX,,"Para,",,,,1440x900,,Brazil,  
,,,WinXP,,"Andhra Pradesh,",,,,1024x768,,India,  
,,,Linux,,"Pais Vasco,",,,,1440x900,,Spain,

OK, so everything we need is available in these two files but it's mixed up with an awful lot of other stuff.
So, the challenge facing Conky is to pare out what we don't need and print what we do.


5. Sifting the useful information from the .csv files for display in Conky

We're now trying to convert the jumble in the above .csv files into what appears in this partial screenshot of Conky (running in Arch Linux).

[Screenshot: partial Conky display]

The first and second lines above are simply text inputs.
The third line is of primary interest and displays hits so far today, the total for yesterday, the total for the month so far and, finally, the total for all time.

The algorithm used in Conky is this:

$alignr ${exec cat /home/paul/stat.csv | grep MyBlog2 | awk -F\, '{print $2}'}       ${exec cat /home/paul/stat.csv | grep MyBlog2 | awk -F\, '{print $3}'}      ${exec cat /home/paul/stat.csv | grep MyBlog2 | awk -F\, '{print $4}'}     ${exec cat /home/paul/stat.csv | grep MyBlog2 | awk -F\, '{print $5}'}
This is fairly straightforward (at least, in comparison with the next one (:-).
Basically, each section of the algorithm reads the stat.csv file and selects only the line containing the blog of interest (MyBlog2).
The 'awk' command breaks up the line into fields while designating the ',' as the field delimiter.
Subsequently, it prints out the second, third, fourth and fifth fields as the hits appropriate to the various categories.
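The four near-identical exec calls could also be collapsed into one small helper that reads the file once. This is a minimal sketch, not the method used above: blog_hits is a made-up name, and the demo rows are copied from the stat.csv sample shown earlier.

```shell
#!/bin/bash
# blog_hits FILE NAME (hypothetical helper): print fields 2-5 (Today,
# Yesterday, This Month, Total) of the row whose first field is exactly NAME.
blog_hits() {
    awk -F, -v name="$2" '$1 == name { print $2, $3, $4, $5; exit }' "$1"
}

# Demo on two rows copied from the stat.csv sample.
demo=$(mktemp)
printf '%s\n' 'MyBlog1,0,0,0,0,Config,' 'MyBlog2,68,106,652,85371,Config,' > "$demo"
blog_hits "$demo" MyBlog2    # prints: 68 106 652 85371
rm -f "$demo"
```

Comparing the first field exactly avoids one pitfall of grep MyBlog2, which would also match a blog called, say, MyBlog22. If the helper were saved as a script, Conky could call it through exec, or through execi with an interval so that it only runs as often as the .csv file actually changes.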

Great, but how do we extract from loc.csv the location of the last hit?

Here's the algorithm:

Last Hit From: $alignr ${exec cat /home/paul/loc.csv | cut -d "," -f 6,7 | sed 's/"Korea, Republic of"/Korea, Republic of/' | awk '!/"/' |sed 's/\,//g' | sed -n 2p}
Possibly the best way to explain what each section of the algorithm does is to show how the original loc.csv file changes as it goes through each stage.
First, I'll point out that the country names for each location were almost invariably without quotes, while all other location data (city, state or whatever) were always within quotes.
The one exception to this rule that I've found so far is South Korea (or Republic of Korea). For some strange reason, StatCounter displays 'Korea, Republic of' within quotes.
This explains the rather strange third section of the algorithm, which does no more than replace the quoted name 'Korea, Republic of' with the same name without quotes.
Without this, all hits from South Korea would have shown up as blank spaces.
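To see the effect, here is a small sketch that runs the later stages of the algorithm with and without the substitution. The three-line input is made up; it only assumes the shape that the cut stage emits (a heading, then quoted place names, then the country).

```shell
#!/bin/bash
# Made-up sample of what 'cut -d "," -f 6,7' might emit for a hit from Korea.
sample='Location,Host Name/Web Page/Referring Link
"Seoul,"
"Korea, Republic of"'

# Without the substitution, awk discards the quoted country as well,
# so there is no second line and nothing is printed.
without_fix=$(printf '%s\n' "$sample" | awk '!/"/' | sed 's/,//g' | sed -n 2p)

# With the substitution, the quotes are removed first and the line survives.
with_fix=$(printf '%s\n' "$sample" |
    sed 's/"Korea, Republic of"/Korea, Republic of/' |
    awk '!/"/' | sed 's/,//g' | sed -n 2p)

printf 'without: [%s]\nwith:    [%s]\n' "$without_fix" "$with_fix"
```

Note that the comma-stripping stage also removes the comma inside the country name, so Conky ends up showing 'Korea Republic of'.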

OK, so here's what we start with:


,,,Win8,,"Champagne-Ardenne,",,,,1600x900,,France,
,,,WinVista,,United Kingdom,,,,1280x1024,,,
,,,Linux,,United Kingdom,,,,1280x800,,,
,,,Linux,,"Texas,",,,,1024x600,,United States,
,,,Win8,,"Mazowieckie,",,,,1920x1080,,Poland,
,,,MacOSX,,"Stockholms Lan,",,,,1440x900,,Sweden,
Once again, I've taken out anything that might be construed as private.

Now, after application of the algorithm section 'cut -d "," -f 6,7', we get this:
Location,Host Name/Web Page/Referring Link
"Vendeuvre-sur-barse,"
"Champagne-Ardenne,"
France,
"Luton,"
United Kingdom,
,
"Wolverhampton,"
United Kingdom,
,
"Dallas,"
"Texas,"
United States,
"Warsaw,"
"Mazowieckie,"
Poland,
"Upplands-väsby,"
"Stockholms Lan,"
Sweden,
Now, it's starting to look more manageable.
Let's see what the next algorithm section does (leaving out the Korea one as this only applies in rare circumstances).
So after cut -d "," -f 6,7 | sed 's/"Korea, Republic of"/Korea, Republic of/' | awk '!/"/' 
we get:

Location,Host Name/Web Page/Referring Link
France,
United Kingdom,
,
United Kingdom,
,
United States,
Poland,
Sweden,
So, now we have, other than the heading, just countries and commas. Getting closer.
Let's see what's next.
cat /home/paul/loc.csv | cut -d "," -f 6,7 | sed 's/"Korea, Republic of"/Korea, Republic of/' | awk '!/"/' |sed 's/\,//g' 

gives us

LocationHost Name/Web Page/Referring Link
France
United Kingdom

United Kingdom

United States
Poland
Sweden
Looking better as all of those commas have gone.
Now all we need to do is to pick out the last hit location which is in the second line.

Here's what the whole algorithm (cat /home/paul/loc.csv | cut -d "," -f 6,7 | sed 's/"Korea, Republic of"/Korea, Republic of/' | awk '!/"/' |sed 's/\,//g' | sed -n 2p)  gives us:

France
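For reuse outside the Conky configuration, the same stages can be wrapped in a function. This is only a sketch: last_hit is a made-up name, and the demo file is invented so that fields 6 and 7 carry the values seen in the walkthrough (the real loc.csv has more columns).

```shell
#!/bin/bash
# last_hit FILE (hypothetical helper): same stages as the Conky one-liner,
# with cut reading FILE directly instead of going through cat.
last_hit() {
    cut -d, -f6,7 "$1" |
        sed 's/"Korea, Republic of"/Korea, Republic of/' |
        awk '!/"/' |      # drop every line that still contains a quote
        sed 's/,//g' |    # strip the remaining commas
        sed -n 2p         # line 1 is the heading, line 2 the newest hit
}

# Demo on an invented file laid out so that fields 6-7 hold the location data.
demo=$(mktemp)
cat > "$demo" <<'EOF'
a,b,c,d,e,Location,Host Name/Web Page/Referring Link
,,,,,"Vendeuvre-sur-barse,"
,,,,,"Champagne-Ardenne,"
,,,,,France,
EOF
last_hit "$demo"    # prints: France
rm -f "$demo"
```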
And that's it.
Any questions?










