Version: 1.1.2




RSS-TAB Version 1.1.2
DOCUMENTATION

© Copyright 2004 António Vasconcelos


  1. About RSS-TAB
  2. Changelog
  3. Command line options
  4. Configuring the rss-tab program
  5. Adding feeds to rss-tab.cfg
  6. Calling rss-tab from cron
  7. Installing the Perl Modules and Libraries
  8. Configuring the PHP Feed Configurator
  9. Speeding up rss-tab: how to use a local copy of the RSS 0.91 DTD file
  10. Using rss-tab on a web page



ABOUT RSS-TAB

 I wrote rss-tab not because I wanted to do something new, but because I couldn't find a simple RSS client script to add some feeds to my home page. It may be just another JARC or JARS (Just Another RSS Client/Script), but it was fun to write, and I'm so happy with it that I decided to go for full documentation, web page and all... just to see if I could do it.

 RSS-TAB is a simple script; most of the code (12 Kbytes in the current version) is dedicated to handling the command line options. It's simple to use, too: just 3 lines to configure in the program and one config file with a list of feeds.

 The funny thing is that I was able to write it and still know very little about XML. I had to learn a bit about XSLT in order to write the stylesheet, and I went through some pains to find out how to stop libxml2 from fetching a file from my.netscape.com every time I parsed an RSS 0.91 file, but, all in all, I didn't learn a lot about XML. This was only possible because XML::RSS::Tools, the Perl module that RSS-TAB uses, is so high-level that it hides almost all the XML complexity from the user.



CHANGELOG

 version 1.1.2|vasco(at)all-2-it.com|Mon May 24 10:26:04 WEST 2004

 version 1.1.1|vasco(at)all-2-it.com|Fri May 14 12:03:31 CDT 2004

 version 1.1|vasco(at)all-2-it.com|Sat May 8 02:41:41 CDT 2004

 version 1.0|vasco(at)all-2-it.com|Wed May 5 13:40:21 CDT 2004


CLI OPTIONS

COMMANDS
--feed[=]FEED_INDEX | FEED_NAME | all

Returns (to stdout) the feed identified by FEED_INDEX or FEED_NAME. If the keyword "all" is used, ALL the feeds identified in the config file are returned. This option tries to use the cached rss files, but if a file does not exist, it is retrieved from the net just as if --refresh had been used.

--refresh[=]FEED_INDEX | FEED_NAME | all

Fetches a new copy of the feed identified by FEED_INDEX or FEED_NAME from the net, updates the rss file in the cache, and then parses and returns the feed to stdout. If the keyword "all" is used, all the feeds in the config file are processed. If a cache file does not exist, it is created, owned by the user running rss-tab. This can cause problems if you run it as root with an umask that denies read/write access to other users, including the user the web server runs as (nobody?). It may be a good idea to run chown nobody:nobody /my/cache/dir/* after using it from a shell running as root.

--refreshonly[=]FEED_INDEX | FEED_NAME | all

It works just like --refresh, but it doesn't produce any output. This is the option to use when refreshing the cache files from cron. The same caveats about nonexistent files apply; personally, when I use a cron job to refresh the files, I run it as user nobody (the same user my web server runs as).
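The cache behaviour of the three commands can be summarised in a short sketch. It is written in Python purely as an illustration and is not taken from rss-tab.pl: the function name process_feed, the cache_file parameter and the fetch() callback (standing in for the HTTP download) are all hypothetical. The real program also parses the feed and applies the XSL stylesheet before printing, which the sketch leaves out.

```python
import os
from typing import Callable, Optional


def process_feed(mode: str, cache_file: str,
                 fetch: Callable[[], bytes]) -> Optional[bytes]:
    """Hypothetical model of --feed / --refresh / --refreshonly.

    mode is "feed", "refresh" or "refreshonly"; fetch() stands in for the
    download of the feed. Returns the bytes that would be printed, or None
    when nothing is printed. (XSL processing is deliberately omitted.)
    """
    if mode not in ("feed", "refresh", "refreshonly"):
        raise ValueError(f"unknown mode: {mode}")

    if mode == "feed" and os.path.exists(cache_file):
        # --feed: use the cached copy when one exists
        with open(cache_file, "rb") as f:
            return f.read()

    # --refresh, --refreshonly, or --feed with no cached file:
    # download a fresh copy and (re)write the cache file
    data = fetch()
    with open(cache_file, "wb") as f:
        f.write(data)

    # --refreshonly only updates the cache and prints nothing
    return None if mode == "refreshonly" else data
```

The first --feed call on an empty cache behaves like --refresh; later --feed calls never touch the net until a --refresh or --refreshonly (for example from cron) replaces the file.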

--list

This command doesn't return any feed; it parses the config file and returns a list with FEED_INDEX, FEED_NAME and FEED_URI (the address where the feed is available).

OPTIONS
--order[=]inc | dec

This controls the sort order used when processing the keyword "all" in --feed, --refresh or --refreshonly, and in the --list command. With --order=inc the feeds are sorted by FEED_INDEX from lowest to highest; with --order=dec, from highest to lowest.
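A minimal sketch of that ordering rule, assuming each feed is held as a (FEED_INDEX, FEED_NAME, FEED_URI) tuple; the tuple layout and the name order_feeds are hypothetical and not rss-tab's internals.

```python
from typing import List, Tuple

# Hypothetical representation: (FEED_INDEX, FEED_NAME, FEED_URI)
Feed = Tuple[int, str, str]


def order_feeds(feeds: List[Feed], order: str) -> List[Feed]:
    """Sort by FEED_INDEX: ascending for "inc", descending for "dec"."""
    if order not in ("inc", "dec"):
        raise ValueError("--order must be inc or dec")
    return sorted(feeds, key=lambda feed: feed[0], reverse=(order == "dec"))
```

Sorting on the index alone (rather than the name) keeps the output order under your control: renumbering a feed in the config file moves it in the list.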

--filedate

This is just a quick hack to get the last time a cached feed file was updated. It appends the text <font color=green>File Time: YYYY.MM.DD HH:MM:SS<font> after the output of the XSL stylesheet. There is no config option for this text; to change it, look inside the function get_feed().
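If you want the same kind of timestamp elsewhere (for example in your own page), a sketch like the one below produces it. It assumes that "the last time the file was updated" means the file's modification time, shown in local time; file_time is a hypothetical helper, not the code inside get_feed().

```python
import os
import time


def file_time(path: str) -> str:
    """Format a file's modification time as YYYY.MM.DD HH:MM:SS (local time)."""
    mtime = os.path.getmtime(path)  # assumption: mtime is the "update" time
    return time.strftime("%Y.%m.%d %H:%M:%S", time.localtime(mtime))
```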

--xsl=XSL_STYLESHEET_PATHNAME

By default rss-tab uses an XSL stylesheet named rss-tab.xsl. This option is used to select another one. It affects ALL the feeds produced.


SETTING UP RSS-TAB

INSTALL

 There is no automated install script (yet), but installation should be easy for anyone who knows how to use a shell.
 I guess that most of the problems will be permission-related.
 You must make sure that the user the web server runs as has read access to the rss-tab.cfg file (defined by $RSS_FEED_LIST) and to the XSL stylesheet (defined by $XSLSS).
 It must also have read/write access to the directory and files defined by $RSS_CACHE_DIR.
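 One way to check this before blaming rss-tab is a small script run as the web server user (for example with sudo -u nobody). The sketch below is a Python illustration, not part of rss-tab; the function name and passing the three paths explicitly are assumptions. Note that os.access() checks the permissions of whichever user runs it, so running it as root proves nothing.

```python
import os
from typing import List


def check_permissions(cfg_file: str, xsl_file: str, cache_dir: str) -> List[str]:
    """Return a list of problems; an empty list means all checks passed.

    Hypothetical pre-flight check: run it as the web server user.
    """
    problems = []
    # The config file and the stylesheet only need to be readable
    for path in (cfg_file, xsl_file):
        if not os.access(path, os.R_OK):
            problems.append(f"cannot read {path}")
    # The cache directory must be listable and writable (new files)...
    if not os.access(cache_dir, os.R_OK | os.W_OK | os.X_OK):
        problems.append(f"cannot read/write directory {cache_dir}")
    else:
        # ...and every existing cache file must be readable and writable
        for name in os.listdir(cache_dir):
            path = os.path.join(cache_dir, name)
            if os.path.isfile(path) and not os.access(path, os.R_OK | os.W_OK):
                problems.append(f"cannot read/write {path}")
    return problems
```

 An empty list means the web server user can read the config file and the stylesheet and read/write the cache.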

CONFIG

 Configuring rss-tab is easy: you just have to edit the Perl program rss-tab.pl and set the value of 3 global variables.

 From the names it should be obvious what each one does: $RSS_CACHE_DIR is the directory where the cached rss files are stored, $RSS_FEED_LIST is the pathname of the config file rss-tab.cfg, and $XSLSS is the pathname of the default XSL stylesheet. By default all the variables contain the string "config me"; this is just so that rss-tab can tell you if you forgot to configure it.

 An example:

ADDING FEEDS


CALLING RSS-TAB FROM CRON


INSTALLING THE PERL MODULES AND LIBRARIES

 This was no simple task. I'm sorry, but I'm still thinking about how to tell this story.


THE PHP FEED CONFIGURATOR

 The FEED CONFIGURATOR is not part of the rss-tab program; it's a PHP script I wrote to set up a user-side configuration, so that users can choose which feeds they want to see when they enter a page that uses rss-tab feeds. The demo page that you can download from here has PHP code to process this configuration cookie.
 In order to use the CONFIGURATOR, you'll have to set up a file with a PHP array where you define a Title for every feed.

 It looks like this (NB: the file is included with the demo page):


SPEEDING UP RSS-TAB - HOW TO USE A LOCAL DTD FILE FOR RSS 0.91

 When I moved the development to my server at home, I noticed that some of the feeds took much longer to show up than they did on the server at work. I was very puzzled by that: my server at home is not THAT slow, and the time it took was not the same every time.
 The major difference between the server at work and the server at home is the internet pipe: I have 2 Mbit/s at work and only a 512K/128K ADSL link at home. So I used tcpdump to check whether something was being pulled from the net when I did a --feed call to rss-tab.
 I was very surprised to find out that YES, something was accessing the net every time rss-tab parsed some rss files.
 Tcpdump showed that my machine was accessing port 80 on myns-v2.websys.aol.com; I checked the DNS and finally traced it to my.netscape.com. The file being pulled again and again was /publish/formats/rss-0.91.dtd.
 It took me about a day of googling to find the answer: yes, this is a feature of libxml2, and most XML gurus say it's a bad thing. To prove the point, it seems that some time ago, when Netscape remade the my.netscape.com site, those .dtd files were removed and stayed removed for a few days; as a result, a lot of XML applications, especially XML validators, went down because they couldn't download this file.
 I'm not an XML guru, but I too don't like the idea of depending on an online resource for such a simple task. It's plain stupid that libxml2 can download a file from the net but can't keep a local cache (as any web browser does) to use the next time the file is needed.
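 A quick way to see which feeds are affected is to look at the DOCTYPE of the cached files. The sketch below is Python, not part of rss-tab, and external_dtd is a hypothetical name. It uses the standard library's expat parser, which reports the DOCTYPE declaration but does not itself try to download the external DTD.

```python
import xml.parsers.expat
from typing import Optional


def external_dtd(feed_bytes: bytes) -> Optional[str]:
    """Return the system identifier (DTD URL) from the feed's DOCTYPE, if any."""
    found = []

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Called once for <!DOCTYPE ...>; system_id is None if absent
        found.append(system_id)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(feed_bytes, True)
    return found[0] if found else None
```

 Any cached file for which this returns http://my.netscape.com/publish/formats/rss-0.91.dtd is one that makes libxml2 go to the net.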

 After a bit more digging, I found out that I could tell libxml2 to use a local file instead of downloading it from the net. As I suspected, the answer was in XML catalogs. A few days ago I added the use of the default XML catalog in /etc/xml/catalog to rss-tab, just because it looked like a good idea; at the time I had no idea that it could be used to keep libxml2 happy without getting stuff from the net.

 Anyway, the default /etc/xml/catalog needs an additional line to make the parsing of version 0.91 feeds a local operation.

  1. Download the DTD file; you can get it from here or from the original Netscape location.
  2. Copy the file to some place safe like /usr/share/xml or /etc/xml. I put it in /etc/xml/rss-0_91.dtd.
  3. Edit /etc/xml/catalog and add the line:

    <rewriteURI uriStartString="http://my.netscape.com/publish/formats/rss-0.91.dtd" rewritePrefix="file:///etc/xml/rss-0_91.dtd"/>

 That should do the trick. Since my.netscape.com is not the fastest web site around, this should speed up rss-tab a lot.
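 The catalog entry works by prefix substitution: a URI that starts with uriStartString has that prefix replaced by rewritePrefix. Here uriStartString is the complete DTD address, so the whole URI becomes the local file. The Python sketch below only models that rule, following the OASIS XML Catalogs specification (where the longest matching uriStartString wins); it is not how libxml2 is implemented, and rewrite_uri is a hypothetical name.

```python
from typing import List, Optional, Tuple

# (uriStartString, rewritePrefix) pairs, as in <rewriteURI .../> entries
RewriteRule = Tuple[str, str]


def rewrite_uri(uri: str, rules: List[RewriteRule]) -> Optional[str]:
    """Model of rewriteURI resolution; returns None if no rule matches."""
    matches = [(start, prefix) for start, prefix in rules if uri.startswith(start)]
    if not matches:
        return None
    # The longest matching uriStartString takes precedence
    start, prefix = max(matches, key=lambda rule: len(rule[0]))
    return prefix + uri[len(start):]
```

 With only the entry from step 3, the DTD address maps to file:///etc/xml/rss-0_91.dtd and every other URI is left alone.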


USING RSS-TAB ON A WEBPAGE

 Rss-tab is not server-dependent and can be used with any method that lets you execute a script.
 The idea is that you set up some kind of HTML container (like a table cell) and call rss-tab in a way that its output "fills" the container.

 At this time I'm only working with PHP, so I have no way to test rss-tab with other methods. SSI (server side includes) should work, and I guess mod_perl should work too. I have no idea whether ASP can call a server-side script, but if it can, then rss-tab should work.
 Version 1.0 could be called from an HTML form or from a link, running as a CGI script (it returned the "Content-type: text/html" header), but I removed that "feature"; I don't think there is much use in a page containing only the feeds. Anyway, it would be simple to make a wrapper script that calls rss-tab after retrieving the parameters from POST or GET.
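 Such a wrapper might look like the sketch below. It is Python rather than Perl and purely illustrative: the install path /usr/local/bin/rss-tab.pl, the query parameter name feed, the default of "all" and the allowed-character pattern are all assumptions. It passes the arguments to rss-tab as a list, without a shell, so a request cannot inject shell commands.

```python
import re
import subprocess
from typing import List
from urllib.parse import parse_qs

# Assumptions for illustration only: adjust the path and parameter name
RSS_TAB = "/usr/local/bin/rss-tab.pl"
SAFE_FEED = re.compile(r"[A-Za-z0-9_.-]+")


def build_command(query_string: str) -> List[str]:
    """Turn a query string such as "feed=3" into an rss-tab argument list."""
    params = parse_qs(query_string)
    feed = params.get("feed", ["all"])[0]  # assumed default: all feeds
    # Accept only a plain FEED_INDEX, FEED_NAME or "all"
    if not SAFE_FEED.fullmatch(feed):
        raise ValueError(f"bad feed argument: {feed!r}")
    return [RSS_TAB, f"--feed={feed}"]


def run(query_string: str) -> str:
    """Run rss-tab without a shell and return its HTML output."""
    result = subprocess.run(build_command(query_string),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

 A CGI version would also print the "Content-type: text/html" header and a blank line before the output, as version 1.0 did.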

 The default XSL stylesheet outputs the feed inside a <div> so you shouldn't have any problem locating it where you want.



THE END



António Vasconcelos <vasco(at)all-2-it.com>
São Domingos de Rana.
Last Change: Fri May 28 10:47:50 WEST 2004