I wanted to write a small script to load a webpage and do some analysis. Ordinarily I would fire up perl and use LWP with quick and dirty regexes but this is a good opportunity to use one of the interesting languages.
Now, I haven’t learnt a new language properly for around 10 years although I have spent a little of the intervening time looking at scheme textbooks! I installed chicken scheme and quack.el to support editing with emacs. After a quick look at the chicken website I decided I needed the http and htmlprag eggs (chicken libraries). Installing them using chicken-setup was very easy although I discovered it required a C compiler when I tried it on a number of different boxes :-/
Well, first things first. Some webpages need you to represent yourself as a bona fide browser or else you get a “403 Forbidden” message. (The way you do this using perl LWP is something like $browser->agent('Mozilla/5.0').) I couldn’t see an easy way to do this in the chicken http library. Fortunately it is easy enough to modify the .scm.
http:send-request is the function I want to modify. It begins like this:
(define (http:send-request req . more)
  (let-optionals more ([in #f]
                       [out #f] )
    (let* ([req (if (string? req)
                    (http:make-request 'GET req '(("Connection" . "close")))
                    req) ]
           ...))))
The (req . more) construction isn’t too complicated. It corresponds more or less to this perl:
my ($req, @more) = @_;
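On the scheme side, the dotted parameter list can be tried out in isolation. This is only a small sketch: show-args is a made-up name and has nothing to do with the http egg.

```scheme
;; Hypothetical helper: returns its two parameters as a list so we can
;; see what ends up in req and what ends up in more.
(define (show-args req . more)
  (list req more))

(show-args "a")       ; req is "a", more is the empty list
(show-args "a" 1 2)   ; req is "a", more is the list (1 2)
```

The first argument always lands in req and everything after it is collected into the list more, which is empty when nothing extra is passed.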
However, I wasn’t at all familiar with (let-optionals …). Fortunately, the chicken scheme documentation is excellent and it looks like it maps to the following pseudo-perl.
my $in  = @more ? shift @more : undef;   # undef plays the role of #f
my $out = @more ? shift @more : undef;
i.e. local variables are created and assigned the values of more in turn, or given a default value if there are not enough values left in the @more array. Hmmm… I think that description is much less clear than the one in the chicken documentation.
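The same idea can be written out by hand in plain scheme. This is a rough sketch of the behaviour described above, not the actual expansion of let-optionals, and demo-optionals is a made-up name.

```scheme
;; Roughly what (let-optionals more ([in #f] [out #f]) ...) does,
;; spelled out with standard forms only.
(define (demo-optionals req . more)
  (let* ((in   (if (pair? more) (car more) #f))    ; first extra argument, or #f
         (rest (if (pair? more) (cdr more) '()))   ; whatever is left after in
         (out  (if (pair? rest) (car rest) #f)))   ; second extra argument, or #f
    (list req in out)))

(demo-optionals "url")         ; in and out both fall back to #f
(demo-optionals "url" 'i)      ; in is the symbol i, out falls back to #f
(demo-optionals "url" 'i 'o)   ; in is i, out is o
```

Each variable takes the next value from more if there is one and otherwise gets its default.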
The next bit makes an http request ‘object’, passing in a list of dotted pairs to use as attributes. As we never override the default values for in and out, we can use ‘more’ to pass in our own attributes instead. The modified send-request therefore becomes:
(define (http:my-send-request req . more)
  (let* ((in #f)
         (out #f)
         [req (if (string? req)
                  (http:make-request 'GET req (cons '("Connection" . "close") more))
                  req)]
         ...)))
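To see what the cons is doing, here is a small standalone sketch. The variable names are only for illustration, and looking headers up with assoc is my assumption, not necessarily how the http egg reads them.

```scheme
;; more stands in for the rest arguments collected by http:my-send-request.
(define more
  (list '("User-Agent" . "Mozilla/5.0")))

;; Put the default header in front of whatever the caller passed in.
(define attributes
  (cons '("Connection" . "close") more))

;; attributes is now the association list
;;   (("Connection" . "close") ("User-Agent" . "Mozilla/5.0"))
(cdr (assoc "User-Agent" attributes))   ; fetches the value for one header
```

Because more is already a list of dotted pairs, consing one more pair onto the front gives a single association list with the default first and our own attributes after it.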
We need to add the new function name to the export list so it is available outside the library:
(declare
  (fixnum)
  (export
    http:send-request
    http:my-send-request ;; <-- Here
    http:GET http:POST
    ...))
Then we add a wrapper for my-send-request in the client code to pass in the attributes we want:
(define (send-request-wrapper url)
  (http:my-send-request url
                        '("User-Agent" . "Mozilla/5.0")
                        '("Content-Type" . "application/x-www-form-urlencoded")))
and finally a wrapper to open a url, return the data as a list of lines and close the ports when we have finished with them:
(define (load-url url)
  (define-values (h a i o) (send-request-wrapper url))
  (let ((data (read-lines i)))
    (close-input-port i)
    (close-output-port o)
    data))
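The define-values line works because the request function hands back four values at once. Here is a sketch of the same pattern using the standard call-with-values. fake-send-request and ports-only are made-up stand-ins and the values are placeholder symbols, since all the code above tells us is that the last two are the input and output ports.

```scheme
;; Hypothetical stand-in that returns four values, like the real call does.
(define (fake-send-request url)
  (values 'fake-h 'fake-a 'fake-in 'fake-out))

;; Receive all four values and keep only the two we care about.
(define (ports-only url)
  (call-with-values
    (lambda () (fake-send-request url))
    (lambda (h a i o)
      (list i o))))

(ports-only "http://example.com/")   ; the list (fake-in fake-out)
```

define-values is the more convenient spelling when the values are needed for the rest of a function body, as in load-url.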
Whew! That seemed like a fair amount of effort, at least in comparison to Perl/LWP. Hopefully it will get easier as I become more familiar with chicken.
Originally posted: Wednesday, July 05, 2006