I know that last time I said we would look at passing data to other pages and data sorting, but instead I thought I would make a brief diversion into SQLite. Playing around with table data such as this:
a | b | c |
1 | 2 | 3 |
4 | 5 | 6 |
7 | 8 | 9 |
which is rendered by:
(print-ln (render-table (make-table '("a" "b" "c") '(("1" "2" "3") ("4" "5" "6") ("7" "8" "9"))) '()))
is pretty uninspiring. Instead, I can use some financial data that is available from Yahoo. Data for the first 50 (or so) NASDAQ stocks is available here.
(require (lib "url.ss" "net"))

(define (print-ln . args)
  (for-each display args)
  (newline))

(define *url-prefix* "http://download.finance.yahoo.com/d/quotes.csv?")

(define *url*
  (string-append *url-prefix* "s=@%5EIXIC&f=sl1d1t1c1ohgv&e=.csv&h=50"))

(define (display-url url)
  (let ((p (get-pure-port (string->url url))))
    (let loop ((line (read-line p)))
      (unless (eof-object? line)
        (print-ln line)
        (loop (read-line p))))
    (close-input-port p)))

(display-url *url*)
Running this script gives the rather strange error message Missing Format Variable.
(url->string (string->url *url*))
reveals the reason. string->url corrupts the url! It should be this:
http://download.finance.yahoo.com/d/quotes.csv?s=@%5EIXIC&f=sl1d1t1c1ohgv&e=.csv&h=50
But instead it is this (ampersands converted to semi-colons, amongst other things):
http://download.finance.yahoo.com/d/quotes.csv?s=%40%5EIXIC;f=sl1d1t1c1ohgv;e=.csv;h=50
Yahoo rejects the latter url with the error message above.
Absolutely unbe-fricking-lievable.
I can’t imagine the thinking behind making string->url take a perfectly valid URL and breaking it. If you must do this, at least provide a pair of functions that distinguish the behaviours, say string->url and string->borked-url. This totally unreasonable design decision has damaged my confidence in the PLT Scheme standard library. It is possible to work around the stupid default behaviour (although the fix presumably breaks URLs with semi-colons in the query string).
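The essential part of the workaround is the separator-mode parameter from uri-codec.ss. Here is a minimal sketch isolating it; the example.com URL is just a placeholder, not part of the script:

(require (lib "url.ss" "net")
         (lib "uri-codec.ss" "net"))

;; force ampersands as the query-string separator when parsing and
;; serialising urls
(current-alist-separator-mode 'amp)

;; the round trip now leaves the ampersands alone
(display (url->string (string->url "http://example.com/q?a=1&b=2")))

With that parameter set, the full script becomes: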
(require (lib "url.ss" "net")
         (lib "uri-codec.ss" "net"))

(current-alist-separator-mode 'amp)

(define (print-ln . args)
  (for-each display args)
  (newline))

(define *url-prefix* "http://download.finance.yahoo.com/d/quotes.csv?")

(define *url*
  (string-append *url-prefix* "s=@%5EIXIC&f=sl1d1t1c1ohgv&e=.csv&h=50"))

(define (display-url url)
  (let ((p (get-pure-port (string->url url))))
    (let loop ((line (read-line p)))
      (unless (eof-object? line)
        (print-ln line)
        (loop (read-line p))))
    (close-input-port p)))

(display-url *url*)
The next step is to store the data in the database. First create a suitable table:
CREATE TABLE stock_data (
  ticker varchar(8),
  price double,
  change double,
  date datetime
)
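If you would rather create the table from Scheme than from the sqlite3 shell, a sketch along these lines should work with the same sqlite.plt library used in the complete script below (the c:/tmp/test.db path matches the one used there):

(require (planet "sqlite.ss" ("jaymccarthy" "sqlite.plt" 3 1)))

;; open (or create) the database file and create the table once
(define *dbh* (open (string->path "c:/tmp/test.db")))

(exec/ignore *dbh*
  "CREATE TABLE stock_data (
     ticker varchar(8),
     price double,
     change double,
     date datetime
   )")

(close *dbh*)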
There is a nice csv library on planet. You declare a csv reader like this:
(define make-csv-reader
  (make-csv-reader-maker
   '((separator-chars . (#\,))
     (strip-leading-whitespace? . #t)
     (strip-trailing-whitespace? . #t))))
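To see what the reader produces, here is a small sketch that runs it over an in-memory string port shaped roughly like one line of the Yahoo feed (the sample row is made up for illustration):

;; a made-up row in roughly the shape of the Yahoo csv feed
(define sample-port
  (open-input-string "MSFT,29.50,11/9/2007,4:00pm,-0.32,29.89,29.94,29.35,79227600"))

(define sample-reader (make-csv-reader sample-port))

;; each call to the reader returns one row as a list of strings,
;; and '() at end of input
(print-ln (sample-reader))
;; prints something like (MSFT 29.50 11/9/2007 4:00pm -0.32 29.89 29.94 29.35 79227600)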
Then the functions to extract the interesting data and store it in the database. The symbol, price and change from the previous day are the first, second and fifth fields respectively. I think this csv file only changes once a day so we fix the time portion of the datetime to an arbitrary value. I haven’t named the functions particularly well here, but they should convey my intent at least.
(define (db-insert-row symbol price change)
  (let ((sql (format "INSERT INTO stock_data (ticker, price, change, date)
                      VALUES ('~a', ~a, ~a, strftime('%Y-%m-%d 06:00:00', 'now'))"
                     symbol price change)))
    (print-ln sql)
    (exec/ignore *dbh* sql)))

(define (csv-insert-into-db p)
  (let ((reader (make-csv-reader p)))
    (let loop ()
      (let ((l (reader)))
        (unless (or (null? l) (null? (cdr l)))
          (let ((symbol (first l))
                (price (second l))
                (change (fifth l)))
            (db-insert-row symbol price change))
          (loop))))))
And here is the complete script:
(require (lib "1.ss" "srfi")
         (lib "url.ss" "net")
         (lib "uri-codec.ss" "net")
         (planet "csv.ss" ("neil" "csv.plt" 1 1))
         (planet "sqlite.ss" ("jaymccarthy" "sqlite.plt" 3 1)))

(current-alist-separator-mode 'amp)

(define (print-ln . args)
  (for-each display args)
  (newline))

(define *url-prefix* "http://download.finance.yahoo.com/d/quotes.csv?")

(define *url*
  (string-append *url-prefix* "s=@%5EIXIC&f=sl1d1t1c1ohgv&e=.csv&h=50"))

(define (display-url url)
  (let ((p (get-pure-port (string->url url))))
    (let loop ((line (read-line p)))
      (unless (eof-object? line)
        (print-ln line)
        (loop (read-line p))))
    (close-input-port p)))

(define make-csv-reader
  (make-csv-reader-maker
   '((separator-chars . (#\,))
     (strip-leading-whitespace? . #t)
     (strip-trailing-whitespace? . #t))))

(define (db-insert-row symbol price change)
  (let ((sql (format "INSERT INTO stock_data (ticker, price, change, date)
                      VALUES ('~a', ~a, ~a, strftime('%Y-%m-%d 06:00:00', 'now'))"
                     symbol price change)))
    (print-ln sql)
    (exec/ignore *dbh* sql)))

(define (csv-insert-into-db p)
  (let ((reader (make-csv-reader p)))
    (let loop ()
      (let ((l (reader)))
        (unless (or (null? l) (null? (cdr l)))
          (let ((symbol (first l))
                (price (second l))
                (change (fifth l)))
            (db-insert-row symbol price change))
          (loop))))))

(define (process-html url processor)
  (let ((p (get-pure-port (string->url url))))
    (processor p)
    (close-input-port p)))

(define *path-prefix* "c:/tmp")

(define (path p)
  (string->path (string-append *path-prefix* "/" p)))

(define *dbh* (open (path "test.db")))

(process-html *url* csv-insert-into-db)

(select *dbh* "SELECT count(*) as stocks from stock_data")

(define *results* (select *dbh* "SELECT * from stock_data"))
(for-each print-ln *results*)

(close *dbh*)
Hi Ian,
The strange thing is that the standard recommends semi-colons instead of ampersands. So string->url doesn’t break the url per se; it is the Yahoo server that doesn’t follow the standard. As you found out, you need (current-alist-separator-mode 'amp) to force it to use ampersands.
Strange things usually have an explanation; the PLT mailing list is the place to get it.
Hi Jens,
I’ve had a look at the recommendation here:
http://www.w3.org/TR/html401/appendix/notes.html#h-B.2.2

My understanding is that semi-colons are recommended instead of ampersands but not required. This being the case, the Yahoo server is behaving reasonably. Here is a URL from Google:

http://www.google.co.uk/search?hl=en&q=facebook&btnG=Google+Search&meta=
Replacing ampersands with semi-colons here fails to work too:
http://www.google.co.uk/search?hl=en;q=facebook;btnG=Google+Search;meta=
I can’t recall a time I have seen semi-colons rather than ampersands in a query string (admittedly I don’t look at them too closely).
Choosing to mangle valid urls by default violates the principle of least surprise, and when the result doesn’t work on some fairly large real-world websites it is simply not the right thing to do.
Ian
Hi Ian,
The two urls ought to refer to the same resource. The web server ought to return the same page for both urls.
And as far as the comment goes, “(although the fix presumably breaks URLs with semi-colons in the query string)”: I don’t believe this is the case, but if you’re having problems, please do let us know, either on the plt-scheme mailing list or via a bug report.
Thanks for using PLT Scheme.
I understand how it is supposed to work. However, on a number of websites (including yahoo and google) it doesn’t work like that, which means that string->url corrupts the url.
Btw – I think (90% sure) that the default was changed to ampersands in the SVN.
Hi Jens, thanks for the update. That is great news!
Thanks for posting this!
I ran into this same issue today, and not being very familiar with Scheme (I’m just trying to expose the library to Arc in a reasonable way), I wasn’t sure how to fix it. Your solution was quite helpful.