Last time we were looking at encoding data. HTML has a nice way of encoding its special characters, e.g. < and > are mapped to &lt; and &gt;. We can do something similar. The important characters to encode are quote, equals, ampersand and maybe a few others. Then we have to decide on an escape character. I like plus (+), as it doesn’t look like a special character to my eyes. As we are not planning for the future, plus followed by a single character should do.
First of all, list all the encodings.
(define *encodings-list* '((+s " ") (+a "&") (+e "=") (+p "+") (+q "\"") (+r "\r\n")))
Then store the encodings and the reverse encodings in a hash.
(define *hash* (make-hash-table 'equal))
(for-each
 (lambda (e)
   (let ((k (symbol->string (car e)))
         (v (second e)))
     (hash-table-put! *hash* k v)
     (hash-table-put! *hash* v k)))
 *encodings-list*)
pregexp-replace* is perfect for encoding and decoding.
(define (encode-chars regex s)
  (pregexp-replace* regex s (lambda (k) (hash-table-get *hash* k))))
It would be called like this:
(encode-chars "[+ &\"]|\r\n" "<a string>")
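To make the effect concrete, here is a small self-contained sketch of the same table and function applied to two sample strings. The sample inputs are made up for illustration, and I have used cadr in place of second so the snippet does not need the SRFI 1 library; otherwise it follows the definitions above.

```scheme
(require (lib "pregexp.ss"))

;; Same escape table as in the article.
(define *encodings-list*
  '((+s " ") (+a "&") (+e "=") (+p "+") (+q "\"") (+r "\r\n")))

;; One hash holds both directions: "&" -> "+a" and "+a" -> "&".
(define *hash* (make-hash-table 'equal))
(for-each
 (lambda (e)
   (let ((k (symbol->string (car e)))
         (v (cadr e)))                 ; cadr instead of second (assumption: avoids srfi 1)
     (hash-table-put! *hash* k v)
     (hash-table-put! *hash* v k)))
 *encodings-list*)

;; Every match of regex is looked up in the hash and replaced.
(define (encode-chars regex s)
  (pregexp-replace* regex s (lambda (k) (hash-table-get *hash* k))))

(encode-chars "[+ &\"]|\r\n" "a b&c")  ; => "a+sb+ac"
(encode-chars "[+ &\"]|\r\n" "1+1")    ; => "1+p1"
```

Passing a procedure rather than a string as the replacement also means the result is inserted literally, so an ampersand in the table is not treated as a back-reference.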
I’ve decided to compress and base64 encode my data in case it becomes really big.
(define (data-encode data)
  (let ((i (open-input-string data))
        (o (open-output-bytes)))
    (deflate i o)
    (let ((r (base64-encode (get-output-bytes o))))
      (encode-chars "[+=]|\r\n"
                    (bytes->string/utf-8
                     (subbytes r 0 (- (bytes-length r) 2)))))))
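The subbytes call is there because base64-encode finishes its output with a carriage return and line feed, and also breaks long output into lines with the same pair (which is why the regex includes \r\n). A minimal sketch of the trailing pair, using a made-up input:

```scheme
(require (lib "base64.ss" "net"))

;; Sample input chosen for illustration only.
(define encoded (base64-encode #"hello"))

encoded                                             ; => #"aGVsbG8=\r\n"

;; Dropping the last two bytes removes the trailing CR LF.
(subbytes encoded 0 (- (bytes-length encoded) 2))   ; => #"aGVsbG8="
```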
We also need a decoder that performs the inverse operation.
(define (data-decode data)
  (let ((i (open-input-bytes
            (base64-decode
             (string->bytes/utf-8
              (encode-chars "\\+." data)))))
        (o (open-output-string)))
    (inflate i o)
    (get-output-string o)))
And finally, my URL looks reasonable:
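One thing to watch with the decoding regex: it matches a plus followed by any character, and hash-table-get raises an error for a key that is not in the table. If that matters, a tolerant variant could pass a failure thunk. This is only a sketch; encode-chars/safe is a hypothetical name, not part of the script, and the two-entry table is cut down for brevity.

```scheme
(require (lib "pregexp.ss"))

;; Cut-down table with only the reverse entries needed for this example.
(define *hash* (make-hash-table 'equal))
(hash-table-put! *hash* "+p" "+")
(hash-table-put! *hash* "+e" "=")

;; Hypothetical variant: an unknown escape is returned unchanged
;; instead of raising an error.
(define (encode-chars/safe regex s)
  (pregexp-replace* regex s
                    (lambda (k) (hash-table-get *hash* k (lambda () k)))))

(encode-chars/safe "\\+." "ab+pcd+e")  ; => "ab+cd="
(encode-chars/safe "\\+." "ab+zcd")    ; => "ab+zcd"
```

Whether silently passing unknown escapes through is better than failing loudly depends on how much you trust the query string.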
(print-ln "<a href=\"ser.scm?data=" (data-encode (object->string *data*)) "\">next page</a><br/>")
Does decode perform the inverse of encode?
(print-ln (data-decode (data-encode (object->string *data*))))
which gives:
> (("Task" "Monday" "Tuesday" "Wednesday") ("Cooking" "1hr" "1hr" "2hrs"))
The encoded data itself looks like this:
> "BcExCsAgDEDRq4Q/mdFeoXM3oXMgUougYOjg7fteShSLjnDN4bYRylfDbSPc1UcNt41K4pyz+rv+pNByG0h5LYQjrYC1R8+e"
I fixed print-ln too – display is for human-readable stuff, print is for the computer. Next time we will have to extract the data from the query string, load the data into a variable (this shouldn’t be too difficult) and think about rendering the table. For anyone following along at home, the full script is as follows:
#!mzscheme -mqf
(require (lib "1.ss" "srfi")
         (lib "pregexp.ss")
         (lib "base64.ss" "net")
         (lib "deflate.ss")
         (lib "inflate.ss"))
(define-struct table-data (columns data))
;; Print all arguments followed by a newline.
(define (print-ln . args)
  (for-each display args)
  (newline))
;; CGI header; printf expands each ~n to a newline.
(define (header type)
  (string-append "Content-type: " type "; charset=iso-8859-1~n~n"))
(printf (header "text/html"))
;; Escape table, stored in both directions in one hash.
(define *encodings-list*
  '((+s " ") (+a "&") (+e "=") (+p "+") (+q "\"") (+r "\r\n")))
(define *hash* (make-hash-table 'equal))
(for-each
 (lambda (e)
   (let ((k (symbol->string (car e)))
         (v (second e)))
     (hash-table-put! *hash* k v)
     (hash-table-put! *hash* v k)))
 *encodings-list*)
(define (blank-if-null s)
  (if (string? s) s ""))
(define *query-string* (blank-if-null (getenv "QUERY_STRING")))
(when (> (string-length *query-string*) 0)
  (printf "[~a]<br/>~n" *query-string*))
;; Convert between Scheme data and its written representation.
(define (string->object s)
  (read (open-input-string s)))
(define (object->string o)
  (let ((string-port (open-output-string)))
    (write o string-port)
    (get-output-string string-port)))
;; Allow single quotes in place of double quotes in literal data.
(define (manifest-string-encode s)
  (pregexp-replace* "'" s "\""))
(define (manifest-string->data s)
  (string->object (manifest-string-encode s)))
(define *data* #f)
(set! *data*
      (manifest-string->data
       "(('Task' 'Monday' 'Tuesday' 'Wednesday') ('Cooking' '1hr' '1hr' '2hrs'))"))
;; Replace every match of regex with its entry in the escape table.
(define (encode-chars regex s)
  (pregexp-replace* regex s (lambda (k) (hash-table-get *hash* k))))
;; Deflate, base64-encode, drop the trailing CR LF, then escape.
(define (data-encode data)
  (let ((i (open-input-string data))
        (o (open-output-bytes)))
    (deflate i o)
    (let ((r (base64-encode (get-output-bytes o))))
      (encode-chars "[+=]|\r\n"
                    (bytes->string/utf-8
                     (subbytes r 0 (- (bytes-length r) 2)))))))
;; Inverse of data-encode: unescape, base64-decode, inflate.
(define (data-decode data)
  (let ((i (open-input-bytes
            (base64-decode
             (string->bytes/utf-8
              (encode-chars "\\+." data)))))
        (o (open-output-string)))
    (inflate i o)
    (get-output-string o)))
(print-ln "<a href=\"ser.scm?data=" (data-encode (object->string *data*)) "\">next page</a><br/>")
(print-ln "<a href=\"ser.scm\">Restart</a><br>")
(print-ln (data-decode (data-encode (object->string *data*))))
(exit)
Hi IanO,
Excellent post. I found a small problem with the regexp in data-decode, nothing big.
Here is my corrected version:
(define (data-decode data)
  (let ((i (open-input-bytes
            (base64-decode
             (string->bytes/utf-8
              (encode-chars "[+]." data)))))
        (o (open-output-string)))
    (inflate i o)
    (get-output-string o)))
Here is the output:
Welcome to DrScheme, version 370.3-svn9jun2007 [3m].
Language: SchemeKeys.
Content-type: text/html; charset=iso-8859-1
next page
Restart
(("Task" "Monday" "Tuesday" "Wednesday") ("Cooking" "1hr" "1hr" "2hrs"))
Is this what you had in mind?
Take care,
–kyle
Hi Kyle,
Thanks for the comment. When I entered the article, I had escaped the + symbol with backslashes but WordPress seems to have selectively eaten my backslashes (it hasn’t got them all!). It also broke the carriage return encoding. Let me try updating the article to see if that fixes it.
Cheers,
Ian
There are a few more libraries in mzscheme that you can use to dissolve a bit of the code you have there. In particular, the uri-codec.ss library in the net collection handles URI encoding stuff:
(require (lib "uri-codec.ss" "net"))
(parameterize ([current-alist-separator-mode 'amp])
  (display (alist->form-urlencoded
            '((name . "Danny")
              (email . "dyoo@cs.wpi.edu")))))
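As a rough sketch of how the same library could stand in for a hand-rolled escape table, uri-encode and uri-decode from uri-codec.ss round-trip an arbitrary string. The sample string is made up, and I am assuming the two functions are available in your version of the library.

```scheme
(require (lib "uri-codec.ss" "net"))

;; Sample string containing the characters the article escapes by hand.
(define original "a b&c=\"d\"+e")

;; Percent-encode, then decode again.
(define encoded (uri-encode original))
(define decoded (uri-decode encoded))

(display encoded)
(newline)
(display (if (equal? decoded original) "round trip ok" "round trip failed"))
(newline)
```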
For the HTML generation, you may want to look at the xml.ss library in the xml package. I wrote a small example using it here.