In response to my Berkeley DB benchmarking post, Pedro Melo points out that Tokyo Cabinet is faster than Berkeley DB and that JSON::XS is faster than Storable.
I couldn’t find an up-to-date Ubuntu package that included the TC Perl libraries, so I had to build everything from source. It was pretty straightforward, though.
First we need to get the database handle.
use TokyoCabinet;

my $tc_file = "$ENV{HOME}/test.tc";
unlink $tc_file;    # start each run from an empty database file

my $hdb = TokyoCabinet::HDB->new();
if (!$hdb->open($tc_file, $hdb->OWRITER | $hdb->OCREAT)) {
    my $ecode = $hdb->ecode();
    printf STDERR ("open error: %s\n", $hdb->errmsg($ecode));
}
Presumably putasync is the fastest database put method.
my $ORDER_ID = 0;

sub store_record_tc {
    my ($db, $ref_record, $no_sync, $json) = @_;
    $json    //= 0;
    $no_sync //= 0;    # not used here; kept so the signature matches store_record

    $ref_record->{'order_id'} = ++$ORDER_ID;
    my $key = "/order/$ref_record->{'order_id'}";

    $db->putasync($key, $json ? encode_json($ref_record) : Storable::freeze($ref_record));
}
I needed to amend store_record to compare JSON and Storable too.
sub store_record {
    my ($db, $ref_record, $no_sync, $json) = @_;
    $json    //= 0;
    $no_sync //= 0;

    $ref_record->{'order_id'} = ++$ORDER_ID;
    my $key = "/order/$ref_record->{'order_id'}";

    $db->db_put($key, $json ? encode_json($ref_record) : Storable::freeze($ref_record));
    $db->db_sync() unless $no_sync;
}
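Both store functions write either a JSON string or a Storable blob, so whatever reads a record back has to know which serializer produced it. Here is a minimal round-trip sketch. It assumes the core module JSON::PP as a stand-in for JSON::XS (it exports the same encode_json and decode_json functions), and serialize_record and deserialize_record are hypothetical helper names, not part of the original code:

```perl
use strict;
use warnings;
use Storable ();
use JSON::PP qw(encode_json decode_json);   # stand-in for JSON::XS (assumption)

# Hypothetical helper: mirrors the ternary used in store_record and store_record_tc.
sub serialize_record {
    my ($ref_record, $json) = @_;
    return $json ? encode_json($ref_record) : Storable::freeze($ref_record);
}

# Hypothetical helper: the caller must pass the same $json flag that was used when storing.
sub deserialize_record {
    my ($bytes, $json) = @_;
    return $json ? decode_json($bytes) : Storable::thaw($bytes);
}

# Invented sample record, serialized and restored with each format in turn.
my $record = { order_id => 1, symbol => 'ABC', qty => 100 };
for my $json (0, 1) {
    my $copy = deserialize_record(serialize_record($record, $json), $json);
    printf "%s round trip: order_id=%s\n", $json ? 'json' : 'storable', $copy->{order_id};
}
```

Passing the wrong flag on the read side would hand a Storable blob to the JSON decoder (or the reverse), so in practice the format needs to be fixed per database or recorded alongside the value.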
The benchmarking code looks like this.
Benchmark::cmpthese(-1, {
'json-only-50/50' => sub { json_only($db, $rec_50_50) },
'freeze-only-50/50' => sub { freeze_only($db, $rec_50_50) },
'freeze-no-sync-50/50' => sub { store_record($db, $rec_50_50, 1) },
'freeze-no-sync-50/50-tc' => sub { store_record_tc($hdb, $rec_50_50, 1) },
'json-no-sync-50/50' => sub { store_record($db, $rec_50_50, 1, 1) },
'json-no-sync-50/50-tc' => sub { store_record_tc($hdb, $rec_50_50, 1, 1) },
});
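The json_only and freeze_only helpers and the $rec_50_50 record come from the earlier Berkeley DB post and are not shown here. Judging by their names, they serialize the record without touching the database, which would give an upper bound for the store rates. A hypothetical sketch under that assumption follows; the record contents are invented for illustration, run_serializer_comparison is a made-up wrapper, and JSON::PP stands in for JSON::XS so the block runs with core modules only:

```perl
use strict;
use warnings;
use Benchmark ();
use Storable ();
use JSON::PP qw(encode_json);   # assumption: swap in JSON::XS to match the post

# Assumed behaviour: serialize only, never write to $db.
sub json_only {
    my ($db, $ref_record) = @_;
    return encode_json($ref_record);
}

sub freeze_only {
    my ($db, $ref_record) = @_;
    return Storable::freeze($ref_record);
}

# Invented sample record; the real $rec_50_50 is defined in the earlier post.
my $rec_sample = { order_id => 0, symbol => 'ABC', side => 'buy', qty => 100 };

# Hypothetical wrapper so the comparison only runs when it is called.
sub run_serializer_comparison {
    Benchmark::cmpthese(-1, {
        'json-only'   => sub { json_only(undef, $rec_sample) },
        'freeze-only' => sub { freeze_only(undef, $rec_sample) },
    });
}
```

Calling run_serializer_comparison() prints a cmpthese table in the usual format. JSON::PP is pure Perl and much slower than JSON::XS, so this sketch shows the shape of the helpers rather than reproducing the post's numbers.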
And the results are as follows:
                           Rate freeze-no-sync-50/50 json-no-sync-50/50 freeze-no-sync-50/50-tc json-no-sync-50/50-tc freeze-only-50/50 json-only-50/50
freeze-no-sync-50/50     7791/s                   --                -9%                    -39%                  -47%              -59%            -81%
json-no-sync-50/50       8605/s                  10%                 --                    -33%                  -41%              -55%            -79%
freeze-no-sync-50/50-tc 12800/s                  64%                49%                      --                  -13%              -33%            -69%
json-no-sync-50/50-tc   14698/s                  89%                71%                     15%                    --              -23%            -64%
freeze-only-50/50       19166/s                 146%               123%                     50%                   30%                --            -54%
json-only-50/50         41353/s                 431%               381%                    223%                  181%              116%              --
Pedro was right. Tokyo Cabinet is roughly 64% to 71% faster than Berkeley DB in the no-sync tests, at least in this simple benchmark.
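Each percentage in the cmpthese table is the row's rate relative to the column's rate, so the figures can be checked against the Rate column. A small sketch of that arithmetic, using the rates reported above (relative_pct is a hypothetical helper name):

```perl
use strict;
use warnings;

# Rates (operations per second) copied from the results table.
my %rate = (
    'freeze-no-sync-50/50'    => 7791,
    'json-no-sync-50/50'      => 8605,
    'freeze-no-sync-50/50-tc' => 12800,
    'json-no-sync-50/50-tc'   => 14698,
);

# Percentage by which $row is faster (positive) or slower (negative) than $col.
sub relative_pct {
    my ($row, $col) = @_;
    return sprintf '%.0f', ($rate{$row} / $rate{$col} - 1) * 100;
}

printf "TC vs BDB, Storable: %s%%\n",
    relative_pct('freeze-no-sync-50/50-tc', 'freeze-no-sync-50/50');
printf "TC vs BDB, JSON:     %s%%\n",
    relative_pct('json-no-sync-50/50-tc', 'json-no-sync-50/50');
```

The two comparisons hold the serializer fixed and vary only the database, which is the fairest way to read the Tokyo Cabinet versus Berkeley DB difference from this table.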
Edit: json and no_sync parameter switch has been fixed.
Jared, while you’re at it, would you also consider experimenting with Kyoto Cabinet?
It is another project by the same author as Tokyo Cabinet, offering even more parallelism, space efficiency and robustness, a simpler API with an OO design, and better portability, including support for non-POSIX environments. The performance of TC appears to be slightly higher than that of KC though, at least in single-threaded operations.
http://fallabs.com/kyotocabinet/
I’ve been looking at both of them for quite some time now, and would love to hear your thoughts on each.
Might also be interesting to include tdb in that too…
Hi garu,
Yes, I’ll probably take a look at Kyoto Cabinet. I’m often forced to use Windows unfortunately, so a library that works on both Windows and POSIX is preferable.
Hi Adam,
What is tdb? Is that the trivial database thing designed by the Samba guy?
Yes, it is.