A fairly common task I need to do is:
- query a remote service which returns a large amount of data
- extract just the bits I need from that data
- do something with the extracted bits of data
Often the initial query will take a few seconds to run, and I'll find myself thinking: I can't be bothered to wait for this, why don't I just cache the data?
If I decide it is worth it, the next question is where and how to cache.
And what I generally think of first is an AnyEvent-based proxy server. Just as quickly, I discard that option, as I can't be bothered to figure out how to ensure the proxy is up when I need it. For example, what happens when the physical host where the proxy is running reboots? What happens if the sysadmin kills my process? Etc.
Storable and Freeze / Thaw
This approach raises a couple of issues of its own: where should I store my datafiles? If I store them somewhere under $HOME, then multiple people who want to cache the same data each end up hitting the remote service themselves. Should I use a shared area instead? And if so, should that shared area be on a network drive, or local to the box?
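The file-based approach itself is straightforward. A minimal sketch of what I mean, using Storable's core `nstore`/`retrieve` (the helper name, cache path, and stand-in query below are my own, not from any particular project):

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Storable qw(nstore retrieve);

# Hypothetical helper: return the cached data if the file exists,
# otherwise run the (slow) query and store the result for next time.
sub cached_query {
    my ($cache_file, $query_sub) = @_;
    return retrieve($cache_file) if -e $cache_file;
    my $data = $query_sub->();
    nstore($data, $cache_file);   # nstore = network byte order, portable
    return $data;
}

# Stand-in for the real remote call:
my $data = cached_query('/tmp/my_query.cache',
    sub { return { rows => [ 1, 2, 3 ] } });
```

The awkward part is not the code, it's choosing `/tmp/my_query.cache` versus something under $HOME versus a shared path, which is exactly the question above.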
So, finally, I get to thinking about caching the data in the database. I know for sure that will always be available.
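This is where Storable's `freeze`/`thaw` come in: freeze the structure to a scalar and stash it in a table. A sketch of the idea, assuming DBD::SQLite is available; the `query_cache` table and column names are hypothetical:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use DBI qw(:sql_types);
use Storable qw(freeze thaw);

# An in-memory database keeps this example self-contained; in practice
# this would be the shared database that is "always available".
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, PrintError => 0 });
$dbh->do('CREATE TABLE IF NOT EXISTS query_cache '
       . '(cache_key TEXT PRIMARY KEY, frozen BLOB)');

# Hypothetical helper: look the key up in the cache table; on a miss,
# run the slow query, freeze the result, and store it for everyone.
sub db_cached_query {
    my ($key, $query_sub) = @_;
    my ($blob) = $dbh->selectrow_array(
        'SELECT frozen FROM query_cache WHERE cache_key = ?', undef, $key);
    return thaw($blob) if defined $blob;   # cache hit: thaw and return
    my $data = $query_sub->();             # cache miss: run the real query
    my $sth  = $dbh->prepare(
        'INSERT INTO query_cache (cache_key, frozen) VALUES (?, ?)');
    $sth->bind_param(1, $key);
    $sth->bind_param(2, freeze($data), SQL_BLOB);  # BLOB bind preserves NUL bytes
    $sth->execute;
    return $data;
}
```

Because the cache lives in the database, everyone querying the same key shares one copy, which sidesteps the $HOME-versus-shared-area question entirely.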
Can you guess which option I generally choose? Does anyone else have any thoughts on where and how to cache data?