Here I log interesting statistics I gather, to have some baseline data
for tuning cache sizes and the like.
Block sizes
Here's a rundown of block types/sizes from my large vault:
sqlite> SELECT type, CASE WHEN length < 16 THEN '0-15'
...> WHEN length < 32 THEN '16-31'
...> WHEN length < 64 THEN '32-63'
...> WHEN length < 128 THEN '64-127'
...> WHEN length < 256 THEN '128-255'
...> WHEN length < 512 THEN '256-511'
...> WHEN length < 1024 THEN '512-1023'
...> ELSE '>=1024'
...> END AS lc, COUNT(*)
...> FROM blocks GROUP BY type, lc;
d|0-15|1
d|128-255|124285
d|256-511|203371
d|512-1023|168253
d|64-127|240
d|>=1024|428314
di|128-255|283
di|256-511|7
di|512-1023|6
di|64-127|214
di|>=1024|6
f|0-15|7277
f|128-255|125059
f|16-31|22742
f|256-511|171423
f|32-63|40536
f|512-1023|225475
f|64-127|69289
f|>=1024|3220513
fi|0-15|1
fi|128-255|36333
fi|256-511|17444
fi|512-1023|8959
fi|64-127|26344
fi|>=1024|41406
snapshot|128-255|1
snapshot|256-511|440
snapshot|512-1023|1553
snapshot|>=1024|449
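For re-running this outside the sqlite shell, the same bucketing can be sketched in Python. This is a hypothetical helper, not part of the codebase; it assumes the same blocks(type, length) schema, and uses half-open bucket labels (512-1023, >=1024):

```python
import sqlite3
from collections import Counter

def bucket(length):
    """Map a block length to the same power-of-two buckets as the SQL CASE."""
    bounds = [16, 32, 64, 128, 256, 512, 1024]
    labels = ['0-15', '16-31', '32-63', '64-127', '128-255', '256-511', '512-1023']
    for bound, label in zip(bounds, labels):
        if length < bound:
            return label
    return '>=1024'

def block_histogram(db_path):
    """Count blocks per (type, bucket), like the GROUP BY above."""
    counts = Counter()
    with sqlite3.connect(db_path) as conn:
        for typ, length in conn.execute("SELECT type, length FROM blocks"):
            counts[(typ, bucket(length))] += 1
    return counts
```

Calling block_histogram('vault.db') (path hypothetical) then yields the same (type, bucket) counts as the query.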
Large snapshot blocks exist because we store the log inline - the log
should move out into "f" blocks!
If we make backend-cache able to cache data blocks, we should probably
give it a per-block-type size threshold for admitting blocks, as well
as an overall space quota for the cache as a whole.
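A sketch of what that could look like - the names and numbers here are made up for illustration, not the actual backend-cache API: admission is gated by a per-type size threshold, and the quota is enforced by LRU eviction.

```python
from collections import OrderedDict

class BlockCache:
    """Sketch: per-block-type size thresholds plus an overall byte quota.

    A block is admitted only if its length is below the threshold for
    its type; types with no threshold are never cached. The quota is
    enforced by evicting least-recently-used blocks.
    """

    def __init__(self, thresholds, quota_bytes):
        self.thresholds = thresholds  # e.g. {'snapshot': float('inf'), 'd': 1024}
        self.quota = quota_bytes
        self.blocks = OrderedDict()   # block_id -> data, in LRU order
        self.used = 0

    def should_cache(self, block_type, length):
        return length < self.thresholds.get(block_type, 0)

    def put(self, block_id, block_type, data):
        if not self.should_cache(block_type, len(data)):
            return False
        if block_id in self.blocks:
            self.used -= len(self.blocks.pop(block_id))
        self.blocks[block_id] = data
        self.used += len(data)
        while self.used > self.quota:
            _, evicted = self.blocks.popitem(last=False)  # drop LRU entry
            self.used -= len(evicted)
        return True

    def get(self, block_id):
        data = self.blocks.get(block_id)
        if data is not None:
            self.blocks.move_to_end(block_id)  # mark as recently used
        return data
```

Under this scheme, the policy below amounts to thresholds like {'snapshot': float('inf'), 'd': 1024, 'di': 1024} with no entry for f/fi.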
Looking at the above, we should probably cache all snapshot blocks and
d/di blocks smaller than about 1 KiB, and no f/fi blocks at all; that
keeps scanning of snapshot chains and directories snappy.