Dave's Free Press

Technology: Data-CompactReadonly-0.1.0

 



NAME

Data::CompactReadonly


DESCRIPTION

A Compact Read Only Database that consumes very little memory. Once created, a database cannot practically be updated except by re-writing the whole thing. The aim is for random-access read performance to be on a par with DBM::Deep and for files to be much smaller.


VERSION 'NUMBERS'

This module uses semantic versioning. That means that the version 'number' isn't really a number but has three parts: major.minor.patch.

The major number will increase when the API changes incompatibly;

The minor number will increase when backward-compatible additions are made to the API;

The patch number will increase when bugs are fixed backward-compatibly.
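To illustrate why a three-part version is not really a number, here is a minimal sketch using perl's core version module. The version strings are made up for the example, and version.pm's dotted-decimal rules are not identical to semantic versioning (it has no notion of pre-release tags), so treat this only as a demonstration of part-by-part comparison.

```perl
use strict;
use warnings;
use version;

# Hypothetical version strings, chosen only for this illustration.
my $older = version->parse('0.9.0');
my $newer = version->parse('0.10.0');

# Dotted versions are compared part by part: minor 10 is greater than minor 9.
print "0.10.0 is newer than 0.9.0\n" if $newer > $older;

# Read as plain decimal numbers the order is reversed, because 0.10 < 0.9.
print "as plain numbers, 0.10 is less than 0.9\n" if 0.10 < 0.9;
```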


FILE FORMAT VERSIONS

All versions so far support file format version 0 only.

See Data::CompactReadonly::V0::Format for details of what that means.


METHODS

create

Takes two arguments: the name of the file into which to write a database, and some data. The data can be undef, a number, some text, or a reference to an array or hash that in turn consists of undefs, numbers, text, references to arrays or hashes, and so on ad infinitum.
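A minimal sketch of a call to create, assuming the module is installed and that the method is invoked as a class method. The file name example.cro and the contents of the data structure are invented for the illustration.

```perl
use strict;
use warnings;
use Data::CompactReadonly;

# Hypothetical data: nested hashes and arrays of undef, numbers and text.
my $data = {
    name    => 'example',
    answers => [ 42, 3.14, undef, 'some text' ],
    nested  => { list => [ 1, 2, 3 ] },
};

# 'example.cro' is an arbitrary file name chosen for this sketch.
Data::CompactReadonly->create('example.cro', $data);
```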

This method may be very slow. It constructs a file by making lots of little writes and seek()ing all over the place. It doesn't do anything clever to figure out what pointer size to use; it just tries the shortest first and, if that's not enough, tries again and again, bigger each time. See Data::CompactReadonly::V0::Format for more on pointer sizes.

It may also eat lots of memory. It keeps a cache of everything it has seen while building your database, so that it can re-use data by just pointing at it instead of writing multiple copies of the same data into the file.

It tries really hard to preserve data types. So for example, 60000 is stored and read back as an integer, but "60000" is stored and read back as a string. This means that you can correctly store and retrieve "007" but that 007 will have the leading zeroes removed before Data::CompactReadonly ever sees it and so will be treated as exactly equivalent to 7. The same applies to floating point values too. "7.10" is stored as a four byte string, but 7.10 is stored the same as 7.1, as an eight byte IEEE754 double precision float. Note that perl parses values like 7.0 as floating point, and thus so does this module.
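The normalisation of numeric literals described above happens in perl itself, before any module is involved. This short sketch uses no part of Data::CompactReadonly; it only shows what perl hands over for quoted and unquoted values.

```perl
use strict;
use warnings;

# Numeric literals are normalised by perl when the program is compiled.
my @numbers = (007, 7.10);
print "$numbers[0]\n";   # prints 7
print "$numbers[1]\n";   # prints 7.1

# Quoted values are strings and keep every character.
my @strings = ('007', '7.10');
print "$strings[0]\n";   # prints 007
print "$strings[1]\n";   # prints 7.10
```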

Finally, while the file format permits numeric keys and Booleans in hashes, this method always coerces them to text. It does that to numbers because, if numeric keys were allowed, numbers that can't be represented in an int, such as 1e100 or 3.14, would be subject to floating point imprecision, and so it is unlikely that you would ever be able to retrieve them, as no exact match is possible. And it does it to Booleans because when you un-serialise them on an older perl they may be confused with strings, leading to loss of data if those strings are also present as keys in the dictionary.
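The floating point problem with numeric keys can be illustrated in plain perl. This sketch does not use the module and says nothing about its internals; it assumes a perl built with ordinary 64-bit IEEE754 doubles, and only shows that an exact numeric comparison can fail where a comparison of the text forms succeeds.

```perl
use strict;
use warnings;

# A computed floating point value rarely equals the literal you expect.
my $sum = 0.1 + 0.2;
my $numeric_match = ($sum == 0.3) ? 'yes' : 'no';

# perl's default stringification rounds to about 15 significant digits,
# so the text forms of the two values can agree even when the numbers do not.
my $text_match = ("$sum" eq '0.3') ? 'yes' : 'no';

print "numeric match: $numeric_match\n";
print "text match:    $text_match\n";
```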

read

Takes a single compulsory argument, which is a filename or an already open filehandle, optionally followed by some options.

If the first argument is a filehandle, its current file pointer should be at the start of the database (not necessarily at the start of the file; the database could be in a __DATA__ segment) and the handle must have been opened in "just the bytes ma'am" mode, that is, as raw bytes with no encoding applied.

It is a fatal error to pass in a filehandle which was not opened correctly or the name of a file that can't be opened or which doesn't contain a valid database.
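A minimal sketch of both ways of calling read, assuming the module is installed. The file name example.cro is invented, and it is an assumption of this sketch that a handle opened with the :raw layer satisfies the bytes-only requirement.

```perl
use strict;
use warnings;
use Data::CompactReadonly;

# Build a small database first so that the sketch is self-contained.
Data::CompactReadonly->create('example.cro', [ 'zero', 'one', 'two' ]);

# Read by file name ...
my $by_name = Data::CompactReadonly->read('example.cro');

# ... or from a handle opened as raw bytes and positioned at the start
# of the database (here that is also the start of the file).
open(my $fh, '<:raw', 'example.cro')
    or die "Can't open example.cro: $!";
my $by_handle = Data::CompactReadonly->read($fh);

print $by_name->element(1), "\n";
print $by_handle->element(1), "\n";
```

Because failures are fatal, wrapping the call in an eval block is one way to handle a missing or invalid file gracefully.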

The options are name/value pairs. Valid options are:

tie
If true, return tied objects instead of normal objects. This means that you will be able to access data by de-referencing and pretending to access elements directly. Under the bonnet this wraps around the objects as documented below, so is just a layer of indirection. On modern hardware you probably won't notice the concomitant slow down but may appreciate the convenience.

fast_collections
If true, Dictionary keys and values will be permanently cached in memory the first time they are seen, instead of being fetched from the file when needed. Yes, this means that objects will grow in memory, potentially becoming very large. Only use this if that is an acceptable trade-off for much faster access.

This is not yet implemented for Arrays.

Returns the "root node" of the database. If that root node is a number, some piece of text, True, False, or Null, then it is decoded and the value returned. Otherwise an object (possibly a tied object) representing an Array or a Dictionary is returned.
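The two options can be sketched as follows, assuming the module is installed. The file name and data are invented for the illustration, and it is an assumption of this sketch that nested structures reached through a tied root are themselves tied.

```perl
use strict;
use warnings;
use Data::CompactReadonly;

# Hypothetical database with a Dictionary as its root node.
Data::CompactReadonly->create(
    'example.cro',
    { colours => [ 'red', 'green' ], count => 2 },
);

# With tie => 1 the root node can be used like an ordinary hash reference.
my $tied = Data::CompactReadonly->read('example.cro', tie => 1);
print $tied->{colours}->[1], "\n";

# With fast_collections => 1 Dictionary keys and values are cached in
# memory once seen; access still goes through the object methods.
my $fast = Data::CompactReadonly->read('example.cro', fast_collections => 1);
print $fast->element('count'), "\n";
```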


OBJECTS

If you asked for normal objects to be returned instead of tied objects, then these are sub-classes of either Data::CompactReadonly::Array or Data::CompactReadonly::Dictionary. Both implement the following methods:

id

Returns a unique id for this object within the database. Note that circular data structures are supported, and looking at the id is the only way to detect them.

This is not accessible when using tied objects.

count

Returns the number of elements in the structure.

indices

Returns a list of all the available indices in the structure.

element

Takes a single argument, which must match one of the values that would be returned by indices, and returns the associated data.

If the data is a number, Null, or text, the value will be returned directly. If the data is in turn another array or dictionary, an object will be returned.

exists

Takes a single argument and tells you whether an index exists for it. It will still die if you ask it for something stupid such as a floating point array index or a Null dictionary entry.
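A minimal sketch of walking a database with the object methods, assuming the module is installed. The file name and data are invented for the illustration, and array positions are generated from count rather than from indices to keep the loop simple.

```perl
use strict;
use warnings;
use Data::CompactReadonly;

# Hypothetical database: a Dictionary holding some text and an Array.
Data::CompactReadonly->create('example.cro', {
    title => 'example',
    items => [ 'first', 'second', 'third' ],
});

my $root = Data::CompactReadonly->read('example.cro');
print 'entries in root: ', $root->count, "\n";

# exists lets us check for a key before asking for its element.
if ($root->exists('items')) {
    # Nested arrays and dictionaries come back as objects.
    my $items = $root->element('items');
    for my $i (0 .. $items->count - 1) {
        print "$i: ", $items->element($i), "\n";
    }
    print 'object id: ', $items->id, "\n";
}
```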


UNSUPPORTED PERL TYPES

Globs, Regexes, References (except to Arrays and Dictionaries).

Booleans are only supported on perl version 5.35.7 or later. On earlier perls, a Boolean in the database will be decoded as a true or false value, but its type will be numeric or string. And older perls will never write a True or False node to the database; they'll always write numbers or strings with true/false values, which other implementations will decode as numbers or strings.


BUGS/FEEDBACK

Please report bugs at https://github.com/DrHyde/perl-modules-Data-CompactReadonly/issues, including, if possible, a test case.


SEE ALSO

DBM::Deep if you need updateable databases.


SOURCE CODE REPOSITORY

git://github.com/DrHyde/perl-modules-Data-CompactReadonly.git


AUTHOR, COPYRIGHT and LICENCE

Copyright 2020 David Cantrell <david@cantrell.org.uk>

This software is free-as-in-speech software, and may be used, distributed, and modified under the terms of either the GNU General Public Licence version 2 or the Artistic Licence. It's up to you which one you use. The full text of the licences can be found in the files GPL2.txt and ARTISTIC.txt, respectively.


CONSPIRACY

This module is also free-as-in-mason software.