Back to library index.
Package std-checksum (in std.i) - crc, md5, sha1 checksums
Index of documented functions or symbols:
DOCUMENT crc = crc_on(x)
or crc = crc_on(x, crc0)
or crc_table = crc_on(crc_def, -)
or crc = crc_on(x, crc_table)
or crc = crc_on(x, crc_table, crc0)
or crc_def = crc_on(crc_table, -)
return a cyclic redundancy check on X. The crc has type long which
is very likely (1 chance in 4 billion) to remain unchanged if X is
corrupted by random noise. With a non-nil crc0 argument previously
returned by crc_on, begins with crc0 to yield (roughly speaking) the
result you would have gotten on a single call if the two X arguments
had been concatenated (note that the order matters).
If X is a string array, the strings themselves, not including trailing
'\0', are concatenated; string(0) is indistinguishable from "".
If X is a struct instance array or a pointer array, the checksum
returns an error. X must be an array or nil [].
Note that the crc value for any data type other than char depends
on the native binary format on the platform where crc_on runs; the
value will be different on other formats, just as it will be different
for an array cast to a different data type.
There are many different CRC algorithms, which can be parameterized
by five integer values:
CRC_DEF = [width, poly, init, reflect, xor]
Here width is the width in bits, reflect is either 0 or 1 (false or
true), and poly, init, and xor are zero except for at most their
width least significant bits. The returned crc is also zero except
for its width least significant bits. The parameterization is
described in "A Painless Guide to CRC Error Detection Algorithms" at
http://www.ross.net/crc/
(The reflect parameter corresponds to refin and refot, which must
be equal for crc_on to work, and the xor parameter corresponds to
xorot.) You can find a list of popular parameter values at
http://regregex.bbcmicro.net/crc-catalogue.htm
Do not try to "roll your own" parameters; let the experts do it.
Here are some popular choices (crc_on requires width>=8):
crc_def = [32, 0x04C11DB7, 0xFFFFFFFF, 1, 0xFFFFFFFF] ("pkzip")
crc_def = [32, 0x04C11DB7, 0, 0, 0xFFFFFFFF] ("cksum")
crc_def = [24, 0x864CFB, 0xB704CE, 0, 0] ("crc24")
crc_def = [16, 0x8005, 0, 1, 0] ("arc")
crc_def = [16, 0x1021, 0, 1, 0] ("kermit")
The default is "pkzip". You can pass any of these five strings
instead of an array of five numbers as CRC_DEF.
To use a CRC algorithm other than "pkzip", you must first generate
a CRC_TABLE by calling crc_on(crc_def,-), then pass the CRC_TABLE
as the second argument with X as the first to compute the CRC.
Finally, crc_on(crc_table,-) returns the corresponding CRC_DEF;
crc_on(,-) returns the CRC_DEF for the default "pkzip" algorithm.
DOCUMENT digest = md5(data) compute digest of DATA array
state = [] initialize STATE
md5, state, data process DATA updating STATE
digest = md5(state, data) return DIGEST from STATE
sha1 function has same semantics as md5 function
The md5 and sha1 functions compute message digests or hashes. The
digest returned by md5 is an array of 16 char (128 bits); the digest
returned by sha1 is an array of 20 char (160 bits). There is a
single call form, in which the input DATA array comprises the entire
"message" to be digested. There is also a multi-call sequence, in
which you invoke md5 or sha1 with two arguments: The first argument
is a STATE variable, while the second argument is the DATA to be
appended to the "message" in that call. DATA may be nil [], which
causes no change in the STATE. STATE must be a simple variable
reference, which is updated and returned with each call. As long
as you call md5 or sha1 as a subroutine, STATE continues to be updated;
call md5 or sha1 as a function to return the final digest of the
concatenated "message". Before the first call, you must initialize
STATE to []. The final function call destroys STATE, returning it
as []. (To generate the final digest, both the md5 and sha1 algorithms
append some padding to the input message, which destroys STATE.)
If DATA is a string array, the strings themselves, not including trailing
'\0', are concatenated; string(0) is indistinguishable from "".
If DATA is a struct instance array or a pointer array, the digesting
function returns an error. DATA must be an array or nil [].
Although STATE is an array of char, it is platform dependent even
though the final digest is not. Do not attempt to save STATE or
to use STATE itself as a digest.
The advantage of md5 or sha1 over the crc_on function is that
the resulting digest will be unique; no two DATA streams will
ever produce the same digest for nearly all practical purposes.
For the 32-bit crc result, you will get different data streams
that have identical crc values after you process a modest number
of data streams (tens of thousands). Thus, md5 or sha1 can be used
to "fingerprint" data streams; if the fingerprints of two streams
agree, you can be practically sure the streams themselves agree.
Both md5 and sha1 have now been "broken" cryptographically, which
means that it is possible by heroic effort to create two different
streams with the same digest. (At this writing, many examples of
streams with identical md5 digests exist. No such examples exist
yet for sha1, but it is clear that a few will appear fairly soon.)
However, absent malicious intent and huge levels of effort, both
md5 and sha1 are perfectly useful for fingerprinting data. These
algorithms give the same results as the md5sum and sha1sum
utilities, widely used to fingerprint files on the Web.
SEE ALSO: crc_on
SEE: md5
