Yorick Standard Library

Back to library index.

Package std-checksum (in std.i) - crc, md5, sha1 checksums

Index of documented functions or symbols:

crc_on

DOCUMENT crc = crc_on(x)
      or crc = crc_on(x, crc0)
      or crc_table = crc_on(crc_def, -)
      or crc = crc_on(x, crc_table)
      or crc = crc_on(x, crc_table, crc0)
      or crc_def = crc_on(crc_table, -)
  return a cyclic redundancy check on X.  The crc has type long which
  is very likely (1 chance in 4 billion) to remain unchanged if X is
  corrupted by random noise.  With a non-nil crc0 argument previously
  returned by crc_on, begins with crc0 to yield (roughly speaking) the
  result you would have gotten on a single call if the two X arguments
  had been concatenated (note that the order matters).

  If X is a string array, the strings themselves, not including trailing
  '\0', are concatenated; string(0) is indistinguishable from "".
  If X is a struct instance array or a pointer array, the checksum
  returns an error.  X must be an array or nil [].

  Note that the crc value for any data type other than char depends
  on the native binary format on the platform where crc_on runs; the
  value will be different on other formats, just as it will be different
  for an array cast to a different data type.

  There are many different CRC algorithms, which can be parameterized
  by five integer values:
    CRC_DEF = [width, poly, init, reflect, xor]
  Here width is the width in bits, reflect is either 0 or 1 (false or
  true), and poly, init, and xor are zero except for at most their
  width least significant bits.  The returned crc is also zero except
  for its width least significant bits.  The parameterization is
  described in "A Painless Guide to CRC Error Detection Algorithms" at
    http://www.ross.net/crc/
  (The reflect parameter corresponds to refin and refot, which must
  be equal for crc_on to work, and the xor parameter corresponds to
  xorot.)  You can find a list of popular parameter values at
    http://regregex.bbcmicro.net/crc-catalogue.htm
  Do not try to "roll your own" parameters; let the experts do it.
  Here are some popular choices (crc_on requires width>=8):
    crc_def = [32, 0x04C11DB7, 0xFFFFFFFF, 1, 0xFFFFFFFF]  ("pkzip")
    crc_def = [32, 0x04C11DB7, 0, 0, 0xFFFFFFFF]           ("cksum")
    crc_def = [24, 0x864CFB, 0xB704CE, 0, 0]               ("crc24")
    crc_def = [16, 0x8005, 0, 1, 0]                        ("arc")
    crc_def = [16, 0x1021, 0, 1, 0]                        ("kermit")
  The default is "pkzip".  You can pass any of these five strings
  instead of an array of five numbers as CRC_DEF.
  
  To use a CRC algorithm other than "pkzip", you must first generate
  a CRC_TABLE by calling crc_on(crc_def,-), then pass the CRC_TABLE
  as the second argument with X as the first to compute the CRC.
  Finally, crc_on(crc_table,-) returns the corresponding CRC_DEF;
  crc_on(,-) returns the CRC_DEF for the default "pkzip" algorithm.

md5

DOCUMENT digest = md5(data)         compute digest of DATA array
         state = []                 initialize STATE
         md5, state, data           process DATA updating STATE
         digest = md5(state, data)  return DIGEST from STATE
         sha1 function has same semantics as md5 function
  The md5 and sha1 functions compute message digests or hashes.  The
  digest returned by md5 is an array of 16 char (128 bits); the digest
  returned by sha1 is an array of 20 char (160 bits).  There is a
  single call form, in which the input DATA array comprises the entire
  "message" to be digested.  There is also a multi-call sequence, in
  which you invoke md5 or sha1 with two arguments: The first argument
  is a STATE variable, while the second argument is the DATA to be
  appended to the "message" in that call.  DATA may be nil [], which
  causes no change in the STATE.  STATE must be a simple variable
  reference, which is updated and returned with each call.  As long
  as you call md5 or sha1 as a subroutine, STATE continues to be updated;
  call md5 or sha1 as a function to return the final digest of the
  concatenated "message".  Before the first call, you must initialize
  STATE to [].  The final function call destroys STATE, returning it
  as [].  (To generate the final digest, both the md5 and sha1 algorithms
  append some padding to the input message, which destroys STATE.)

  If DATA is a string array, the strings themselves, not including trailing
  '\0', are concatenated; string(0) is indistinguishable from "".
  If DATA is a struct instance array or a pointer array, the digesting
  function returns an error.  DATA must be an array or nil [].

  Although STATE is an array of char, it is platform dependent even
  though the final digest is not.  Do not attempt to save STATE or
  to use STATE itself as a digest.

  The advantage of md5 or sha1 over the crc_on function is that
  the resulting digest will be unique; no two DATA streams will
  ever produce the same digest for nearly all practical purposes.
  For the 32-bit crc result, you will get different data streams
  that have identical crc values after you process a modest number
  of data streams (tens of thousands).  Thus, md5 or sha1 can be used
  to "fingerprint" data streams; if the fingerprints of two streams
  agree, you can be practically sure the streams themselves agree.

  Both md5 and sha1 have now been "broken" cryptographically, which
  means that it is possible by heroic effort to create two different
  streams with the same digest.  (At this writing, many examples of
  streams with identical md5 digests exist.  No such examples exist
  yet for sha1, but it is clear that a few will appear fairly soon.)
  However, absent malicious intent and huge levels of effort, both
  md5 and sha1 are perfectly useful for fingerprinting data.  These
  algorithms give the same results as the md5sum and sha1sum
  utilities, widely used to fingerprint files on the Web.

sha1

SEE: md5