computeDifference - Decode integer from diffmap
dumpDifference - Extract other string using original string and diffmap
extractDifference - Compute difference between original and other and return "difference map" string
IntegerToTwoCharString - String difference routines:
Barry Brannan, September 1997.
TwoCharStringToInteger - Encode integer into diffmap
function computeDifference(block_size: integer; original, second: string): string;
Decode integer from diffmap
function dumpDifference(diffmap: string): string;
Extract other string using original string and diffmap
function extractDifference(original, diffmap: string): string;
Compute difference between original and other and return "difference map" string
function IntegerToTwoCharString(n: integer): string;
String difference routines:
Barry Brannan, September 1997.
barrylb@poboxes.com
http://www.poboxes.com/barrylb
Bug reports, suggestions welcome. If you make any improvements I would
appreciate if you sent them to me.
Some possible improvements...
* faster speed
* cleaner code
The routines:
-------------
computeDiff(blocksize, original, second) --> diffmap
extractDiff(original, diffmap) --> second
dumpDifference(diffmap) --> string illustrating diffmap
nb blocksize is stored in diffmap
What it is useful for:
----------------------
Suppose you are downloading pages of data from an online service at regular
intervals. Rather than saving whole snapshots of the data each time, you
save the first page then save "difference maps" for each update. If the
pages don't change much then you have huge gains in storage efficency.
How it works:
-------------
1. Each string (original, second) is divided into "blocks" of the specified
blocksize.
2. Algorithm:
For each block in second do
Begin
Find corresponding block in original.
If found block in original then
write block number to diffmap
Else
write literal text to diffmap
End.
3. When trying to find corresponding block in original, the routines attempt
to find a consecutive block rather than just the first block to optimize
storage efficiency. See "Diffmap format", below.
4. The blocksize would ideally be 1 but that generally is too slow, so a
blocksize of say 32 byte would be suitable for files under 64K. The
maximum number of blocks in a file cannot exceed 65535 so you may have
to increase the block size anyway if the files are too large.
Diffmap format:
---------------
Starts with one byte magic number then two bytes indicating the blocksize
then followed by a sequence of codes...
chr #1 -- source block range indicator, followed by four bytes. First two are
start block number, second two are end block number.
chr #2 -- literal string, followed by two bytes indicating the number of
characters, then the characters themselves.
chr #3 -- single source block, followed by two bytes indicating the source
block number.
Current magic number is '1' (ascii 49)
function TwoCharStringToInteger(s: string): integer;
Encode integer into diffmap