See more on Chunks in general.
See the DATA-nd chunk.
The DATA chunk holds the actual content of an archived file. Generally (see the note at the end of this page), each file in the archive is stored as one DATA instance.
For example, suppose a ZIP2 archive is created with three files: foo.txt, bar.txt, and baz.txt. The archiver would create three instances of DATA, storing the complete contents of foo.txt in DATA#1, bar.txt in DATA#2, and baz.txt in DATA#3. These may then be emitted as three separate chunks, each with its payload compressed and/or encrypted. Or the original data may be concatenated and compressed/encrypted as one chunk labeled DATA#1-3.
Or perhaps DATA#1 is too large to fit on the specified media even after compression, so it is split into two chunks, DATA#1p1 and DATA#1p2, each of which is compressed/encrypted independently and emitted into a different part of a split archive.
The point is, each file is stored in one DATA instance, not necessarily one physical chunk having one chunk header. Instances may be combined or split, so any number of physical chunks is possible. But there is still logically one DATA#1 instance in the archive containing the contents of foo.txt, regardless of whether it was “solidly packed” with other DATA instances, or split into many smaller parts.
The association of a stored file in the archive with a distinct DATA instance number is a fundamental principle of the ZIP2 archive structure. Other information, such as the file name and attributes, is associated using this same number.
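The distinction between logical instances and physical chunks can be illustrated with a small model. This is only a sketch: `PhysicalChunk` and `chunks_for_instance` are hypothetical names invented for illustration, and the model says nothing about the actual chunk header layout. It simply shows that DATA#1 can be looked up by instance number whichever of the three layouts above the archiver chose.

```python
from typing import NamedTuple, Optional

class PhysicalChunk(NamedTuple):
    """One emitted chunk (illustrative model only, not the wire format)."""
    first: int                  # first DATA instance number covered
    last: int                   # last instance covered (same as first unless a range)
    part: Optional[int] = None  # part number when one instance is split, else None

def chunks_for_instance(chunks, instance):
    """Return the physical chunks that together hold one logical DATA instance."""
    hits = [c for c in chunks if c.first <= instance <= c.last]
    return sorted(hits, key=lambda c: c.part or 0)

# Three separate chunks: DATA#1, DATA#2, DATA#3
separate = [PhysicalChunk(1, 1), PhysicalChunk(2, 2), PhysicalChunk(3, 3)]
# One solid chunk: DATA#1-3
solid = [PhysicalChunk(1, 3)]
# DATA#1 split into DATA#1p1 and DATA#1p2, the others stored separately
split = [PhysicalChunk(1, 1, part=2), PhysicalChunk(1, 1, part=1),
         PhysicalChunk(2, 2), PhysicalChunk(3, 3)]

for layout in (separate, solid, split):
    print(chunks_for_instance(layout, 1))
```

In every layout the lookup key is the same instance number, which is what lets other information (file name, attributes) refer to the file without knowing how it was packed.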
Flag | Usage |
---|---|
a (correlated) | Cleared. |
b (subtype) | Cleared. |
r (range of instances) | Used in the general manner. |
p (multi-part) & c | Used in the general manner. |
y (payload specification) | Used in the general manner. |
i (instance sizes) | Set when r is set, clear otherwise. |
n (pointer) | Cleared. DATA-nd has a distinct meaning, documented below. |
d (redundant) | |
The y flag is typically present, as this payload contains the principal data (the file content) that is to be compressed and/or encrypted.
DATA-nd indicates that the specified instances are in a different file in a multi-part archive.
DATA-nd may only have the r (range) flag set, besides the n and d flags. When the n flag is used, the d flag must also be used.
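The two flag rules just stated can be checked mechanically. The following is a minimal sketch that represents the flags as a string of their single-letter names; the function name and the string representation are assumptions made for illustration, not part of the format.

```python
def check_data_nd_flags(flags):
    """Return a list of rule violations for a DATA chunk whose n flag is set.

    `flags` is a string or set of flag letters, e.g. "ndr" (illustrative
    representation only; the real header encodes flags as bits).
    """
    flags = set(flags)
    problems = []
    if "n" not in flags:
        return problems  # not a DATA-nd chunk; these rules do not apply
    if "d" not in flags:
        problems.append("n flag requires d flag")
    extra = flags - {"n", "d", "r"}
    if extra:
        problems.append("flags not allowed on DATA-nd: " + "".join(sorted(extra)))
    return problems

print(check_data_nd_flags("nd"))    # no violations
print(check_data_nd_flags("ndr"))   # no violations
print(check_data_nd_flags("ny"))    # two violations
```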
The Payload contains a single uintV that specifies the file number of the multi-part archive that is expected to contain all the instances noted by this chunk.
The DATA-nd chunk is used to inform the program that DATA chunks may be found in a different part of a multi-part archive, without having to scan all the files first. So, if the user invokes the program giving it one disc in the set, and the part on that disc contains a DATA-nd#45 for example, stating that DATA#45 may be expected in file 47, then the implementation can prompt for disc number 47 and not have to ask for every disc in turn until it finds the right one.
Note that this is only a hint. If the program scans part 47 and does not find DATA#45 in it, then it is not an error. Perhaps that file was updated and not all the discs were rewritten. To this end, file 47 of the multi-part archive may have another DATA-nd#45 as a “forwarding address”, and the user is again prompted for a different disc. Or, with no further information to go on, the program has no choice but to start looking at all the discs in turn until it finds it, which is exactly what it would have to do if there was no DATA-nd chunk in the first place.
An implementation that chases forwarding DATA-nd chunks must watch for cycles and treat a DATA-nd instance that closes a cycle as if there was no such DATA-nd instance (that is, it gives up).
An implementation that prompts for media change must provide a way for the user to say “I don’t have it”. For DATA-nd chunks, cancelling the media change or not finding an expected file must be treated similarly to not finding the corresponding DATA chunk, as opposed to being a fatal error. That is, the implementation can give up and search in other files.
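The hint-chasing behaviour described above (follow forwarding addresses, give up on a cycle, then fall back to scanning) might be sketched as follows. The callbacks `read_hint` and `has_data` are hypothetical and stand in for whatever the implementation does to open a part, including prompting for a disc; a cancelled prompt or a missing file is modelled simply as `has_data` returning `False` and `read_hint` returning `None`.

```python
def locate_instance(instance, start_file, read_hint, has_data, all_files):
    """Return the file number that holds DATA#instance, or None if not found.

    read_hint(file_no, instance) -> file number named by a DATA-nd chunk in
        that file, or None when there is no hint (or the file is unavailable).
    has_data(file_no, instance)  -> True if that file really contains the
        instance; False if it does not or the user does not have the file.
    Both callbacks are assumptions of this sketch.
    """
    visited = set()
    current = start_file
    # Follow hints until one pays off, runs out, or closes a cycle.
    while current is not None and current not in visited:
        visited.add(current)
        if has_data(current, instance):
            return current
        current = read_hint(current, instance)  # forwarding address, if any
    # No usable hint left: look at every remaining file in turn.
    for file_no in all_files:
        if file_no not in visited and has_data(file_no, instance):
            return file_no
    return None

# Hints form a cycle 47 -> 50 -> 47, but DATA#45 actually lives in file 12.
hints = {1: 47, 47: 50, 50: 47}
found = locate_instance(
    45, 1,
    read_hint=lambda f, i: hints.get(f),
    has_data=lambda f, i: f == 12,
    all_files=range(1, 60),
)
print(found)
```

Because a failed hint is never treated as fatal, the worst case is the same full scan that would be needed if no DATA-nd chunk existed at all.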
It is suggested that DATA-nd chunks be written to the same file in a multi-part archive that contains the INDX#0 chunk.
For example, the following chunk states that DATA#17 can be expected in file number 3.
    05      ; size of chunk
    18 01   ; DATA-nd type
    11      ; Instance #17
    03      ; Payload is the number 3
    2E      ; checksum
This one states that DATA#1 through DATA#42 inclusive may be found in file number 4. Note that file number 4 of the multi-part archive doesn’t have to contain a corresponding DATA-r#1-42 chunk, but can contain individual chunks for each instance or whatever.
    06      ; size of chunk
    98 01   ; DATA-ndr
    01 2A   ; Instances #1 through 42
    04      ; Payload is the number 4
    70      ; checksum
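As a reading aid, the two example chunks can be decoded with a few lines of code. This sketch rests on assumptions drawn only from the examples themselves: that the size byte counts the bytes that follow it, that the type occupies two bytes, that bit 0x80 of the first type byte marks a range (18 01 versus 98 01), and that every number is small enough to be a single byte. The general uintV encoding and the checksum algorithm are not covered here, so the checksum is returned unverified.

```python
def decode_data_nd_example(raw):
    """Decode one of the two example DATA-nd chunks (sketch, see assumptions)."""
    size = raw[0]
    body = raw[1:1 + size]
    if len(body) != size:
        raise ValueError("truncated chunk")
    type_bytes, rest, checksum = body[:2], body[2:-1], body[-1]
    is_range = bool(type_bytes[0] & 0x80)  # assumption inferred from 18 vs 98
    if is_range:
        first, last, file_no = rest        # two instance numbers, then payload
    else:
        first, file_no = rest              # one instance number, then payload
        last = first
    return {"first": first, "last": last, "file": file_no, "checksum": checksum}

single = decode_data_nd_example(bytes.fromhex("05 18 01 11 03 2E"))
ranged = decode_data_nd_example(bytes.fromhex("06 98 01 01 2A 04 70"))
print(single)
print(ranged)
```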
Note: It is also possible for a compression or encryption algorithm to use additional DATA instances, so each entry actually points to a main DATA instance but may refer to others as well.
Page content copyright 2003 by John M. Dlugosz. Home: http://www.dlugosz.com, email: mailto:john@dlugosz.com