[png-mng-misc] ANG draft 7 (with multi-layer frames)

   From: Adam M. Costello <png.amc+0@ni...> - 2007-05-02 06:36:52
   Since both APNG and anIM allow frames to be constructed by compositing
   multiple images, and since none of the implementors here seem the
   slightest bit put off by this, but instead seem enthusiastic about the
   potential compression gains, I've added this capability to ANG.  I've
   expressed it in a slightly different way, however, distinguishing
   between layers and frames, for reasons given in the Rationale section.

   ANG-7 is more like a midpoint between APNG and anIM.  I've tried to keep
   the most important advantages of both: automatic fallback to a single
   frame in browsers, and a clear distinction between still images and
   animations as indicated by filename extensions and media types.  I am
   still hopeful that the PNG folks and the Mozilla folks can meet each
   other halfway, rather than go off in divergent directions.

   AMC

   Animated Network Graphics (ANG), draft 7 (2007-May-01-Tue)
   Adam M. Costello
   [75]http://www.nicemice.net/amc/

   Changes from draft 6

       Introduced the concepts of layer and substrate, to allow frames to
       be constructed by compositing multiple images, like in APNG and
       anIM, with the goal of improving compression.  This required the
       addition of a section explaining alpha-over-alpha compositing,
       expansion of the remarks about frame numbering, and the addition
       of remarks about the lossiness of frames contrasted with the
       losslessness of layers.

       Moved the discussion of the rejected media type out of the Rationale
       section and into an editorial comment that would not be included in
       a final draft.

   Acknowledgements

       Several good ideas have been taken from the PNG mailing list
       (currently png-mng-misc@...).

   Contents
       Goals
       Relationship to PNG
       Datastream tagging
       Conceptual model
       Datastream format
       Rationale

   Goals

        1) Capabilities comparable to animated GIF, plus the added features
           of PNG (like 24-bit color and alpha).

        2) Automatic fallback to PNG in existing web browsers, using the
           <img> tag, showing an author-selected single frame instead of
           the animation.

        3) Respect for the PNG specification and existing PNG applications
           and users, to the extent possible given goal 2.

        4) Simplicity.

        5) Compression at least as good as in animated GIF, or even better
           if possible in a simple format.  The compression need not rival
           that of a complex format like MNG.

   Relationship to PNG

       ANG is not PNG, but it is deliberately very similar.  PNG contains
       a single still image, whereas ANG contains both a still image and
       an animation.  The ANG datastream format is identical to the PNG
       datastream format (including the signature) except that an ANG
       datastream must contain an ahDR chunk before IDAT and must contain
       an adAT chunk after IDAT, whereas a PNG datastream must not contain
       these chunks, because the PNG specification prohibits multiple
       images in a PNG datastream.

       The still image in an ANG serves two purposes:

        1) A fallback for applications or display technologies (like paper)
           that do not support animation.

        2) A source image to be used (optionally) in the animation, in
           addition to the montage in adAT.

       Unlike PNG and GIF, ANG is not a fully streamable format.  ANG
       encoders cannot produce ANG in a streaming fashion because the
       frame data is contained in a single chunk.  Whether ANG decoders
       can consume ANG in a streaming fashion depends on how encoders
       choose to lay out the data.  There is typically a trade-off
       between streamability and compression.

   Datastream tagging

       Because both PNG and ANG use the same signature, it is important
       that ANG be tagged correctly.  Its media type is "video/x-ang".  The
       media type "image/png" must not be used for ANG, because ANG is not
       PNG, and because a video is not an image.

       [[ If/when this media type is registered, the "x-" prefix will be
       removed. ]]

       The recommended file extension for ANG is ".ang".  The extension
       ".png" should never be used for ANG, because it is important that
       users be able to easily distinguish PNG and ANG, so that they are
       not surprised if a PNG viewer does not show the animation in an ANG,
       or if a PNG editor drops the animation from an ANG.

       The deliberate similarity between the ANG and PNG formats
       facilitates incremental deployment of ANG, with automatic fallback
       to PNG.  When ANG-unaware PNG-aware applications are fed an ANG
       datastream, they will misinterpret the ANG as a PNG, ignore the
       unrecognized ahDR and adAT chunks, and display the still image.
       This is potentially confusing to users, but hopefully the media type
       and the filename will mitigate that hazard.  For example, this HTML
       inline image will display as a still-image PNG in most ANG-unaware
       web browsers, even though the URL ends in ".ang" and the HTTP server
       tags it as "video/x-ang":

            <img src="[76]http://example/foo.ang">

       [[ I have verified this for Firefox 1.5 & 2.0, IE 6 & 7, Safari,
       Konqueror 3.1, and Opera Mini.  To help people test other browsers,
       I have created [77]http://www.nicemice.net/amc/test/ang.html, which
       contains three instances of the same inline PNG image, one served as
       "image/png; x-anim=1", one served as "video/x-ang", and one served
       as "application/octet-stream".  The first of those types might, in
       theory, be more likely than "video/x-ang" to facilitate automatic
       fallback, because RFCs 2045 and 2046 require that unrecognized media
       type parameters be ignored.  However, "video/x-ang" works just fine
       in practice, so the param-hackery is not needed. ]]

       The distinct media types for ANG and PNG allow greater control over
       the fallback, if desired.  For example, if you wanted ANG-unaware
       web browsers to fall back to animated GIF rather than still PNG, you
       could do something like this:

           <object data="foo.ang" type="video/x-ang">
           <img src="foo.gif">
           </object>

       ANG-aware decoders should use the following logic to determine
       whether a datastream beginning with the PNG signature is PNG or
       ANG (a code sketch follows the list):

        1) If a media type is available, trust it.

        2) Otherwise, if a filename is available, and it ends with ".png"
           or ".ang" (or any capitalization thereof), trust it.

        3) Otherwise, if ahDR is present, assume ANG.

        4) Otherwise, assume PNG.
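
        The rules above might be sketched in Python as follows (the
        function name and parameters are illustrative, not part of this
        draft; the media-type handling is simplified to the two types
        discussed here):

            def sniff_png_or_ang(media_type=None, filename=None,
                                 chunk_types=()):
                """Classify a datastream that begins with the PNG
                signature.  chunk_types is the sequence of chunk type
                codes found in the datastream."""
                if media_type is not None:        # rule 1: trust the type
                    return "ANG" if media_type == "video/x-ang" else "PNG"
                if filename is not None:          # rule 2: trust the extension
                    lower = filename.lower()
                    if lower.endswith(".ang"):
                        return "ANG"
                    if lower.endswith(".png"):
                        return "PNG"
                if "ahDR" in chunk_types:         # rule 3: ahDR implies ANG
                    return "ANG"
                return "PNG"                      # rule 4: default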

       Since ahDR and adAT are invalid in PNG, they are errors if
       encountered in a PNG datastream.  Decoders that recognize them
       should treat them like any erroneous ancillary chunks: ignore them,
       and notify the user if appropriate.  For this particular error, the
       notification could perhaps suggest that the user rename or re-tag
       the file if possible.  Of course decoders that do not recognize
       these chunks will just ignore them.

   Conceptual model

       An ANG datastream encodes a still image, just like PNG, and also
       encodes an animation.  The animation is a sequence of images,
       called frames, all the same width and height as the still image,
       displayed consecutively in the same place, each for a nonzero
       duration indicated in the datastream (though interactive
       applications should allow the user to pause or jump to the next
       frame at any time).

       Each frame of the animation is the result of stacking zero or more
       constituent images, called layers, in front of a default image,
       called the substrate.  The substrate has the same width, height, and
       position as the frames, and is uniformly filled with a single pixel
       value.  The layers can be smaller than a frame, and they always lie
       completely within the frame boundary.  Each layer is a positioned
       and clipped copy of either the still image or a second image called
       the montage.  The layers and the substrate do not necessarily
       hide what lies behind them, because pixels can be transparent or
       partially transparent.

       The pixel value used to fill the substrate is different for
       different color types:  For color type 3 (indexed-color), it is
       palette index 0.  For color types 0 and 2 (non-indexed without
       alpha) it is the value of the tRNS chunk if present, otherwise all
       zeros (black).  For color types 4 and 6 (non-indexed with alpha), it
       is all zeros (fully transparent black).
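
        A minimal sketch of this rule in Python (illustrative names;
        the in-memory representation of pixel values is up to the
        decoder):

            def substrate_pixel(color_type, trns_value=None):
                """Fill value for the substrate, per the rules above.
                trns_value is the decoded tRNS content for color types
                0 and 2, or None if tRNS is absent."""
                if color_type == 3:           # indexed-color
                    return 0                  # palette index 0
                if color_type in (0, 2):      # no alpha channel
                    # tRNS value if present, otherwise all zeros (black)
                    return 0 if trns_value is None else trns_value
                if color_type in (4, 6):      # alpha channel present
                    return 0                  # fully transparent black
                raise ValueError("unknown color type %r" % color_type)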

       All meta-data (chunk fields) that apply to the still image in IDAT
       also apply to the montage in adAT, with only three exceptions:  The
       width, height, and interlace method in IHDR do not apply to the
       montage.  The montage has its own width, height, and interlace
       method given in ahDR.  All meta-data that apply to each pixel of the
       still image and the montage (like color type, bit depth, significant
       bits, palette, color space, physical size) also apply to each pixel
       of the layers and the substrate.

       The still image and montage are represented losslessly in an ANG
       datastream.  Since the layers are simply positioned and clipped
       copies of those images, they are also represented losslessly.  The
       frames, however, are represented as compositions of images which
       can be lossy.  For datastreams with alpha channels (color types 4
       and 6), the composition involves gamma decoding and alpha blending
       (and perhaps gamma re-encoding), which are subject to floating-point
       round-off errors and slight differences in implementation.  For
       datastreams without alpha channels (color types 0, 2, and 3), the
       composition involves only simple pixel replacement, and the frames
       are lossless.

   Frame and layer numbering

       Frame and layer numbering is specified here for consistency among
       applications that allow users to refer to particular frames and
       layers.  The frame and layer numbers are not used inside ANG
       datastreams.

       The frames of the animation are numbered starting with 1.  Frame 0
       refers to the still image, which unlike the animation frames does
       not have the substrate underlying it.  An animation can request to
       be played more than once, but this does not affect the frame count.

       The layers of the animation are numbered starting with 3.  Layer 0
       refers to the still image.  Layer 1 refers to the montage.  Layer 2
       refers to the substrate.  A frame can inherit the layers of the
       previous frame and add new layers in front, but in this case only
       the new layers get new layer numbers; the inherited layers keep
       their original layer numbers.

       Layers within a frame can also be numbered relative to the frame,
       starting with 0 for the back-most layer.  For example, if frame 2
       is composed of layers 5 and 6, and frame 3 inherits the layers from
       frame 2, then frame-2-layer-0 equals frame-3-layer-0 equals layer 5.
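
        A sketch of this numbering in Python (illustrative names;
        frames is a list of (keep_prior_layers, num_layers) pairs, one
        per animation frame):

            def assign_layer_numbers(frames):
                """Return, for each frame, its stack of absolute layer
                numbers, back to front.  Numbers 0-2 are reserved for
                the still image, montage, and substrate."""
                next_number = 3
                stacks, stack = [], []
                for keep_prior, num_layers in frames:
                    if not keep_prior:
                        stack = []            # discard inherited layers
                    stack = stack + list(range(next_number,
                                               next_number + num_layers))
                    next_number += num_layers
                    stacks.append(stack)
                return stacks    # stacks[i][j] is frame-(i+1)-layer-j

        For instance, assign_layer_numbers([(0, 2), (0, 2), (1, 0)])
        yields [[3, 4], [5, 6], [5, 6]], reproducing the example above.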

   Datastream format

       See the PNG specification for all aspects of the ANG datastream
       format except the ahDR and adAT chunks, which are specified here.

       [[ If/when these chunks are registered, the second letter of each
       will be capitalized. ]]

       ahDR must appear exactly once, before IDAT.  It contains the
       following fields (a parsing sketch follows the list):

           num_frames (4 bytes, unsigned)
               The number of frame specifiers in adAT.

           ticks_per_second (4 bytes, unsigned)
               Defines the time unit for frame durations.  If this is zero,
               all frame durations are infinite.

           num_plays (4 bytes, unsigned)
               Number of times to play the animation.  Zero means infinity.

           montage_width (4 bytes, unsigned)
               Width of the montage in adAT, in pixels.  Not zero.

           montage_height (4 bytes, unsigned)
               Height of the montage in adAT, in pixels.  Not zero.

           montage_interlace_method (1 byte)
               Interlace method used by the montage in adAT.

           still_image_used (1 byte, boolean)
               Must be 0 or 1.

               If 0, the still image is not used in the animation, and need
               not be decoded in order to display the animation, and the
               layer specifiers in adAT do not include from_still_image
               fields.

               If 1, the still image may be used in the animation, and each
               layer specifier in adAT includes a from_still_image field.
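
        The fields above total 22 bytes.  A parsing sketch in Python
        (using the standard struct module; PNG integers are big-endian):

            import struct

            AHDR_FORMAT = ">IIIIIBB"   # five 4-byte uints, two 1-byte fields
            AHDR_SIZE = struct.calcsize(AHDR_FORMAT)   # 22 bytes

            def parse_ahdr(data):
                """Unpack ahDR chunk data; names follow this draft."""
                if len(data) != AHDR_SIZE:
                    raise ValueError("bad ahDR length")
                (num_frames, ticks_per_second, num_plays,
                 montage_width, montage_height,
                 montage_interlace_method,
                 still_image_used) = struct.unpack(AHDR_FORMAT, data)
                if montage_width == 0 or montage_height == 0:
                    raise ValueError("montage dimensions must be nonzero")
                if still_image_used not in (0, 1):
                    raise ValueError("still_image_used must be 0 or 1")
                return (num_frames, ticks_per_second, num_plays,
                        montage_width, montage_height,
                        montage_interlace_method, still_image_used)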

       adAT must appear exactly once, after IDAT.  It contains a
       compressed stream (using the compression method indicated in
       IHDR), which holds a sequence of num_frames frame specifiers
       immediately followed by a montage; a reading sketch appears
       after the montage description below.  Each frame specifier
       contains:

           frame_duration (4 bytes, unsigned)
               Duration of the frame, in ticks.  Zero means infinity.

           keep_prior_layers (1 byte, boolean)
               Must be 0 or 1.

               If 0, this frame has layers indicated by its own layer
               specifiers and no others.

               If 1, the layers indicated by this frame's layer specifiers
               are added (in front) to the stack of layers inherited from
               the previous frame.  For the first frame (frame 1), the
               inherited stack is empty (has zero layers).  Even if the
               animation loops, the first frame does not inherit layers
               from the last frame of the previous loop.

           num_layers (1 byte, unsigned)
               The number of layer specifiers for this frame.

           layer_specifiers (num_layers * (24 + still_image_used) bytes)
               A sequence of num_layers layer specifiers, in order from
               back to front.

       Each layer specifier contains:

           from_still_image (0 or 1 byte, boolean)
               This field appears if and only if the still_image_used
               field of ahDR is 1.  If from_still_image is 0 or absent,
               the montage is the source image for this layer.  If
               from_still_image is present and 1, the still image is the
               source image for this layer.  No other values are allowed.

           shift_left (4 bytes, signed)
           shift_up (4 bytes, signed)
           clip_left (4 bytes, signed)
           clip_top (4 bytes, signed)
           clip_width (4 bytes, unsigned)
           clip_height (4 bytes, unsigned)
               The layer is derived from the source image as follows.
               Starting with the source image positioned with its
               upper-left corner aligned with the upper-left corner of the
               frame, the source image is shifted shift_left pixels to
               the left and shift_up pixels upward, then it is clipped to
               both the clip boundaries and the frame boundaries, where
               clip_left and clip_top are the offsets (in pixels) from the
               upper-left corner of the frame to the upper-left corner of
               the clip rectangle, and clip_width and clip_height are the
               dimensions (in pixels) of the clip rectangle.  Note that
               some of these fields are signed and can be negative.

               The width and height of the layer are the width and
               height of the overlap of the frame rectangle and the clip
               rectangle.  If the two do not overlap, then the width and
               height of the layer are zero, which is not a problem for
               displaying the animation, but is an error when extracting
               layers as PNG datastreams, because a PNG image is required
               to have nonzero width and height.  Therefore encoders should
               not specify zero-area layers (which are pointless anyway).
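
        A sketch of this geometry in Python (illustrative names):

            def layer_rectangle(frame_w, frame_h,
                                clip_left, clip_top,
                                clip_width, clip_height):
                """Bounding box of the layer within the frame: the
                overlap of the clip rectangle and the frame rectangle,
                relative to the frame's upper-left corner."""
                left = max(clip_left, 0)
                top = max(clip_top, 0)
                right = min(clip_left + clip_width, frame_w)
                bottom = min(clip_top + clip_height, frame_h)
                return (left, top,
                        max(right - left, 0),   # zero width and height
                        max(bottom - top, 0))   # if there is no overlap

            def source_coords(x, y, shift_left, shift_up):
                """Source-image pixel that lands on frame pixel (x, y):
                shifting the source left and up moves its pixels to
                smaller frame coordinates."""
                return x + shift_left, y + shift_up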

       Immediately following the sequence of frame specifiers is:

           montage (bytes)
               Filtered scanlines.

       The montage is formatted exactly like the uncompressed contents
       of IDAT chunk data, except that it uses the width, height, and
       interlace method indicated in ahDR rather than the ones in IHDR.
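
        Putting the pieces together, the decompressed contents of adAT
        might be read like this (a sketch; PNG compression method 0 is
        zlib, and the layer specifiers are left unparsed here):

            import struct, zlib

            def read_adat(adat_data, num_frames, still_image_used):
                """Return the frame specifiers and the raw montage
                scanline bytes."""
                stream = zlib.decompress(adat_data)
                layer_size = 24 + still_image_used
                frames, pos = [], 0
                for _ in range(num_frames):
                    duration, keep_prior, num_layers = struct.unpack_from(
                        ">IBB", stream, pos)
                    pos += 6
                    raw = stream[pos : pos + num_layers * layer_size]
                    pos += num_layers * layer_size
                    frames.append((duration, keep_prior, raw))
                return frames, stream[pos:]   # remainder: the montage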

       Typically the best compression is obtained when the montage is
       very wide and not very tall, with similar layers adjacent;
       however, this layout forces the decoder to decode the entire
       montage before it can display even the first frame.  If the
       montage is taller and narrower, and earlier layers appear closer
       to the top, the decoder has a better chance of displaying as it
       decodes, but the compression is likely to suffer.

   Notes on layer composition

       To display a substrate and layers in front of a background, there
       are two approaches.  One way is to composite the substrate over the
       background, then composite the back-most layer over the result,
       and so on, performing the compositing as described in the PNG
       specification.  Another way is to composite the substrate and the
       layers first (from back to front) to yield the frame image, then
       composite the frame image over the background.  The second approach
       allows the frame image to be exported, or cached for re-use in case
       the background changes.

       The second approach can involve compositing over a
       not-fully-opaque image, but the PNG specification does not say
       how to do that.  For images without alpha channels, it is
       trivial: just keep the front pixel or the back pixel depending
       on whether the front pixel is transparent.  For images with an
       alpha channel, it can be done as follows (a code sketch appears
       after the steps).

        1. Normalize all the alpha samples to the range [0,1].

        2. Gamma-decode all the non-alpha samples (or undo the more
           sophisticated transfer function indicated by sRGB or iCCP) to
           yield samples that are proportional to light intensity.

        3. Multiply every non-alpha sample by the alpha sample from the
           same pixel (that is, convert to premultiplied form).

        4. Store the substrate in an output buffer.

        5. For each layer (in order from back to front), for each pixel in
           the output buffer:

           5a. Let A be the alpha sample of the layer pixel.

           5b. For each channel (including the alpha channel),
               let output_sample = output_sample * (1 - A) + layer_sample

       At this point the output buffer contains a non-gamma-encoded
       premultiplied frame image ready to be composited over a background.
       If the frame image is to be exported, it may be desirable to perform
       additional steps:

        6. Divide each non-alpha sample by the alpha sample of the same
           pixel (that is, convert to non-premultiplied form).

        7. Gamma-encode all the non-alpha samples (or encode them using a
           more sophisticated transfer function).

       Gamma encoding and premultiplication do not commute; the order
       of steps 2, 3, 6, and 7 is significant.
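
        A sketch of the per-sample arithmetic in Python (a simple power
        function stands in for the real transfer function; decoders
        should honor gAMA, sRGB, or iCCP as appropriate):

            def gamma_decode(s, gamma=2.2):
                """Step 2: approximate conversion to linear light."""
                return s ** gamma

            def gamma_encode(s, gamma=2.2):
                """Step 7: re-encode for storage or display."""
                return s ** (1.0 / gamma)

            def blend_pixel(back, front):
                """Step 5b: composite one premultiplied pixel over
                another.  Both are sequences of gamma-decoded,
                premultiplied samples in [0, 1], alpha last."""
                a = front[-1]
                return [b * (1.0 - a) + f for b, f in zip(back, front)]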

   Rationale

       Putting all the frame data in one chunk avoids the complication of
       how to deal with reordering by ANG-unaware PNG editors.  The cost
       is encoder streamability, but that capability of animated GIF is
       not used in practice.  If encoder streamability is needed, MNG is
       available.

       Separating the concepts of layer and frame, rather than speaking
       of "zero-duration frames", is more consistent with existing
       animation terminology.  It also helps clarify what is and is not
       lossless, and avoids tempting decoder implementors to momentarily
       display partially-constructed "frames" that were never meant to be
       displayed.

       The use of alpha composition within the animation (between layers)
       rather than just between the animation and the external background
       adds complication that is not strictly necessary, because the
       encoder could precompute the composed frames and include them in
       the montage.  On the other hand, it can improve compression for
       animations that can be modeled as sprites moving over each other
       (possibly with semi-transparent regions and anti-aliased edges),
       and the general alpha-over-alpha compositing is not much different
       from the alpha-over-opaque-background compositing that PNG decoders
       already know.

       The substrate concept avoids awkward specifications of how to
       composite an image that lies partly over something and partly over
       nothing.  It avoids questions of what the frame dimensions are if
       the bounding box of all the layers is smaller than the frame.  It
       ensures that every frame has the same dimensions, which is what
       people expect.  Finally, it lets ANG preserve a property of PNG that
       the background can show through only if an alpha channel or tRNS is
       present.

       The interlace method of the montage is independent of the interlace
       method of the still image because interlacing is less useful for
       animations.

   End of draft.



References

  75. http://www.nicemice.net/amc/
  76. http://example/foo.ang
  77. http://www.nicemice.net/amc/test/ang.html