Date: Wed, 22 Sep 2004 06:42:22 +0000
From: "Adam M. Costello" <png.amc+0@nicemice.net.RemoveThisWord>
To: png-list@ccrc.wustl.edu
Subject: [png-list] Animated Network Graphics (ANG), draft 3
Message-ID: <20040922064222.GA4009~@nicemice.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.6+20040722i
Sender: owner-png-list@ccrc.wustl.edu
Precedence: bulk
Reply-To: png-list@ccrc.wustl.edu

Animated Network Graphics (ANG), draft 3 (2004-Sep-21-Tue)
Adam M. Costello
http://www.nicemice.net/amc/

Changes from draft 2

    bit depth --> sample depth
    revised PLTE/tRNS/hIST inheritance
    staging offsets can change
    redesigned action_flags
    source & destination regions may overlap
    ticks_per_second = 0 means all frame durations are infinite
    impossible-to-achieve frame durations may be fudged
    optional ANG signature before PNG datastream
    PLTE forbidden in non-indexed-color PNGanim substreams
    revised start & end of PNGanim datastream grammar
    optimized the notes on compositing

Acknowledgements

    Many good ideas have been taken from png-list.

Contents
    Conceptual model
    Encoding
    Datastream tagging
    External control
    Comparison with APNG 0.4
    Note on compositing

Conceptual model

    Recall that a Portable Network Graphics (PNG) datastream encodes
    a single reference image.  An Animated Network Graphics (ANG)
    datastream encodes a main reference image (intended to be viewed
    alone as a still image) plus a sequence of reference images that are
    building blocks (not frames) of an animation.  The main reference
    image can optionally be used as a building block in the animation.

    The building blocks are represented losslessly, as are the
    instructions for assembling them into frames, but the resulting
    frames are not represented losslessly unless the encoder opts to use
    the same sample depth for all building blocks and avoids compositing
    partially transparent building blocks over other building blocks.
    Decoders are allowed to introduce inexactness when compositing
    partially transparent pixels over other pixels and when performing
    sample depth scaling.

    A viewer (that is, a decoder that displays images) shows either the
    main image or the animation, depending on the viewer's capabilities,
    the user's preferences, external signals, and the limitations of the
    medium (for example, paper does not support animation very well).

    Each frame of the animation is shown as a still image in front of a
    decoder-supplied background.  Frames that are not fully opaque allow
    the background to show through.  The frames are shown in succession,
    creating the illusion of motion if their durations are sufficiently
    brief.

    The sequence of frames can be shown more than once (looped).  After
    all iterations of the animation have been shown, a still image is
    shown, which can be either the last frame, the main image, or a
    fully transparent image, as declared in the animation header at the
    start.

    All meta-data belonging to the main image applies not only to the
    main image but to the entire animation as well, with exceptions only
    for IHDR, PLTE, hIST, and tRNS:

      * The main image's IHDR information applies only to the main
        image, except that the width and height apply also to every
        frame, but not to any building blocks other than the main image.

      * If the main image is indexed-color:  The main image's PLTE
        information applies only to the main image and to every
        indexed-color building block that does not provide its own PLTE
        information.  The main image's tRNS information (if present)
        applies only to the main image and to every indexed-color
        building block that provides neither its own PLTE nor tRNS
        information.  The main image's hIST information (if present)
        applies only to the main image.

      * If the main image is not indexed-color:  The main image's
        PLTE/hIST information (if present) is a suggested palette
        and applies to the entire animation.  The main image's tRNS
        information (if present) applies only to the main image and to
        every building block that has the same color type as the main
        image and does not provide its own tRNS information.

    While decoding a PNG animation, a decoder needs to maintain the
    following mutable state:

      * An image buffer at least as large as the main image.  There is
        a rectangular region of this buffer, with dimensions matching
        the main image, that can be displayed as a frame; this region
        is the staging area for constructing the next frame.  The
        rest of the buffer cannot be displayed, and serves as scratch
        space.  The staging area can move, but only immediately after
        it is displayed; therefore the decoder always knows where the
        next frame will come from as it is being constructed.  The
        minimal set of channels needed for the buffer is declared in the
        animation header, though a decoder is free to use more channels
        (for example, a decoder could use RGBA even if grayscale without
        alpha would suffice).  If the decoder is extracting lossless
        frames then the buffer needs to have the same sample depth
        as the building blocks.  If the decoder is merely displaying
        the animation then the buffer can have any sample depth, or
        indeed any representation.  The buffer supports the following
        operations:  Read a rectangular region of the buffer, write a
        rectangular region of the buffer, composite a rectangular region
        over the buffer, display the staging area as a frame.

      * The frame duration, which determines how long a frame is
        displayed.  This value can be changed just before a frame is
        displayed, in order to affect the duration of that frame, but
        changing this variable has no effect on the duration of the
        frame that is already on the display.  A duration of zero means
        infinity, which can be useful in conjunction with external
        signals (see section "External control").  It is deliberately
        impossible to assign an actual zero duration, because that would
        indicate that the frame is not intended to be displayed at all,
        in which case it does not deserve to be called a frame, it is
        merely an intermediate state along the way to constructing a
        frame.

      * An iteration counter, which counts the number of times the
        animation has been played so that the decoder knows when to stop
        looping.

    The animation is represented as a sequence of actions of the
    following kinds: decode a compressed datastream into or onto the
    buffer; copy a rectangular region of the buffer into or onto a
    same-sized region of the buffer; alter the frame duration; display
    the staging area as a frame; redefine the staging area.

    The frame durations are ideal targets, but sometimes decoders will
    be unable to achieve the targets because of forces beyond their
    control.  For example, there might not be sufficient computing
    resources to keep up, or the data might be streaming in too slowly,
    or the decoder might be under constraints to keep the animation
    synchronized with something else.  Decoders may shorten or lengthen
    the frame durations when necessary.

Encoding

    The encoding of the model described above is largely independent
    of (and separable from) the model itself.

    Conjecture:  It is possible to encode the model using a restricted
    profile of MNG.

    Another approach is to encode the model in a way that allows
    ANG-unaware PNG decoders to decode the main image.  There are
    various ways to achieve that, one of which is described here.

    Bit numbering convention:  Bit 0 means the least significant bit.
    Bit i means the bit with value 2^i.

    An Animated Network Graphics (ANG) datastream is a simple container
    format with the following grammar:

        ANG_datastream ::=
            ANG_signature?
            PNG_datastream
            PNGanim_datastream

        ANG_signature = 138 65 78 71

        PNGanim_datastream ::=
            PNGanim_signature
            ( AnIC PNGanim_building_block? )*
            AnIE

        PNGanim_signature = 0 0 0 0 137 80 78 71 97 110 105 109

        PNGanim_building_block ::= IHDR PLTE? tRNS? IDAT+

    The ANG signature is the byte 138 followed by "ANG" in ASCII.  It
    does not replicate the line-ending corruption detection features of
    the PNG signature, because the PNG signature is still present and
    fulfills that role.  Notice that the ANG signature can be omitted.
    See section "Datastream tagging".
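    As a sketch (not part of the draft), a decoder might strip the
    optional ANG signature like this; the helper name and return
    convention are this example's own:

```python
# Signature bytes from the grammar above: 138 followed by "ANG" in ASCII.
ANG_SIGNATURE = bytes([138, 65, 78, 71])

def split_ang_signature(data: bytes):
    """Return (had_ang_signature, remainder), where the remainder should
    begin with the ordinary 8-byte PNG signature."""
    if data.startswith(ANG_SIGNATURE):
        return True, data[len(ANG_SIGNATURE):]
    return False, data
```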

    The PNG datastream inside an ANG datastream is a regular PNG
    datastream, except that it must contain exactly one anIH (animation
    header) chunk, which must appear before IDAT.

    The PNGanim datastream cannot be interpreted in the absence of the
    preceding PNG datastream.  Within the PNGanim datastream, no other
    chunk types are allowed besides the ones listed in the grammar rules
    above, unless they are defined specifically for PNGanim datastreams.
    Regular PNG chunks other than those listed in the grammar rules
    belong in the PNG datastream, not the PNGanim datastream.  Of course,
    decoders must be prepared to encounter and ignore unknown ancillary
    chunks anywhere between the PNGanim signature and the AnIE chunk.

    The PNGanim signature is the bytes 0,0,0,0,137 followed by "PNGanim"
    in ASCII.  There should not be any decoders looking for chunks
    after IEND, but in case there are, they will find what appears to
    be a zero-length chunk with a syntactically invalid chunk name and
    an incorrect CRC.  The PNGanim signature does not replicate the
    line-ending corruption detection features of the PNG signature,
    because a PNGanim datastream is always preceded by a PNG datastream
    with a PNG signature to fulfill that role.  A PNGanim datastream
    would be useless without an accompanying PNG datastream to serve as
    the main image and carry the anIH chunk.
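    A sketch (not part of the draft) of recognizing the PNGanim
    signature at a given byte offset, normally the byte just past the
    PNG datastream's IEND chunk; the function name is this example's
    own:

```python
# 0,0,0,0,137 followed by "PNGanim" in ASCII, per the grammar above.
PNGANIM_SIGNATURE = bytes([0, 0, 0, 0, 137]) + b"PNGanim"

def starts_pnganim(data: bytes, offset: int) -> bool:
    """True if a PNGanim datastream begins at `offset`."""
    return data[offset:offset + len(PNGANIM_SIGNATURE)] == PNGANIM_SIGNATURE
```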

    A PNGanim building block closely resembles a PNG datastream,
    but is not one.  It does not begin with the PNG signature, does
    not end with IEND, and might lack PLTE even when the color type
    is indexed-color, and even when tRNS is present.  However, its
    similarity to a PNG datastream is designed to facilitate reuse of
    PNG decoding libraries.

    Within a PNGanim building block, PLTE may be present only if
    the color type in IHDR is indexed-color (there are no suggested
    palettes), and tRNS (if present) must have the form appropriate for
    the color type, as in PNG.

    AnIE is an empty chunk to mark the end of the PNGanim datastream.
    A chunk type other than IEND is used so that PNG-animation-aware
    programs encountering a file beginning with the PNG signature can
    quickly seek to the end of the file and determine whether a PNGanim
    datastream is present, rather than having to scan forward to see
    whether anIH appears before IDAT.
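    The seek-to-end test described above could look like this sketch.
    It relies only on the standard PNG chunk layout (4-byte length,
    4-byte type, data, 4-byte CRC), so an empty AnIE chunk occupies the
    final 12 bytes of the file.  The function name is this example's
    own, and the CRC is not verified here:

```python
def file_ends_with_anie(tail: bytes) -> bool:
    """True if the file's last chunk is an empty AnIE chunk: zero
    length, type "AnIE", and a 4-byte CRC (not checked in this
    sketch)."""
    return (len(tail) >= 12
            and tail[-12:-8] == b"\x00\x00\x00\x00"
            and tail[-8:-4] == b"AnIE")
```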

    anIH contains the following fields:

        buffer_width (4 bytes, unsigned)
        buffer_height (4 bytes, unsigned)
        staging_X_offset (4 bytes, unsigned)
        staging_Y_offset (4 bytes, unsigned)
            These must satisfy:
                staging_X_offset + main image width <= buffer_width
                staging_Y_offset + main image height <= buffer_height

        ticks_per_second (4 bytes, unsigned)
            Defines the time unit for frame durations.  If this is
            zero, all frame durations (including zero) are treated as
            infinite.

        num_iterations (4 bytes, unsigned)
            Zero means infinity.

        promises (1 byte)
            Declares limitations that the encoder has imposed on the
            building blocks and actions in the animation.  Each set bit
            constitutes a promise as follows:

                bit 0 ==> Color is not used.  Explicit RGB channels
                          never appear, and PLTE entries always satisfy
                          R=G=B.
                bit 1 ==> Full alpha is not used.  Explicit alpha
                          channels never appear, and every tRNS entry is
                          0 or 255.
                bit 2 ==> tRNS never appears.
                bit 3 ==> No partially transparent pixels are ever
                          composited over the buffer.  If bit 1 is set
                          this limitation is automatically satisfied.
                bit 4 ==> All building blocks have the same sample
                          depth.
                bit 5 ==> The staging offsets never change.

            The remaining two bits are reserved; encoders must not
            set them (because they cannot know what they would be
            promising).

        after_image (1 byte)
            To be displayed in the "after" state:
                0 ==> fully transparent
                1 ==> last frame
                2 ==> main image

        iteration_start_action (variable-length, optional)
            At most one action can appear here, using the same syntax as
            the actions that appear in AnIC (see below).  If an action
            is present, its write_buffer flag must be set, and its
            read_buffer flag must be unset (the main image serves as the
            source).

    The main image serves as a building block of the animation iff an
    action is present in anIH.  If so, it is bound by the promises, same
    as any other building block.
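    A sketch of parsing the fixed-size anIH fields (the optional
    iteration_start_action follows them and is not handled here).
    Field names follow the list above; the function name and dictionary
    representation are this example's own:

```python
import struct

def parse_anih_fixed(data: bytes) -> dict:
    """Parse the six 4-byte unsigned fields and the two 1-byte fields
    of anIH, in the order listed above."""
    (buffer_width, buffer_height, staging_x_offset, staging_y_offset,
     ticks_per_second, num_iterations) = struct.unpack(">6I", data[:24])
    return {
        "buffer_width": buffer_width,
        "buffer_height": buffer_height,
        "staging_x_offset": staging_x_offset,
        "staging_y_offset": staging_y_offset,
        "ticks_per_second": ticks_per_second,  # 0 => all durations infinite
        "num_iterations": num_iterations,      # 0 => loop forever
        "promises": data[24],
        "after_image": data[25],
    }
```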

    Encoders are encouraged to set the appropriate bits of the promises
    field, but an unset bit is always valid, because it is not a
    negative promise, but merely the lack of a promise.  Decoders may
    ignore any of the bits (including the reserved ones) or they can use
    the bits to enable optimized decoding routines.  If the conditions
    for bits 3 and 4 are both satisfied, the frames are represented
    losslessly and can be recovered exactly.

    There is no palette or tRNS data associated with the buffer.
    Palette and tRNS data is used for decoding compressed datastreams,
    not for displaying or interpreting the buffers.

    At the start of each iteration of the animation the following four
    actions are performed:

      * The entire buffer is initialized to fully opaque black.

        [[ Fully transparent was my first inclination, but if the
        encoder promises not to use transparency of any kind, we
        ought to let the decoder use a buffer that lacks transparency
        information. ]]

      * The staging area offsets are initialized to their values from
        anIH.

      * The frame duration is initialized to 1 tick.

      * The action in anIH, if present, is performed.  This action must
        be performed after the other three.
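    The start-of-iteration steps above could be sketched as follows
    (the state and anIH dictionaries, and the RGBA row-list buffer, are
    this example's own representation):

```python
def start_iteration(state: dict, anih: dict) -> None:
    """Perform the start-of-iteration steps, in order."""
    # 1. The entire buffer is initialized to fully opaque black.
    state["buffer"] = [[(0, 0, 0, 255)] * anih["buffer_width"]
                      for _ in range(anih["buffer_height"])]
    # 2. The staging offsets are reset to their anIH values.
    state["staging_x"] = anih["staging_x_offset"]
    state["staging_y"] = anih["staging_y_offset"]
    # 3. The frame duration is reset to 1 tick.
    state["frame_duration"] = 1
    # 4. The iteration_start_action from anIH, if present, would be
    #    performed here, after the other three steps (not shown).
```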

    Actions to be performed during an iteration of the animation are
    indicated by AnIC chunks.  An AnIC chunk contains a sequence of one
    or more actions.  An action contains:

        action_flags (1 byte)
            bit 0: reserved (nonzero is a fatal error)
            bit 1: write_buffer (boolean)
            bit 2: read_buffer (boolean)
            bit 3: composite_over (boolean)
            bit 4: change_frame_duration (boolean)
            bit 5: display_staging_area (boolean)
            bit 6: change_staging_offsets (boolean)
            bit 7: reserved (nonzero is a fatal error)

            If write_buffer is unset then read_buffer and composite_over
            must also be unset.  If display_staging_area is unset then
            change_frame_duration and change_staging_offsets must also
            be unset.  Violations of these rules are fatal errors.

            The flags write_buffer, read_buffer, change_frame_duration,
            and change_staging_offsets indicate that additional fields
            are present (see below).  Those fields appear in bit-number
            order.

            The flags write_buffer, change_frame_duration,
            display_staging_area, and change_staging_offsets indicate
            that actions are to be performed (the other two flags,
            read_buffer and composite_over, are parameters of the
            write_buffer action).  The actions are performed in
            bit-number order.

        destination_X_offset (1, 2, or 4 bytes, unsigned)
        destination_Y_offset (1, 2, or 4 bytes, unsigned)
            These are present iff write_buffer is set.  They must
            satisfy:
                destination_X_offset + source_width <= buffer_width
                destination_Y_offset + source_height <= buffer_height
            If read_buffer is unset then source_width and source_height
            are the width and height from the IHDR in the PNGanim
            building block following the AnIC chunk.

        source_width    (1, 2, or 4 bytes, unsigned)
        source_X_offset (1, 2, or 4 bytes, unsigned)
        source_height   (1, 2, or 4 bytes, unsigned)
        source_Y_offset (1, 2, or 4 bytes, unsigned)
            These are present iff read_buffer is set.  They must
            satisfy:
                source_X_offset + source_width <= buffer_width
                source_Y_offset + source_height <= buffer_height

        frame_duration (4 bytes, signed)
            Present iff change_frame_duration is set.

        staging_X_offset (1, 2, or 4 bytes, unsigned)
        staging_Y_offset (1, 2, or 4 bytes, unsigned)
            These are present iff change_staging_offsets is set.  They
            must satisfy:
                staging_X_offset + main image width <= buffer_width
                staging_Y_offset + main image height <= buffer_height

    Within AnIC chunks, all horizontal dimensions are 1 byte if
    buffer_width <= 255, 2 bytes if buffer_width <= 65535, 4 bytes
    otherwise.  Vertical dimensions depend on buffer_height analogously.
    Recall that buffer_width and buffer_height are set once in anIH and
    never change.
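    A sketch of decoding one action, including the variable-width
    dimension rule just described.  The flag masks follow the bit
    numbers above; the function and dictionary key names are this
    example's own:

```python
WRITE_BUFFER           = 1 << 1
READ_BUFFER            = 1 << 2
COMPOSITE_OVER         = 1 << 3
CHANGE_FRAME_DURATION  = 1 << 4
DISPLAY_STAGING_AREA   = 1 << 5
CHANGE_STAGING_OFFSETS = 1 << 6

def dim_size(limit: int) -> int:
    """Byte width of a dimension field for a given buffer dimension."""
    return 1 if limit <= 255 else 2 if limit <= 65535 else 4

def parse_action(data: bytes, pos: int, xsize: int, ysize: int):
    """Parse one action starting at `pos`; return (action, next_pos)."""
    flags = data[pos]
    pos += 1
    if flags & 0b10000001:
        raise ValueError("reserved action_flags bit set (fatal error)")
    if (flags & (READ_BUFFER | COMPOSITE_OVER)) and not (flags & WRITE_BUFFER):
        raise ValueError("read_buffer/composite_over without write_buffer")
    if ((flags & (CHANGE_FRAME_DURATION | CHANGE_STAGING_OFFSETS))
            and not (flags & DISPLAY_STAGING_AREA)):
        raise ValueError("duration/offset change without display_staging_area")

    def take(size: int) -> int:
        nonlocal pos
        value = int.from_bytes(data[pos:pos + size], "big")
        pos += size
        return value

    action = {"flags": flags}
    if flags & WRITE_BUFFER:   # destination offsets
        action["dest_x"], action["dest_y"] = take(xsize), take(ysize)
    if flags & READ_BUFFER:    # source region, in the order listed above
        action["src_width"], action["src_x"] = take(xsize), take(xsize)
        action["src_height"], action["src_y"] = take(ysize), take(ysize)
    if flags & CHANGE_FRAME_DURATION:
        action["frame_duration"] = take(4)
    if flags & CHANGE_STAGING_OFFSETS:
        action["staging_x"], action["staging_y"] = take(xsize), take(ysize)
    return action, pos
```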

    The write_buffer flag indicates whether a buffer modification is
    to be performed.  When it is set, the read_buffer flag indicates
    whether the source of the operation is the buffer or the stream, and
    the composite_over flag indicates whether the source replaces the
    destination or is composited over it.

    When the source is a region of the buffer, it may overlap the
    destination region.  Decoders must handle this case correctly
    and not overwrite source pixels before reading them.  Two scan
    orders will suffice.  For example, suppose ForwardScan scans
    pixels from left to right within rows and scans rows from top to
    bottom within regions, while ReverseScan scans pixels from right
    to left within rows and scans rows from bottom to top within
    regions.  ForwardScan will work correctly whenever (destY,destX) <=
    (srcY,srcX) lexicographically, and ReverseScan will work correctly
    whenever (srcY,srcX) <= (destY,destX).
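    A minimal sketch of the overlap-safe copy using the two scan orders
    described above (the buffer is modeled as a list of pixel rows;
    names are this example's own):

```python
def copy_region(buf, src_x, src_y, dest_x, dest_y, width, height):
    """In-place rectangular copy that is safe when the source and
    destination regions overlap: forward scan when the destination
    starts at or before the source (row-major order), reverse scan
    otherwise."""
    if (dest_y, dest_x) <= (src_y, src_x):
        rows, cols = range(height), range(width)               # ForwardScan
    else:
        rows = range(height - 1, -1, -1)                       # ReverseScan
        cols = range(width - 1, -1, -1)
    for dy in rows:
        for dx in cols:
            buf[dest_y + dy][dest_x + dx] = buf[src_y + dy][src_x + dx]
```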

    When write_buffer is set and read_buffer is unset, the action must
    be the last action in the AnIC chunk, and the AnIC chunk must be
    followed (before the next critical chunk) by a PNGanim building
    block to be used as the source of the buffer modification.

    If display_staging_area is set, the staging area is displayed as
    a frame (after any buffer modification has been performed).  The
    number of frames in the animation equals the number of actions with
    set display_staging_area flags.  If display_staging_area is set,
    change_frame_duration may be set in order to make the duration of
    this frame different from the duration of the previous frame, and
    change_staging_offsets may be set in order to make the staging area
    of the next frame different from the staging area of this frame.

    A PNGanim building block is decoded like a PNG datastream, except
    that PLTE may be absent when color_type is 3 (indexed-color) if the
    main PNG datastream contains a PLTE chunk, and PLTE and tRNS are
    inherited from the main PNG datastream under certain conditions
    specified above in section "Conceptual model".

    On encountering any fatal error while decoding the animation,
    decoders must cease decoding the animation, and must fall back to
    displaying the main image if possible.

Datastream tagging

    When backward compatibility with ANG-unaware decoders is not needed,
    the ANG signature should be present, the preferred media type is
    "video/ang", and the preferred file extension is ".ang".

    When necessary for interoperating with animation-unaware decoders,
    the ANG signature may be omitted, and the media type "image/png" may
    be used.  The file extension ".png" can also be used, but should be
    avoided if at all possible, because it will tend to confuse users.

    The new media type is necessary for applying greater control over
    fallback.  For example, if you want ANG-unaware web browsers to fall
    back to animated GIF rather than still PNG, you need to do something
    like

        <object data="foo.ang" type="video/ang">
        <img src="foo.gif">
        </object>

    or something analogous at the HTTP layer using content negotiation.
    Without a new media type, this control would not be possible; web
    browsers that support PNG but not ANG would always fall back to
    still PNG.

    The ANG signature is useful when failure is preferable to fallback,
    such as when the animation is considered essential and showing a
    still image would be misleading.

External control

    [[ This is a rough idea that should be reworked after a review of
    relevant standards, or removed. ]]

    Animation viewers can optionally support five external control
    signals "stop", "start", "suspend", "resume", and "advance".  A
    viewer that supports these signals is free to map them to the user
    interface in any way (possibly using fewer buttons than signals)
    and to provide additional controls (for example, to override the
    frame durations, or to play in reverse if all the previous frames
    have been cached).  This control model is intended as a common set
    of primitives that could be useful for scripting (especially in web
    pages).  Bindings to scripting languages and/or the document object
    model would need to be standardized.

    The "suspend" and "resume" signals simply set and clear a
    "suspended" flag.  The other three signals cause state transitions
    between the three states "main", "during", and "after".  It would
    be useful to be able to externally control the initial state of the
    animation: main, during, or during-suspended.

    The states and signals behave as follows:

    "main"
        The main image is displayed.
            "stop"    ==> "main" (no effect)
            "start"   ==> "during" (first frame, first iteration)
            "advance" ==> equivalent to "suspend" then "start"

    "during"
        The animation frames are displayed in sequence, looping from the
        last to the first as declared in the animation header.  If the
        "suspended" flag is unset, each frame endures for its allotted
        duration or until a signal arrives.  If the "suspended" flag is
        set, each frame endures until a signal arrives.
            "stop"    ==> "main"
            "start"   ==> "during" (restart from 1st frame & iteration)
            "advance" ==> "during" or "after" (see below)
        The "advance" signal terminates the currently displayed frame
        immediately (before its duration has fully elapsed).  When
        the last frame of the last iteration is terminated (either by
        the passage of time or by an "advance" signal), there is a
        transition to "after".

    "after"
        A still image is displayed, either the last frame, the main
        image, or a fully transparent image, as declared in the
        animation header.
            "stop"    ==> "main"
            "start"   ==> "during" (first frame, first iteration)
            "advance" ==> "after" (no effect)

    Frames are numbered starting with 1.  Frame numbering is
    standardized because it might be useful for scripting.  The image
    displayed in the "after" state does not count as a frame.  The
    number of frames is independent of the number of iterations (looping
    merely displays the same frames again, it does not duplicate the
    frames).  "Frame 0" is a convenient misnomer for the main image
    (which is not part of the animation, and might or might not be the
    same as frame 1, the first frame of the animation).
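    The state machine above could be sketched as follows (class and
    attribute names are this example's own; frame timing and the
    end-of-last-iteration transition to "after" are outside this
    sketch):

```python
class AnimationControl:
    """Three states ("main", "during", "after") and five signals."""

    def __init__(self):
        self.state = "main"
        self.suspended = False

    def signal(self, sig: str) -> None:
        if sig == "suspend":
            self.suspended = True
        elif sig == "resume":
            self.suspended = False
        elif sig == "stop":
            self.state = "main"
        elif sig == "start":
            self.state = "during"   # (re)start from frame 1, iteration 1
        elif sig == "advance":
            if self.state == "main":
                # Equivalent to "suspend" then "start".
                self.suspended = True
                self.state = "during"
            elif self.state == "during":
                # Terminate the current frame immediately; if it was the
                # last frame of the last iteration, the viewer would move
                # to "after" (frame bookkeeping is outside this sketch).
                pass
            # In "after", "advance" has no effect.
```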

Comparison with APNG 0.4

    APNG takes the view that an APNG is a PNG.  This draft takes the
    view that an ANG is not a PNG, an ANG contains a PNG, and an ANG
    can be disguised as a PNG when necessary (by omitting the ANG
    signature).

    This draft puts the animation data in a separate datastream after
    the PNG datastream, rather than embedding it in chunks inside the
    PNG datastream.  This is simpler because sequence numbers are not
    necessary, and there is no chunk-in-chunk embedding.  It also avoids
    the controversy about PNG being a single-image format.  The one
    disadvantage is that animations will not immediately be usable
    inside container files that don't mark the end of the embedded
    PNG, but rather depend on IEND to mark the end.  Encoders of such
    containers (if properly written) will either refuse to insert an ANG
    or will insert only the main PNG datastream and discard the rest.
    This is probably an uncommon case, however, and not a deal breaker.
    The killer app is ANG files, for which the two-datastream approach
    should work fine.

    The chunk structure of the PNGanim datastream is identical to the
    chunk structure of the datastream encapsulated in APNG's aDAT.

    Both APNG and this draft use a single image buffer that supports
    both pixel replacement and composite-over operations (APNG calls it
    a canvas).  This draft relaxes the requirement that the buffer have
    the same dimensions as the main image and the frames, and instead
    allows the buffer to be larger, with the extra area available for
    use as scratch space.  This relaxation is not expected to add much
    complexity, since there's still just one buffer, and the location
    of the frame staging area is still known while a frame is being
    constructed.

    In APNG, the only way to modify the buffer is by decoding an
    embedded image.  This draft adds the ability to copy pixels from
    one part of the buffer into or onto another part.  This is not
    difficult, and has great potential to reduce the size of animations,
    especially in combination with the scratch space.

    Both drafts allow the same kinds of drawing operations: replace and
    composite-over.  The APNG explanation of how to do composite-over is
    fatally incomplete.  This draft adapts the explanation from the MNG
    spec.

    APNG neglects to specify the initial state of the buffer
    (canvas), so we don't know how a decoder should handle
    APNG_RENDER_OP_DISPOSE_PREVIOUS in the first fCTL chunk.  APNG
    does not specify that the buffer is reinitialized at the start of
    each iteration.  If it's not, then every iteration can produce a
    different set of frames (for example, if a one-frame animation
    composites a partially transparent image over the buffer, then
    the buffer will get more and more opaque on each iteration),
    and the decoder lacks the option of caching the frames to avoid
    reconstructing them on each iteration.  In this draft, the buffer is
    initialized at the start of each iteration, so the decoder always
    has the option of caching the frames.

    This draft adds mutable state variables that are absent in APNG:
    the frame duration and the staging offsets.  In APNG the staging
    offsets are always implicitly zero, and the frame duration appears
    explicitly for every frame.  In this draft, the frame duration
    and staging offsets are remembered by the decoder; they appear
    explicitly only when they change.  In most animations, almost all
    frames will have the same duration.

    This draft expresses offsets using integers of a size appropriate
    for the total buffer dimensions.  This wouldn't be worth the trouble
    in APNG, where the offsets are dwarfed by the embedded image that
    invariably follows, but in this draft, where a single AnIC chunk can
    contain a long list of intra-buffer operations, the savings from
    shorter offsets and elided frame durations can add up.

    APNG uses a few predefined "macros" for manipulating the image
    buffer, called disposal methods.  This draft uses more primitive
    operations that can be used to simulate the macros or to do more
    custom things.

    The meaning of APNG_RENDER_OP_SKIP_FRAME is not entirely clear
    from the APNG 0.4 spec, but discussions have revealed that it was
    *not* intended to be equivalent to display_staging_area (with
    reversed polarity).  Whereas an unset display_staging_area means
    modify the buffer but don't display the result (yet), a set
    APNG_RENDER_OP_SKIP_FRAME means don't modify the buffer at all and
    don't even decode the embedded image.

    PLTE/tRNS inheritance is slightly different.  In APNG, tRNS can be
    inherited by an indexed-color image even if PLTE is not inherited,
    which is unlikely to be useful, given how tightly coupled the two
    chunks are.  An empty tRNS chunk would need to be included to
    avoid inheriting tRNS.  In this draft, tRNS is not inherited by an
    indexed-color image if PLTE is not inherited.

    This draft uses 4 bytes for the numerator and denominator of the
    frame duration rather than 2 bytes as in APNG, and uses one global
    denominator rather than a per-frame denominator.

    A frame delay of zero means "as quickly as possible" in APNG, but
    means infinity in this draft.

    In both drafts, zero iterations means loop infinitely.

    Both drafts use their respective usual mechanisms for incorporating
    the main image into the animation, same as any other image, although
    this draft puts it at the end of the animation header chunk rather
    than in its own chunk, because AnIC is a critical chunk and cannot
    be used in the PNG datastream.

    APNG always finishes an animation by holding the last frame.  This
    draft adds two alternatives: become fully transparent, or display
    the main image.  The last option is expected to be useful quite
    often.

    This draft adds a field in the header where encoders can (but need
    not) inform decoders of potential optimizations (which decoders can
    ignore).

    Both drafts specify that PNG meta-data generally applies to the
    whole animation, not just the main image or any one frame or
    embedded image.

    In APNG there is a one-to-one correspondence between images and
    frames, and all images (even skipped images) get frame numbers,
    starting with 0 for the main image.  In this draft, "frame 0" refers
    to the main image, and frames (not building blocks) are numbered
    starting at 1, so "frame 1" always refers to the first frame, even
    if it is identical to the main image.

    This draft provides recommendations for the use of both old and new
    media types, signatures, and file extensions.

    This draft provides an optional external control model to support
    scripting.

Note on compositing

    (Adapted from the MNG 1.0 specification, section 11.3.)

    The PNG specification gives a good explanation of how to composite
    a partially transparent image over an opaque image, but things get
    more complicated when both images are partially transparent.

    Pixels in PNG and PNGanim are represented using gamma-encoded RGB
    (or gray) samples along with a linear alpha value.  Alpha processing
    can be performed only on linear samples.  This section assumes that
    R, G, B, and A values have all been converted to real numbers in the
    range [0..1], and that any gamma encoding has been undone.

    For a top pixel (Rt,Gt,Bt,At) and a bottom pixel (Rb,Gb,Bb,Ab), the
    composite pixel (Rc,Gc,Bc,Ac) is given by:

        Ac = 1 - (1 - At)(1 - Ab)
        if (Ac != 0) then
          s = At / Ac
          t = (1 - At) Ab / Ac
        else
          s = 0.0
          t = 1.0
        endif
        Rc = s Rt + t Rb
        Gc = s Gt + t Gb
        Bc = s Bt + t Bb
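    The equations above translate directly into code (a sketch with
    real-valued channels in [0..1], gamma encoding already undone):

```python
def composite_over(top, bottom):
    """Composite a top pixel (Rt, Gt, Bt, At) over a bottom pixel
    (Rb, Gb, Bb, Ab); all samples linear, in [0..1]."""
    Rt, Gt, Bt, At = top
    Rb, Gb, Bb, Ab = bottom
    Ac = 1 - (1 - At) * (1 - Ab)
    if Ac != 0:
        s = At / Ac
        t = (1 - At) * Ab / Ac
    else:
        s, t = 0.0, 1.0
    return (s * Rt + t * Rb, s * Gt + t * Gb, s * Bt + t * Bb, Ac)
```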

    When the bottom pixel is fully opaque (Ab = 1.0), the function
    reduces to:

        Ac = 1
        Rc = At Rt + (1 - At) Rb
        Gc = At Gt + (1 - At) Gb
        Bc = At Bt + (1 - At) Bb

    When the bottom pixel is not fully opaque, the function is much
    simpler if pixels are represented as (R*,G*,B*,A*) rather than
    (R,G,B,A), where A* is the complement of A, and (R*,G*,B*) is
    (R,G,B) "premultiplied" by A:

        A* = 1 - A
        R* =  R A
        G* =  G A
        B* =  B A

    For a top pixel (Rt*,Gt*,Bt*,At*) and a bottom pixel
    (Rb*,Gb*,Bb*,Ab*), the composite pixel (Rc*,Gc*,Bc*,Ac*) is given
    by:

        Ac* =  0  + At* Ab*
        Rc* = Rt* + At* Rb*
        Gc* = Gt* + At* Gb*
        Bc* = Bt* + At* Bb*
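    In premultiplied form the composite-over operation is just four
    multiply-adds (a sketch; tuple order (R*, G*, B*, A*) with
    A* = 1 - A, as defined above):

```python
def composite_over_premul(top, bottom):
    """Composite in premultiplied form: top (Rt*, Gt*, Bt*, At*) over
    bottom (Rb*, Gb*, Bb*, Ab*)."""
    Rt, Gt, Bt, At_c = top      # *_c suffix: complemented alpha A* = 1 - A
    Rb, Gb, Bb, Ab_c = bottom
    return (Rt + At_c * Rb,
            Gt + At_c * Gb,
            Bt + At_c * Bb,
            At_c * Ab_c)
```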

    As mentioned in the PNG specification, the equations become much
    simpler when no pixel has an alpha value other than 0.0 or 1.0, and
    the RGB samples need not be linear in that case.

End of draft.