446 lines
		
	
	
		
			18 KiB
		
	
	
	
		
			HTML
		
	
	
	
			
		
		
	
	
			446 lines
		
	
	
		
			18 KiB
		
	
	
	
		
			HTML
		
	
	
	
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
 | 
						|
<html>
 | 
						|
<head>
 | 
						|
 | 
						|
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-15"/>
 | 
						|
<title>Ogg Documentation</title>
 | 
						|
 | 
						|
<style type="text/css">
 | 
						|
body {
 | 
						|
  margin: 0 18px 0 18px;
 | 
						|
  padding-bottom: 30px;
 | 
						|
  font-family: Verdana, Arial, Helvetica, sans-serif;
 | 
						|
  color: #333333;
 | 
						|
  font-size: .8em;
 | 
						|
}
 | 
						|
 | 
						|
a {
 | 
						|
  color: #3366cc;
 | 
						|
}
 | 
						|
 | 
						|
img {
 | 
						|
  border: 0;
 | 
						|
}
 | 
						|
 | 
						|
#xiphlogo {
 | 
						|
  margin: 30px 0 16px 0;
 | 
						|
}
 | 
						|
 | 
						|
#content p {
 | 
						|
  line-height: 1.4;
 | 
						|
}
 | 
						|
 | 
						|
h1, h1 a, h2, h2 a, h3, h3 a, h4, h4 a {
 | 
						|
  font-weight: bold;
 | 
						|
  color: #ff9900;
 | 
						|
  margin: 1.3em 0 8px 0;
 | 
						|
}
 | 
						|
 | 
						|
h1 {
 | 
						|
  font-size: 1.3em;
 | 
						|
}
 | 
						|
 | 
						|
h2 {
 | 
						|
  font-size: 1.2em;
 | 
						|
}
 | 
						|
 | 
						|
h3 {
 | 
						|
  font-size: 1.1em;
 | 
						|
}
 | 
						|
 | 
						|
li {
 | 
						|
  line-height: 1.4;
 | 
						|
}
 | 
						|
 | 
						|
#copyright {
 | 
						|
  margin-top: 30px;
 | 
						|
  line-height: 1.5em;
 | 
						|
  text-align: center;
 | 
						|
  font-size: .8em;
 | 
						|
  color: #888888;
 | 
						|
  clear: both;
 | 
						|
}
 | 
						|
</style>
 | 
						|
 | 
						|
</head>
 | 
						|
 | 
						|
<body>
 | 
						|
 | 
						|
<div id="xiphlogo">
 | 
						|
  <a href="http://www.xiph.org/"><img src="fish_xiph_org.png" alt="Fish Logo and Xiph.org"/></a>
 | 
						|
</div>
 | 
						|
 | 
						|
<h1>Page Multiplexing and Ordering in a Physical Ogg Stream</h1>
 | 
						|
 | 
						|
<p>The low-level mechanisms of an Ogg stream (as described in the Ogg
 | 
						|
Bitstream Overview) provide means for mixing multiple logical streams
 | 
						|
and media types into a single linear-chronological stream. This
 | 
						|
document specifies the high-level arrangement and use of page
 | 
						|
structure to multiplex multiple streams of mixed media type within a
 | 
						|
physical Ogg stream.</p>
 | 
						|
 | 
						|
<h2>Design Elements</h2>
 | 
						|
 | 
						|
<p>The design and arrangement of the Ogg container format is governed by
 | 
						|
several high-level design decisions that form the reasoning behind
 | 
						|
specific low-level design decisions.</p>
 | 
						|
 | 
						|
<h3>Linear media</h3> 
 | 
						|
 | 
						|
<p>The Ogg bitstream is intended to encapsulate chronological,
 | 
						|
time-linear mixed media into a single delivery stream or file. The
 | 
						|
design is such that an application can always encode and/or decode a
 | 
						|
full-featured bitstream in one pass with no seeking and minimal
 | 
						|
buffering. Seeking to provide optimized encoding (such as two-pass
 | 
						|
encoding) or interactive decoding (such as scrubbing or instant
 | 
						|
replay) is not disallowed or discouraged, however no bitstream feature
 | 
						|
must require nonlinear operation on the bitstream.</p>
 | 
						|
 | 
						|
<h3>Multiplexing</h3>
 | 
						|
 | 
						|
<p>Ogg bitstreams multiplex multiple logical streams into a single
 | 
						|
physical stream at the page level. Each page contains an abstract
 | 
						|
time stamp (the Granule Position) that represents an absolute time
 | 
						|
landmark within the stream. After the pages representing stream
 | 
						|
headers (all logical stream headers occur at the beginning of a
 | 
						|
physical bitstream section before any logical stream data), logical
 | 
						|
stream data pages are arranged in a physical bitstream in strict 
 | 
						|
non-decreasing order by chronological absolute time as 
 | 
						|
specified by the granule position.</p>
 | 
						|
 | 
						|
<p>The only exception to arranging pages in strictly ascending time order
 | 
						|
by granule position is those pages that do not set the granule
 | 
						|
position value. This is a special case when exceptionally large
 | 
						|
packets span multiple pages; the specifics of handling this special
 | 
						|
case are described later under 'Continuous and Discontinuous
 | 
						|
Streams'.</p>
 | 
						|
 | 
						|
<h3>Seeking</h3> 
 | 
						|
 | 
						|
<p>Ogg is designed to use a bisection search to implement exact
 | 
						|
positional seeking rather than building an index; an index requires
 | 
						|
two-pass encoding and as such is not acceptable given the requirement
 | 
						|
for full-featured linear encoding.</p>
 | 
						|
 | 
						|
<p><i>Even making an index optional then requires an
 | 
						|
application to support multiple methods (bisection search for a
 | 
						|
one-pass stream, indexing for a two-pass stream), which adds no
 | 
						|
additional functionality as bisection search delivers the same
 | 
						|
functionality for both stream types.</i></p>
 | 
						|
 | 
						|
<p>Seek operations are by absolute time; a direct bisection search must
 | 
						|
find the exact time position requested. Information in the Ogg
 | 
						|
bitstream is arranged such that all information to be presented for
 | 
						|
playback from the desired seek point will occur at or after the
 | 
						|
desired seek point. Seek operations are neither 'fuzzy' nor
 | 
						|
heuristic.</p>
 | 
						|
 | 
						|
<p><i>Although key frame handling in video appears to be an exception to
 | 
						|
"all needed playback information lies ahead of a given seek",
 | 
						|
key frames can still be handled directly within this indexless
 | 
						|
framework. Seeking to a key frame in video (as well as seeking in other
 | 
						|
media types with analogous restraints) is handled as two seeks; first
 | 
						|
a seek to the desired time which extracts state information that
 | 
						|
decodes to the time of the last key frame, followed by a second seek
 | 
						|
directly to the key frame. The location of the previous key frame is
 | 
						|
embedded as state information in the granulepos; this mechanism is
 | 
						|
described in more detail later.</i></p>
 | 
						|
 | 
						|
<h3>Continuous and Discontinuous Streams</h3>
 | 
						|
 | 
						|
<p>Logical streams within a physical Ogg stream belong to one of two
 | 
						|
categories, "Continuous" streams and "Discontinuous" streams.
 | 
						|
Although these are discussed in more detail later, the distinction is
 | 
						|
important to a high-level understanding of how to buffer an Ogg
 | 
						|
stream.</p>
 | 
						|
 | 
						|
<p>A stream that provides a gapless, time-continuous media type with a
 | 
						|
fine-grained timebase is considered to be 'Continuous'. A continuous
 | 
						|
stream should never be starved of data. Clear examples of continuous
 | 
						|
data types include broadcast audio and video.</p>
 | 
						|
 | 
						|
<p>A stream that delivers data in a potentially irregular pattern or with
 | 
						|
widely spaced timing gaps is considered to be 'Discontinuous'. A
 | 
						|
discontinuous stream may be best thought of as data representing
 | 
						|
scattered events; although they happen in order, they are typically
 | 
						|
unconnected data often located far apart. One possible example of a
 | 
						|
discontinuous stream types would be captioning. Although it's
 | 
						|
possible to design captions as a continuous stream type, it's most
 | 
						|
natural to think of captions as widely spaced pieces of text with
 | 
						|
little happening between.</p>
 | 
						|
 | 
						|
<p>The fundamental design distinction between continuous and
 | 
						|
discontinuous streams concerns buffering.</p>
 | 
						|
 | 
						|
<h3>Buffering</h3>
 | 
						|
 | 
						|
<p>Because a continuous stream is, by definition, gapless, Ogg buffering
 | 
						|
is based on the simple premise of never allowing any active continuous
 | 
						|
stream to starve for data during decode; buffering proceeds ahead
 | 
						|
until all continuous streams in a physical stream have data ready to
 | 
						|
decode on demand.</p>
 | 
						|
 | 
						|
<p>Discontinuous stream data may occur on a fairly regular basis, but the
 | 
						|
timing of, for example, a specific caption is impossible to predict
 | 
						|
with certainty in most captioning systems. Thus the buffering system
 | 
						|
should take discontinuous data 'as it comes' rather than working ahead
 | 
						|
(for a potentially unbounded period) to look for future discontinuous
 | 
						|
data. As such, discontinuous streams are ignored when managing
 | 
						|
buffering; their pages simply 'fall out' of the stream when continuous
 | 
						|
streams are handled properly.</p>
 | 
						|
 | 
						|
<p>Buffering requirements need not be explicitly declared or managed for
 | 
						|
the encoded stream; the decoder simply reads as much data as is
 | 
						|
necessary to keep all continuous stream types gapless (also ensuring
 | 
						|
discontinuous data arrives in time) and no more, resulting in optimum
 | 
						|
implicit buffer usage for a given stream. Because all pages of all
 | 
						|
data types are stamped with absolute timing information within the
 | 
						|
stream, inter-stream synchronization timing is always explicitly
 | 
						|
maintained without the need for explicitly declared buffer-ahead
 | 
						|
hinting.</p>
 | 
						|
 | 
						|
<p>Further details, mechanisms and reasons for the differing arrangement
 | 
						|
and behavior of continuous and discontinuous streams is discussed
 | 
						|
later.</p>
 | 
						|
 | 
						|
<h3>Whole-stream navigation</h3>
 | 
						|
 | 
						|
<p>Ogg is designed so that the simplest navigation operations treat the
 | 
						|
physical Ogg stream as a whole summary of its streams, rather than
 | 
						|
navigating each interleaved stream as a separate entity.</p>
 | 
						|
 | 
						|
<p>First Example: seeking to a desired time position in a multiplexed (or
 | 
						|
unmultiplexed) Ogg stream can be accomplished through a bisection
 | 
						|
search on time position of all pages in the stream (as encoded in the
 | 
						|
granule position). More powerful searches (such as a key frame-aware
 | 
						|
seek within video) are also possible with additional search
 | 
						|
complexity, but similar computational complexity.</p>
 | 
						|
 | 
						|
<p>Second Example: A bitstream section may consist of three multiplexed
 | 
						|
streams of differing lengths. The result of multiplexing these
 | 
						|
streams should be thought of as a single mixed stream with a length
 | 
						|
equal to the longest of the three component streams. Although it is
 | 
						|
also possible to think of the multiplexed results as three concurrent
 | 
						|
streams of different lengths and it is possible to recover the three
 | 
						|
original streams, it will also become obvious that once multiplexed,
 | 
						|
it isn't possible to find the internal lengths of the component
 | 
						|
streams without a linear search of the whole bitstream section.
 | 
						|
However, it is possible to find the length of the whole bitstream
 | 
						|
section easily (in near-constant time per section) just as it is for a
 | 
						|
single-media unmultiplexed stream.</p>
 | 
						|
 | 
						|
<h2>Granule Position</h2>
 | 
						|
 | 
						|
<h3>Description</h3>
 | 
						|
 | 
						|
<p>The Granule Position is a signed 64 bit field appearing in the header
 | 
						|
of every Ogg page. Although the granule position represents absolute
 | 
						|
time within a logical stream, its value does not necessarily directly
 | 
						|
encode a simple timestamp. It may represent frames elapsed (as in
 | 
						|
Vorbis), a simple timestamp, or a more complex bit-division encoding
 | 
						|
(such as in Theora). The exact encoding of the granule position is up
 | 
						|
to a specific codec.</p>
 | 
						|
 | 
						|
<p>The granule position is governed by the following rules:</p>
 | 
						|
 | 
						|
<ul>
 | 
						|
 | 
						|
<li>Granule Position must always increase forward or remain equal from
 | 
						|
page to page, be unset, or be zero for a header page. The absolute
 | 
						|
time to which any correct sequence of granule position maps must
 | 
						|
similarly always increase forward or remain equal. <i>(A codec may
 | 
						|
make use of data, such as a control sequence, that only affects codec
 | 
						|
working state without producing data and thus advancing granule
 | 
						|
position and time. Although the packet sequence number increases in
 | 
						|
this case, the granule position, and thus the time position, do
 | 
						|
not.)</i></li>
 | 
						|
 | 
						|
<li>Granule position may only be unset if there no packet defining a
 | 
						|
time boundary on the page (that is, if no packet in a continuous
 | 
						|
stream ends on the page, or no packet in a discontinuous stream begins
 | 
						|
on the page. This will be discussed in more detail under Continuous
 | 
						|
and Discontinuous streams).</li>
 | 
						|
 | 
						|
<li>A codec must be able to translate a given granule position value
 | 
						|
to a unique, deterministic absolute time value through direct
 | 
						|
calculation. A codec is not required to be able to translate an
 | 
						|
absolute time value into a unique granule position value.</li>
 | 
						|
 | 
						|
<li>Codecs shall choose a granule position definition that allows that
 | 
						|
codec means to seek as directly as possible to an immediately
 | 
						|
decodable point, such as the bit-divided granule position encoding of
 | 
						|
Theora allows the codec to seek efficiently to key frame without using
 | 
						|
an index. That is, additional information other than absolute time
 | 
						|
may be encoded into a granule position value so long as the granule
 | 
						|
position obeys the above points.</li>
 | 
						|
 | 
						|
</ul>
 | 
						|
 | 
						|
<h4>Example: timestamp</h4>
 | 
						|
 | 
						|
<p>In general, a codec/stream type should choose the simplest granule
 | 
						|
position encoding that addresses its requirements. The examples here
 | 
						|
are by no means exhaustive of the possibilities within Ogg.</p>
 | 
						|
 | 
						|
<p>A simple granule position could encode a timestamp directly. For
 | 
						|
example, a granule position that encoded milliseconds from beginning
 | 
						|
of stream would allow a logical stream length of over 100,000,000,000
 | 
						|
days before beginning a new logical stream (to avoid the granule
 | 
						|
position wrapping).</p>
 | 
						|
 | 
						|
<h4>Example: framestamp</h4>
 | 
						|
 | 
						|
<p>A simple millisecond timestamp granule encoding might suit many stream
 | 
						|
types, but a millisecond resolution is inappropriate to, eg, most
 | 
						|
audio encodings where exact single-sample resolution is generally a
 | 
						|
requirement. A millisecond is both too large a granule and often does
 | 
						|
not represent an integer number of samples.</p>
 | 
						|
 | 
						|
<p>In the event that audio frames are always encoded as the same number of
 | 
						|
samples, the granule position could simply be a linear count of frames
 | 
						|
since beginning of stream. This has the advantages of being exact and
 | 
						|
efficient. Position in time would simply be <tt>[granule_position] *
 | 
						|
[samples_per_frame] / [samples_per_second]</tt>.</p>
 | 
						|
 | 
						|
<h4>Example: samplestamp (Vorbis)</h4>
 | 
						|
 | 
						|
<p>Frame counting is insufficient in codecs such as Vorbis where an audio
 | 
						|
frame [packet] encodes a variable number of samples. In Vorbis's
 | 
						|
case, the granule position is a count of the number of raw samples
 | 
						|
from the beginning of stream; the absolute time of
 | 
						|
a granule position is <tt>[granule_position] /
 | 
						|
[samples_per_second]</tt>.</p>
 | 
						|
 
 | 
						|
<h4>Example: bit-divided framestamp (Theora)</h4>
 | 
						|
 | 
						|
<p>Some video codecs may be able to use the simple framestamp scheme for
 | 
						|
granule position. However, most modern video codecs introduce at
 | 
						|
least the following complications:</p>
 | 
						|
 | 
						|
<ul>
 | 
						|
 | 
						|
<li>video frames are relatively far apart compared to audio samples;
 | 
						|
for this reason, the point at which a video frame changes to the next
 | 
						|
frame is usually a strictly defined offset within the frame 'period'.
 | 
						|
That is, video at 50fps could just as easily define frame transitions
 | 
						|
<.015, .035, .055...> as at <.00, .02, .04...>.</li>
 | 
						|
 | 
						|
<li>frame rates often include drop-frames, leap-frames or other
 | 
						|
rational-but-non-integer timings.</li>
 | 
						|
 | 
						|
<li>Decode must begin at a 'key frame' or 'I frame'. Keyframes usually
 | 
						|
occur relatively seldom.</li>
 | 
						|
 | 
						|
</ul>
 | 
						|
 | 
						|
<p>The first two points can be handled straightforwardly via the fact
 | 
						|
that the codec has complete control mapping granule position to
 | 
						|
absolute time; non-integer frame rates and offsets can be set in the
 | 
						|
codec's initial header, and the rest is just arithmetic.</p>
 | 
						|
 | 
						|
<p>The third point appears trickier at first glance, but it too can be
 | 
						|
handled through the granule position mapping mechanism. Here we
 | 
						|
arrange the granule position in such a way that granule positions of
 | 
						|
key frames are easy to find. Divide the granule position into two
 | 
						|
fields; the most-significant bits are an absolute frame counter, but
 | 
						|
it's only updated at each key frame. The least significant bits encode
 | 
						|
the number of frames since the last key frame. In this way, each
 | 
						|
granule position both encodes the absolute time of the current frame
 | 
						|
as well as the absolute time of the last key frame.</p>
 | 
						|
 | 
						|
<p>Seeking to a most recent preceding key frame is then accomplished by
 | 
						|
first seeking to the original desired point, inspecting the granulepos
 | 
						|
of the resulting video page, extracting from that granulepos the
 | 
						|
absolute time of the desired key frame, and then seeking directly to
 | 
						|
that key frame's page. Of course, it's still possible for an
 | 
						|
application to ignore key frames and use a simpler seeking algorithm
 | 
						|
(decode would be unable to present decoded video until the next
 | 
						|
key frame). Surprisingly many player applications do choose the
 | 
						|
simpler approach.</p>
 | 
						|
 | 
						|
<h3>granule position, packets and pages</h3>
 | 
						|
 | 
						|
<p>Although each packet of data in a logical stream theoretically has a
 | 
						|
specific granule position, only one granule position is encoded
 | 
						|
per page. It is possible to encode a logical stream such that each
 | 
						|
page contains only a single packet (so that granule positions are
 | 
						|
preserved for each packet), however a one-to-one packet/page mapping
 | 
						|
is not intended to be the general case.</p>
 | 
						|
 | 
						|
<p>Because Ogg functions at the page, not packet, level, this
 | 
						|
once-per-page time information provides Ogg with the finest-grained
 | 
						|
time information is can use. Ogg passes this granule positioning data
 | 
						|
to the codec (along with the packets extracted from a page); it is the
 | 
						|
responsibility of codecs to track timing information at granularities
 | 
						|
finer than a single page.</p>
 | 
						|
 | 
						|
<h3>start-time and end-time positioning</h3>
 | 
						|
 | 
						|
<p>A granule position represents the <em>instantaneous time location
 | 
						|
between two pages</em>. However, continuous streams and discontinuous
 | 
						|
streams differ on whether the granulepos represents the end-time of
 | 
						|
the data on a page or the start-time. Continuous streams are
 | 
						|
'end-time' encoded; the granulepos represents the point in time
 | 
						|
immediately after the last data decoded from a page. Discontinuous
 | 
						|
streams are 'start-time' encoded; the granulepos represents the point
 | 
						|
in time of the first data decoded from the page.</p>
 | 
						|
 | 
						|
<p>An Ogg stream type is declared continuous or discontinuous by its
 | 
						|
codec. A given codec may support both continuous and discontinuous
 | 
						|
operation so long as any given logical stream is continuous or
 | 
						|
discontinuous for its entirety and the codec is able to ascertain (and
 | 
						|
inform the Ogg layer) as to which after decoding the initial stream
 | 
						|
header. The majority of codecs will always be continuous (such as
 | 
						|
Vorbis) or discontinuous (such as Writ).</p>
 | 
						|
 | 
						|
<p>Start- and end-time encoding do not affect multiplexing sort-order;
 | 
						|
pages are still sorted by the absolute time a given granulepos maps to
 | 
						|
regardless of whether that granulepos represents start- or
 | 
						|
end-time.</p>
 | 
						|
 | 
						|
<h2>Multiplex/Demultiplex Division of Labor</h2>
 | 
						|
 | 
						|
<p>The Ogg multiplex/demultiplex layer provides mechanisms for encoding
 | 
						|
raw packets into Ogg pages, decoding Ogg pages back into the original
 | 
						|
codec packets, determining the logical structure of an Ogg stream, and
 | 
						|
navigating through and synchronizing with an Ogg stream at a desired
 | 
						|
stream location. Strict multiplex/demultiplex operations are entirely
 | 
						|
in the Ogg domain and require no intervention from codecs.</p>
 | 
						|
 | 
						|
<p>Implementation of more complex operations does require codec
 | 
						|
knowledge, however. Unlike other framing systems, Ogg maintains
 | 
						|
strict separation between framing and the framed bitstream data; Ogg
 | 
						|
does not replicate codec-specific information in the page/framing
 | 
						|
data, nor does Ogg blur the line between framing and stream
 | 
						|
data/metadata. Because Ogg is fully data-agnostic toward the data it
 | 
						|
frames, operations which require specifics of bitstream data (such as
 | 
						|
'seek to key frame') also require interaction with the codec layer
 | 
						|
(because, in this example, the Ogg layer is not aware of the concept
 | 
						|
of key frames). This is different from systems that blur the
 | 
						|
separation between framing and stream data in order to simplify the
 | 
						|
separation of code. The Ogg system purposely keeps the distinction in
 | 
						|
data simple so that later codec innovations are not constrained by
 | 
						|
framing design.</p>
 | 
						|
 | 
						|
<p>For this reason, however, complex seeking operations require
 | 
						|
interaction with the codecs in order to decode the granule position of
 | 
						|
a given stream type back to absolute time or in order to find
 | 
						|
'decodable points' such as key frames in video.</p>
 | 
						|
 | 
						|
<h2>Unsorted Discussion Points</h2>
 | 
						|
 | 
						|
<p>flushes around key frames? RFC suggestion: repaginating or building a
 | 
						|
stream this way is nice but not required</p>
 | 
						|
 | 
						|
<h2>Appendix A: multiplexing examples</h2>
 | 
						|
 | 
						|
<div id="copyright">
 | 
						|
  The Xiph Fish Logo is a
 | 
						|
  trademark (™) of Xiph.Org.<br/>
 | 
						|
 | 
						|
  These pages © 1994 - 2005 Xiph.Org. All rights reserved.
 | 
						|
</div>
 | 
						|
 | 
						|
</body>
 | 
						|
</html>
 |