# Absolute Capture Time

The Absolute Capture Time extension stamps RTP packets with an NTP timestamp
showing when the first audio or video frame in a packet was originally
captured. The intent of this extension is to provide a way to accomplish
audio-to-video synchronization when RTCP-terminating intermediate systems (e.g.
mixers) are involved.

**Name:**
"Absolute Capture Time"; "RTP Header Extension for Absolute Capture Time"

**Formal name:**
<http://www.webrtc.org/experiments/rtp-hdrext/abs-capture-time>

**Status:**
This extension is defined here to allow for experimentation. Once experience has
shown that it is useful, we intend to make a proposal based on it for
standardization in the IETF.

Contact <chxg@google.com> for more info.

## RTP header extension format

### Data layout overview

Data layout of the shortened version of `abs-capture-time` with a 1-byte header
and 8 bytes of data:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  ID   | len=7 |     absolute capture timestamp (bit 0-23)     |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |             absolute capture timestamp (bit 24-55)            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  ... (56-63)  |
    +-+-+-+-+-+-+-+-+

Data layout of the extended version of `abs-capture-time` with a 1-byte header
and 16 bytes of data:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  ID   | len=15|     absolute capture timestamp (bit 0-23)     |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |             absolute capture timestamp (bit 24-55)            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  ... (56-63)  |   estimated capture clock offset (bit 0-23)   |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |           estimated capture clock offset (bit 24-55)          |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  ... (56-63)  |
    +-+-+-+-+-+-+-+-+

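
The two layouts above can be illustrated with a minimal Python sketch that packs and parses a single extension element. It assumes network (big-endian) byte order, as is usual for RTP, and the one-byte-header conventions of RFC 8285 (IDs 1 to 14, length field equal to the data size minus one). The function names `pack_abs_capture_time` and `parse_abs_capture_time` are hypothetical and not part of any WebRTC API.

```python
import struct


def pack_abs_capture_time(ext_id, capture_ts, clock_offset=None):
    """Build one extension element: 1-byte header followed by 8 or 16 data bytes."""
    if not 1 <= ext_id <= 14:
        raise ValueError("one-byte-header extension IDs are limited to 1..14")
    if clock_offset is None:
        # Shortened version: unsigned 64-bit capture timestamp only.
        data = struct.pack("!Q", capture_ts)
    else:
        # Extended version: unsigned timestamp followed by the signed offset.
        data = struct.pack("!Qq", capture_ts, clock_offset)
    header = (ext_id << 4) | (len(data) - 1)  # len field is data size minus one
    return bytes([header]) + data


def parse_abs_capture_time(element):
    """Return (ext_id, capture_ts, clock_offset or None) from one element."""
    ext_id = element[0] >> 4
    length = (element[0] & 0x0F) + 1
    data = element[1:1 + length]
    if length not in (8, 16) or len(data) != length:
        raise ValueError("unexpected abs-capture-time length")
    capture_ts = struct.unpack("!Q", data[:8])[0]
    clock_offset = struct.unpack("!q", data[8:])[0] if length == 16 else None
    return ext_id, capture_ts, clock_offset
```

With extension ID 3, the shortened form starts with the header byte `0x37` (len=7) and the extended form with `0x3F` (len=15), matching the diagrams.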
### Data layout details

#### Absolute capture timestamp

Absolute capture timestamp is the NTP timestamp of when the first frame in a
packet was originally captured. This timestamp MUST be based on the same clock
as the clock used to generate NTP timestamps for RTCP sender reports on the
capture system.

It's not always possible to do an NTP clock readout at the exact moment a media
frame is captured. A capture system MAY postpone the readout until a more
convenient time. A capture system SHOULD subtract known delays (e.g. from
hardware buffers) from the readout to make the final timestamp as close to the
actual capture time as possible.

This field is encoded as a 64-bit unsigned fixed-point number with the high 32
bits for the timestamp in seconds and low 32 bits for the fractional part. This
is also known as the UQ32.32 format and is what the RTP specification defines as
the canonical format to represent NTP timestamps.

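
As a rough illustration of the UQ32.32 encoding and of subtracting a known delay, the sketch below converts a Unix time in nanoseconds to the 64-bit field value. It assumes the capture system's NTP clock is derived from Unix time; the function name `unix_ns_to_uq32_32` and the `known_delay_ns` parameter are hypothetical.

```python
# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_UNIX_EPOCH_DELTA = 2_208_988_800
NS_PER_SECOND = 1_000_000_000


def unix_ns_to_uq32_32(unix_ns, known_delay_ns=0):
    """Convert a Unix time in nanoseconds to a UQ32.32 NTP timestamp.

    known_delay_ns is the delay (e.g. hardware buffering) between the actual
    capture and the clock readout; it is subtracted before encoding.
    """
    seconds, rem_ns = divmod(unix_ns - known_delay_ns, NS_PER_SECOND)
    # High 32 bits: whole seconds since the NTP epoch, truncated to 32 bits.
    ntp_seconds = (seconds + NTP_UNIX_EPOCH_DELTA) & 0xFFFFFFFF
    # Low 32 bits: fraction of a second in units of 2**-32 seconds.
    fraction = (rem_ns << 32) // NS_PER_SECOND
    return (ntp_seconds << 32) | fraction
```

Integer arithmetic is used throughout because a 64-bit float cannot represent all 64 bits of the fixed-point value exactly.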
#### Estimated capture clock offset

Estimated capture clock offset is the sender's estimate of the offset between
its own NTP clock and the capture system's NTP clock. The sender is here defined
as the system that owns the NTP clock used to generate the NTP timestamps for
the RTCP sender reports on this stream. The sender system is typically either
the capture system or a mixer.

This field is encoded as a 64-bit two’s complement **signed** fixed-point number
with the high 32 bits for the seconds and low 32 bits for the fractional part.
It’s intended to make it easy for a receiver that knows how to estimate the
sender system’s NTP clock to also estimate the capture system’s NTP clock:

    Capture NTP Clock = Sender NTP Clock + Capture Clock Offset

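
The relation above can be sketched in Python as follows, assuming both the sender clock estimate and the offset are held as 32.32 fixed-point integers. The helper names `decode_offset` and `estimate_capture_ntp` are hypothetical.

```python
def decode_offset(raw):
    """Interpret a raw 64-bit field value as a two's complement signed integer."""
    return raw - (1 << 64) if raw & (1 << 63) else raw


def estimate_capture_ntp(sender_ntp, capture_clock_offset):
    """Capture NTP Clock = Sender NTP Clock + Capture Clock Offset.

    sender_ntp is an unsigned UQ32.32 value and capture_clock_offset is a
    signed Q32.32 value; the result is wrapped back to 64 bits.
    """
    return (sender_ntp + capture_clock_offset) & 0xFFFFFFFFFFFFFFFF
```

For example, a raw offset of `0xFFFFFFFF80000000` decodes to minus half a second, so a sender clock reading of 10.0 seconds maps to a capture clock reading of 9.5 seconds.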
### Further details

#### Capture system

A receiver MUST treat the first CSRC in the CSRC list of a received packet as if
it belongs to the capture system. If the CSRC list is empty, then the receiver
MUST treat the SSRC as if it belongs to the capture system. Mixers SHOULD put
the most prominent CSRC as the first CSRC in a packet’s CSRC list.

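
The receiver rule above amounts to a one-line selection, sketched here in Python. The function name `capture_system_source` is hypothetical.

```python
def capture_system_source(ssrc, csrcs):
    """Return the identifier a receiver treats as the capture system.

    The first CSRC wins when the CSRC list is non-empty; otherwise the SSRC
    of the packet is used.
    """
    return csrcs[0] if csrcs else ssrc
```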
#### Intermediate systems

An intermediate system (e.g. a mixer) MAY adjust these timestamps as needed. It
MAY also choose to rewrite the timestamps completely, using its own NTP clock as
the reference clock, if it wants to present itself as a capture system for
A/V-sync purposes.

#### Timestamp interpolation

A sender SHOULD save bandwidth by not sending `abs-capture-time` with every
RTP packet. It SHOULD still send it at regular intervals (e.g. every second)
to help mitigate the impact of clock drift and packet loss. Mixers SHOULD always
send `abs-capture-time` with the first RTP packet after changing capture system.

A receiver SHOULD memorize the capture system (i.e. CSRC/SSRC), capture
timestamp, and RTP timestamp of the most recently received `abs-capture-time`
packet on each received stream. It can then use that information, in combination
with RTP timestamps of packets without `abs-capture-time`, to extrapolate
missing capture timestamps.

Timestamp interpolation works fine as long as there’s reasonably low NTP/RTP
clock drift. This is not always true. A sender that detects "jumps" between its
NTP and RTP clock mappings SHOULD send `abs-capture-time` with the first RTP
packet after such a jump.
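
One possible receiver-side extrapolation is sketched below in Python. It assumes the receiver knows the RTP clock rate of the stream and treats RTP timestamp differences as signed 32-bit values so that wraparound and reordering are handled. The `LastCapture` record and the `extrapolate_capture_ts` function are hypothetical names, not part of any WebRTC API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LastCapture:
    source: int         # CSRC/SSRC treated as the capture system
    rtp_timestamp: int  # 32-bit RTP timestamp of the packet that carried the extension
    capture_ts: int     # UQ32.32 absolute capture timestamp from that packet


def extrapolate_capture_ts(last: Optional[LastCapture], source: int,
                           rtp_timestamp: int, clock_rate_hz: int) -> Optional[int]:
    """Estimate the capture timestamp of a packet without abs-capture-time."""
    if last is None or last.source != source:
        # Nothing memorized yet, or the capture system changed.
        return None
    # Signed 32-bit difference between the two RTP timestamps.
    delta = (rtp_timestamp - last.rtp_timestamp) & 0xFFFFFFFF
    if delta >= 1 << 31:
        delta -= 1 << 32
    # Convert RTP ticks to 32.32 fixed-point seconds and add to the reference.
    return (last.capture_ts + (delta << 32) // clock_rate_hz) & 0xFFFFFFFFFFFFFFFF
```

With a 90 kHz video clock, a packet whose RTP timestamp is 90000 ticks after the memorized one is estimated to have been captured exactly one second later.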