RTP Payload for Timed Text Markup Language (TTML)

British Broadcasting Corporation
Dock House, MediaCityUK
Salford
United Kingdom
+44 30304 09549
james.sandford@bbc.co.uk

Internet
Audio/Video Transport Core Maintenance Working Group

subtitles, captions, imsc, media, streaming, sdp, xml

This memo describes a Real-time Transport Protocol (RTP) payload format for
Timed Text Markup Language (TTML), an XML-based timed text format from
W3C. This payload format is specifically targeted at streaming workflows using
TTML.

Introduction

TTML (Timed Text Markup Language) is a media type for
describing timed text, such as closed captions and subtitles in television
workflows or broadcasts, as XML. This document specifies how TTML should be
mapped into an RTP stream in streaming workflows, including (but not restricted
to) those described in the television-broadcast-oriented European Broadcasting
Union Timed Text (EBU-TT) Part 3 specification. This document does not define a media type
for TTML but makes use of the existing application/ttml+xml media type.

Conventions and Definitions

Unless otherwise stated, the term "document" refers to the TTML document
being transmitted in the payload of the RTP packet(s).

The term "word" refers to a data word aligned to a specified number of bits
in a computing sense and not to linguistic words that might appear in
the transported text.
The key words "MUST", "MUST NOT",
"REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT",
"RECOMMENDED", "NOT RECOMMENDED",
"MAY", and "OPTIONAL" in this document are
to be interpreted as
described in BCP 14
when, and only when, they appear in all capitals, as shown here.
Media Format Description

Relation to Other Text Payload Types

Prior payload types for text are not suited to the carriage of closed
captions in television workflows. "RTP Payload for Text Conversation" is intended for low data rate conversation with its own
session management and minimal formatting capabilities. "Definition of Events for
Modem, Fax, and Text Telephony Signals" deals in large
parts with the control signalling of facsimile and other systems. "RTP Payload Format for
3rd Generation Partnership Project (3GPP) Timed Text"
describes the carriage of a timed text format with much more restricted
formatting capabilities than TTML. The lack of an existing format for TTML or
generic XML has necessitated the creation of this payload format.

TTML2

TTML2 (Timed Text Markup Language, Version 2) is an
XML-based markup language for describing textual information with associated
timing metadata. One of its primary use cases is the description of subtitles
and closed captions. A number of profiles exist that adapt TTML2 for use in
specific contexts. These include both file-based
and streaming workflows.

Payload Format

In addition to the required RTP headers, the payload contains a section for
the TTML document being transmitted (User Data Words) and a field for the
length of that data. Each RTP payload contains one or part of one TTML
document.

A representation of the payload format for TTML is .

RTP Header Usage

RTP packet header fields SHALL be interpreted, as per , with the following specifics:
Marker Bit (M): 1 bit
The marker bit is set to "1" to indicate the last packet of a
document. Otherwise, set to "0". Note: The first packet might also be the
last.
Timestamp: 32 bits
The RTP Timestamp encodes the epoch of the TTML document in User Data
Words. Further detail on its usage may be found in . The clock frequency used is dependent on the
application and is specified in the media type rate parameter, as per . Documents spread across multiple packets MUST
use the same timestamp but different consecutive Sequence Numbers. Sequential
documents MUST NOT use the same timestamp. Because packets do
not represent any constant duration, the timestamp cannot be used to directly
infer packet loss.
Reserved: 16 bits
These bits are reserved for future use and MUST be set to
0x0 and ignored upon reception.
Length: 16 bits
The length of User Data Words in bytes.
User Data Words: The length of User Data Words MUST match
the value specified in the Length field.
The User Data Words section contains the text of the whole document being transmitted
or a part of the document being transmitted. Documents using character
encodings where characters are not represented by a single byte
MUST be serialised in big-endian order, also known as network byte
order. Where a document will not fit within the Path MTU, it may be fragmented
across multiple packets. Further detail on fragmentation may be found in the "Fragmentation of TTML Documents" section.
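
As a non-normative illustration, the payload fields described above could be packed and parsed as in the following minimal Python sketch. It assumes that the Reserved and Length fields precede User Data Words in the order in which they are listed, and it handles only the payload section that follows the RTP header. The names build_payload and parse_payload are hypothetical and are not defined by this document.

```python
import struct

# Reserved (16 bits) followed by Length (16 bits), both in network byte order.
# The field order is an assumption taken from the order of the field list.
PAYLOAD_HEADER = struct.Struct("!HH")


def build_payload(user_data_words: bytes) -> bytes:
    """Prefix a document (or document fragment) with Reserved and Length."""
    if len(user_data_words) > 0xFFFF:
        raise ValueError("fragment does not fit the 16-bit Length field")
    # Reserved bits are set to 0x0 by the sender.
    return PAYLOAD_HEADER.pack(0, len(user_data_words)) + user_data_words


def parse_payload(payload: bytes) -> bytes:
    """Return User Data Words after checking them against the Length field."""
    if len(payload) < PAYLOAD_HEADER.size:
        raise ValueError("payload is shorter than the Reserved and Length fields")
    # Reserved bits are ignored upon reception.
    _reserved, length = PAYLOAD_HEADER.unpack_from(payload)
    user_data_words = payload[PAYLOAD_HEADER.size:]
    if len(user_data_words) != length:
        raise ValueError("User Data Words do not match the Length field")
    return user_data_words
```

Rejecting a payload whose User Data Words do not match the Length field is one possible receiver policy; it is a choice made for this sketch rather than a requirement stated here.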
Payload Data

TTML documents define a series of changes to text over time. TTML documents
carried in User Data Words are encoded in accordance with one or more of the
defined TTML profiles specified in the TTML registry . These profiles specify the document structure used,
systems models, timing, and other considerations. TTML profiles may restrict
the complexity of the changes, and operational requirements may limit the
maximum duration of TTML documents by a deployment configuration. Both of
these cases are out of scope of this document.

Documents carried over RTP MUST conform to the following
profile, in addition to any others used.

Payload Content Restrictions

This section defines constraints on the content of TTML documents carried
over RTP.

Multiple TTML subtitle streams MUST NOT be interleaved in a
single RTP stream.

The TTML document instance's root tt element in the
http://www.w3.org/ns/ttml namespace MUST include a
timeBase attribute in the
http://www.w3.org/ns/ttml#parameter namespace containing the value
media.

This is equivalent to the TTML2 content profile definition document in .

Payload Processing Requirements

This section defines constraints on the processing of the TTML documents carried over RTP.

If a TTML document is assessed to be invalid, then it MUST be
discarded. This includes empty documents, i.e., those of zero length. When
processing a valid document, the following requirements apply.

Each TTML document becomes active at its epoch E. E MUST be
set to the RTP Timestamp in the header of the RTP packet carrying the TTML
document. Computed TTML media times are offset relative to E, in accordance
with Section I.2 of .

When processing a sequence of TTML documents, where each is delivered in
the same RTP stream, exactly zero or one document SHALL be
considered active at each moment in the RTP time line.
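
The timing rules of this section (the epoch E taken from the RTP Timestamp, media times offset relative to E, at most one active document, and a document with a later epoch superseding an earlier one) might be sketched in Python as follows. This is a non-normative illustration under stated assumptions: timestamps are treated as already unwrapped integers, so 32-bit wraparound is not handled, and the optional early stop once all content has ended is not modelled. ReceivedDocument, rtp_time, and active_document are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ReceivedDocument:
    epoch: int  # E: the RTP Timestamp of the packet(s) carrying the document
    ttml: str   # the reassembled TTML document text


def rtp_time(epoch: int, media_time_seconds: float, clock_rate: int = 1000) -> int:
    """Place a computed TTML media time on the RTP time line, relative to E.

    The default of 1000 Hz is the default clock rate given in this document.
    """
    return epoch + round(media_time_seconds * clock_rate)


def active_document(
    documents: Sequence[ReceivedDocument], now: int
) -> Optional[ReceivedDocument]:
    """Return the single active document at RTP time `now`, or None.

    A document becomes active at its epoch and stops being processed once a
    document with a later epoch becomes active, so the active document is the
    one with the greatest epoch that is not later than `now`.
    """
    started = [d for d in documents if d.epoch <= now]
    return max(started, key=lambda d: d.epoch, default=None)
```

Selecting by greatest epoch keeps the sketch independent of arrival order, which may be convenient when documents are reassembled from several packets before being handed to the processor.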
In the event that a document
D_(n-1) with E_(n-1) is active, and document D_(n) is
delivered with E_(n) where E_(n-1) < E_(n),
processing of D_(n-1) MUST be stopped at E_(n)
and processing of D_(n) MUST begin.

When all defined content within a document has ended, then processing of the
document MAY be stopped. This can be tested by constructing the
intermediate synchronic document sequence from the document, as defined by
. If the last intermediate synchronic document in the
sequence is active and contains no region elements, then all defined
content within the document has ended.

As described above, the RTP Timestamp does not specify the exact timing of
the media in this payload format. Additionally, documents may be fragmented
across multiple packets. This renders the RTCP jitter calculation
unusable.

TTML Processor Profile

Feature Extension Designation

This specification defines the following TTML feature extension designation:
urn:ietf:rfc:8759#rtp-relative-media-time
The namespace urn:ietf:rfc:8759 is as defined by .

A TTML content processor supports the #rtp-relative-media-time
feature extension if it processes media times in accordance with the payload
processing requirements specified in this document, i.e., that the epoch E is
set to the time equivalent to the RTP Timestamp, as detailed above in the "Payload Processing Requirements" section.

Processor Profile Document

The required syntax and semantics declared in the minimal TTML2 processor
profile in MUST be supported by the receiver,
as signified by those feature or extension elements whose
value attribute is set to required.

Note that this requirement does not imply that the receiver needs to
support either TTML1 or TTML2 profile processing, i.e., the TTML2
#profile-full-version-2 feature or any of
its dependent features.

Processor Profile Signalling

The codecs media type parameter MUST specify at
least one processor profile. Short codes for TTML profiles are registered at
. The processor profiles specified in
codecs MUST be compatible with the processor profile
specified in this document. Where multiple options exist in codecs
for possible processor profile combinations (i.e., separated by |
operator), every permitted option MUST be compatible with the
processor profile specified in this document. Where processor profiles (other
than the one specified in this document) are advertised in the codecs
parameter, the requirements of the processor profile specified in this
document MAY be signalled, additionally using the +
operator with its registered short code.

A processor profile (X) is compatible with the processor profile specified
here (P) if X includes all the features and extensions in P (identified by
their character content) and the value attribute of each is, at least,
as restrictive as the value attribute of the feature or extension in
P that has the same character content. The term "restrictive" here is as
defined in Section 6 of .

Payload Examples

 is an example of a valid TTML document that may
be carried using the payload format described in this document.

Fragmentation of TTML Documents

Many of the use cases for TTML are low bit-rate with RTP packets expected
to fit within the Path MTU. However, some documents may exceed the Path
MTU. In these cases, they may be split between multiple packets. Where
fragmentation is used, the following guidelines MUST be
followed:
It is RECOMMENDED that documents be fragmented as seldom
as possible, i.e., the least possible number of fragments is created out of a
document.
Text strings MUST split at character boundaries. This
enables decoding of partial documents. As a consequence, document
fragmentation requires knowledge of the UTF-8/UTF-16 encoding formats to
determine character boundaries.
Document fragments SHOULD be protected against packet
losses. More information can be found in .
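
The guidelines above can be illustrated with the following non-normative Python sketch, which splits a UTF-8 document greedily so that as few fragments as possible are produced and no fragment ends in the middle of a character. It assumes UTF-8 only (UTF-16 would need different boundary detection), and max_len stands for whatever User Data Words budget remains after headers within the Path MTU. fragment_utf8 and reassemble are hypothetical names, and sequence number wraparound is not handled.

```python
from typing import Iterable, List, Tuple


def fragment_utf8(document: bytes, max_len: int) -> List[Tuple[bytes, bool]]:
    """Split a UTF-8 document into (fragment, marker_bit) pairs.

    Each fragment holds at most max_len bytes and ends on a character
    boundary. The marker bit is True only for the last fragment.
    """
    if max_len < 4:
        raise ValueError("max_len must hold the longest UTF-8 character (4 bytes)")
    if not document:
        raise ValueError("zero-length documents are invalid")
    document.decode("utf-8")  # raises UnicodeDecodeError on malformed input

    fragments = []
    start = 0
    while start < len(document):
        end = min(start + max_len, len(document))
        # A byte of the form 0b10xxxxxx continues a character, so splitting
        # before it would cut that character in two; back up until it is not.
        while end < len(document) and (document[end] & 0xC0) == 0x80:
            end -= 1
        fragments.append(document[start:end])
        start = end
    last = len(fragments) - 1
    return [(fragment, index == last) for index, fragment in enumerate(fragments)]


def reassemble(packets: Iterable[Tuple[int, bytes]]) -> bytes:
    """Concatenate (sequence_number, fragment) pairs in ascending order."""
    return b"".join(fragment for _, fragment in sorted(packets, key=lambda p: p[0]))
```

Because every fragment ends on a character boundary, each one can be decoded on its own, which is what makes processing of partial documents possible.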
When a document spans more than one RTP packet, the entire document is
obtained by concatenating User Data Words from each consecutive contributing
packet in ascending order of Sequence Number.

As described in , only zero or one TTML
document may be active at any point in time. As such, there
MUST only be one document transmitted for a given RTP
Timestamp. Furthermore, as stated in , the
marker bit MUST be set for a packet containing the last
fragment of a document. A packet following one where the marker bit is set
contains the first fragment of a new document. The first fragment might also
be the last.

Protection against Loss of Data

Consideration must be devoted to keeping loss of documents due to packet
loss within acceptable limits. What constitutes acceptable limits depends
on the TTML profile(s) used and the use case, among other things. As such, specific
limits are outside the scope of this document.

Documents MAY be sent without additional protection if
end-to-end network conditions guarantee that document loss will be within
acceptable limits under all anticipated load conditions. Where such guarantees
cannot be provided, implementations MUST use a mechanism to
protect against packet loss. Potential mechanisms include Forward Error
Correction (FEC) , retransmission , duplication , or an equivalent
technique.

Congestion Control Considerations

Congestion control for RTP SHALL be used in accordance with
and with any applicable RTP profile, e.g., . "Multimedia Congestion Control: Circuit Breakers for
Unicast RTP Sessions" is an update to
"RTP: A Transport Protocol for Real-time
Applications" , which defines criteria for when one is required to
stop sending RTP packet streams. Applications implementing this standard
MUST comply with , with particular
attention paid to Section
on Media Usability. provides additional information
on the best practices for applying congestion control to UDP streams.

Payload Format Parameters

This RTP payload format is identified using the existing
application/ttml+xml media type as registered with IANA
and defined in .

Clock Rate

The default clock rate for TTML over RTP is 1000 Hz. The clock rate
SHOULD be included in any advertisements of the RTP stream
where possible. This parameter has not been added to the media type definition
as it is not applicable to TTML usage other than within RTP streams. In other
contexts, timing is defined within the TTML document.

When choosing a clock rate, implementers should consider what other media
their TTML streams may be used in conjunction with (e.g., video or audio). In
these situations, it is RECOMMENDED that streams use the same
clock source and clock rate as the related media. As TTML streams may be
aperiodic, implementers should also consider the frequency range over which
they expect packets to be sent and the temporal resolution required.

Session Description Protocol (SDP) Considerations

The mapping of the application/ttml+xml media type and its parameters SHALL be done according to .
The type name "application" goes in SDP "m=" as the media name.
The media subtype "ttml+xml" goes in SDP "a=rtpmap" as the encoding name.
The clock rate also goes in "a=rtpmap" as the clock rate.
Additional format-specific parameters, as described in the media type
specification, SHALL be included in the SDP file in "a=fmtp" as
a semicolon-separated list of "parameter=value" pairs, as described in . The codecs parameter MUST be
included in the a=fmtp line of the SDP file. Specific requirements
for the "codecs" parameter are included in the "Processor Profile Signalling" section.

Examples

A sample SDP mapping is presented in .

In this example, a dynamic payload type 112 is used. The 90 kHz RTP
timestamp rate is specified in the "a=rtpmap" line after the subtype.
The codecs parameter defined in the "a=fmtp" line indicates that the TTML data
conforms to the Internet Media Subtitles and Captions (IMSC) 1.1 Text profile.

IANA Considerations

This document has no IANA actions.

Security Considerations

RTP packets using the payload format defined in this specification are
subject to the security considerations discussed in the RTP specification
and in any applicable RTP profile, such as RTP/AVP
, RTP/AVPF , RTP/SAVP , or RTP/SAVPF .
However, as
"Securing the RTP Protocol Framework: Why RTP Does Not Mandate a Single Media
Security Solution" discusses, it is not an RTP
payload format's responsibility to discuss or mandate what solutions are used
to meet the basic security goals (like confidentiality, integrity, and source
authenticity) for RTP in general. This responsibility lies with anyone using RTP
in an application. They can find guidance on available security mechanisms
and important considerations in "Options for Securing RTP Sessions" . Applications SHOULD use one or more
appropriate strong security mechanisms. The rest of this Security
Considerations section discusses the security-impacting properties of the
payload format itself.

To avoid potential buffer overflow attacks, receivers should take care to
validate that the User Data Words in the RTP payload are of the appropriate
length (using the Length field).

This payload format places no specific restrictions on the size of TTML
documents that may be transmitted. As such, malicious implementations could be
used to perform denial-of-service (DoS) attacks. provides more information on DoS attacks and describes some
mitigation strategies. Implementers should take into consideration that the
size and frequency of documents transmitted using this format may vary over
time. As such, sender implementations should avoid producing streams that
exhibit DoS-like behaviour, and receivers should avoid false identification of
a legitimate stream as malicious.

As with other XML types and as noted in "XML Media Types",
repeated expansion of maliciously constructed XML
entities can be used to consume large amounts of memory, which may cause XML
processors in constrained environments to fail.

In addition, because of the extensibility features for TTML and of XML in
general, it is possible that "application/ttml+xml" may describe content that
has security implications beyond those described here. However, TTML does not
provide for any sort of active or executable content, and if the processor
follows only the normative semantics of the published specification, this
content will be outside TTML namespaces and may be ignored. Only in the case
where the processor recognizes and processes the additional content or where
further processing of that content is dispatched to other processors would
security issues potentially arise. And in that case, they would fall outside
the domain of this RTP payload format and the application/ttml+xml
registration document.

Although not prohibited, there are no expectations that XML signatures or
encryption would normally be employed.

Further information related to privacy and security at a document level can
be found in Appendix P of .

Normative References

EBU-TT, Part 3, Live Subtitling Applications: System Model and Content Profile for Authoring and Contribution. European Broadcasting Union.
Timed Text Markup Language 2 (TTML2).
TTML Media Type Definition and Profile Registry. W3C Working Group Note.
Media Types. IANA.

Informative References

TTML Profiles for Internet Media Subtitles and Captions 1.1.
Seamless Protection Switching of RTP Datagrams. SMPTE.

Acknowledgements

Thanks to , , , , , , , and for their valuable
feedback throughout the
development of this document. Thanks to the W3C Timed Text Working Group and
EBU Timed Text Working Group for their substantial efforts in developing the
timed text format this payload format is intended to carry.