RTP Data Transfer Protocol

CiscoNetwork 2016. 10. 11. 11:56 Posted by TanSanC

Source : https://tools.ietf.org/html/rfc3550#page-13




RTP Protocol




5. RTP Data Transfer Protocol

5.1 RTP Fixed Header Fields

   The RTP header has the following format:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |            contributing source (CSRC) identifiers             |
   |                             ....                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The first twelve octets are present in every RTP packet, while the
   list of CSRC identifiers is present only when inserted by a mixer.
   The fields have the following meaning:

   version (V): 2 bits
      This field identifies the version of RTP.  The version defined by
      this specification is two (2).  (The value 1 is used by the first
      draft version of RTP and the value 0 is used by the protocol
      initially implemented in the "vat" audio tool.)

   padding (P): 1 bit
      If the padding bit is set, the packet contains one or more
      additional padding octets at the end which are not part of the
      payload.  The last octet of the padding contains a count of how
      many padding octets should be ignored, including itself.  Padding
      may be needed by some encryption algorithms with fixed block sizes
      or for carrying several RTP packets in a lower-layer protocol data
      unit.

   extension (X): 1 bit
      If the extension bit is set, the fixed header MUST be followed by
      exactly one header extension, with a format defined in Section
      5.3.1.

   CSRC count (CC): 4 bits
      The CSRC count contains the number of CSRC identifiers that follow
      the fixed header.

   marker (M): 1 bit
      The interpretation of the marker is defined by a profile.  It is
      intended to allow significant events such as frame boundaries to
      be marked in the packet stream.  A profile MAY define additional
      marker bits or specify that there is no marker bit by changing the
      number of bits in the payload type field (see Section 5.3).

   payload type (PT): 7 bits
      This field identifies the format of the RTP payload and determines
      its interpretation by the application.  A profile MAY specify a
      default static mapping of payload type codes to payload formats.
      Additional payload type codes MAY be defined dynamically through
      non-RTP means (see Section 3).  A set of default mappings for
      audio and video is specified in the companion RFC 3551 [1].  An
      RTP source MAY change the payload type during a session, but this
      field SHOULD NOT be used for multiplexing separate media streams
      (see Section 5.2).

      A receiver MUST ignore packets with payload types that it does not
      understand.

   sequence number: 16 bits
      The sequence number increments by one for each RTP data packet
      sent, and may be used by the receiver to detect packet loss and to
      restore packet sequence.  The initial value of the sequence number
      SHOULD be random (unpredictable) to make known-plaintext attacks
      on encryption more difficult, even if the source itself does not
      encrypt according to the method in Section 9.1, because the
      packets may flow through a translator that does.  Techniques for
      choosing unpredictable numbers are discussed in [17].

   timestamp: 32 bits
      The timestamp reflects the sampling instant of the first octet in
      the RTP data packet.  The sampling instant MUST be derived from a
      clock that increments monotonically and linearly in time to allow
      synchronization and jitter calculations (see Section 6.4.1).  The
      resolution of the clock MUST be sufficient for the desired
      synchronization accuracy and for measuring packet arrival jitter
      (one tick per video frame is typically not sufficient).  The clock
      frequency is dependent on the format of data carried as payload
      and is specified statically in the profile or payload format
      specification that defines the format, or MAY be specified
      dynamically for payload formats defined through non-RTP means.  If
      RTP packets are generated periodically, the nominal sampling
      instant as determined from the sampling clock is to be used, not a
      reading of the system clock.  As an example, for fixed-rate audio
      the timestamp clock would likely increment by one for each
      sampling period.  If an audio application reads blocks covering
      160 sampling periods from the input device, the timestamp would be
      increased by 160 for each such block, regardless of whether the
      block is transmitted in a packet or dropped as silent.

      The initial value of the timestamp SHOULD be random, as for the
      sequence number.  Several consecutive RTP packets will have equal
      timestamps if they are (logically) generated at once, e.g., belong
      to the same video frame.  Consecutive RTP packets MAY contain
      timestamps that are not monotonic if the data is not transmitted
      in the order it was sampled, as in the case of MPEG interpolated
      video frames.  (The sequence numbers of the packets as transmitted
      will still be monotonic.)

      RTP timestamps from different media streams may advance at
      different rates and usually have independent, random offsets.
      Therefore, although these timestamps are sufficient to reconstruct
      the timing of a single stream, directly comparing RTP timestamps
      from different media is not effective for synchronization.
      Instead, for each medium the RTP timestamp is related to the
      sampling instant by pairing it with a timestamp from a reference
      clock (wallclock) that represents the time when the data
      corresponding to the RTP timestamp was sampled.  The reference
      clock is shared by all media to be synchronized.  The timestamp
      pairs are not transmitted in every data packet, but at a lower
      rate in RTCP SR packets as described in Section 6.4.

      The sampling instant is chosen as the point of reference for the
      RTP timestamp because it is known to the transmitting endpoint and
      has a common definition for all media, independent of encoding
      delays or other processing.  The purpose is to allow synchronized
      presentation of all media sampled at the same time.

      Applications transmitting stored data rather than data sampled in
      real time typically use a virtual presentation timeline derived
      from wallclock time to determine when the next frame or other unit
      of each medium in the stored data should be presented.  In this
      case, the RTP timestamp would reflect the presentation time for
      each unit.  That is, the RTP timestamp for each unit would be
      related to the wallclock time at which the unit becomes current on
      the virtual presentation timeline.  Actual presentation occurs
      some time later as determined by the receiver.

      An example describing live audio narration of prerecorded video
      illustrates the significance of choosing the sampling instant as
      the reference point.  In this scenario, the video would be
      presented locally for the narrator to view and would be
      simultaneously transmitted using RTP.  The "sampling instant" of a
      video frame transmitted in RTP would be established by referencing
      its timestamp to the wallclock time when that video frame was
      presented to the narrator.  The sampling instant for the audio RTP
      packets containing the narrator's speech would be established by
      referencing the same wallclock time when the audio was sampled.
      The audio and video may even be transmitted by different hosts if
      the reference clocks on the two hosts are synchronized by some
      means such as NTP.  A receiver can then synchronize presentation
      of the audio and video packets by relating their RTP timestamps
      using the timestamp pairs in RTCP SR packets.

   SSRC: 32 bits
      The SSRC field identifies the synchronization source.  This
      identifier SHOULD be chosen randomly, with the intent that no two
      synchronization sources within the same RTP session will have the
      same SSRC identifier.  An example algorithm for generating a
      random identifier is presented in Appendix A.6.  Although the
      probability of multiple sources choosing the same identifier is
      low, all RTP implementations must be prepared to detect and
      resolve collisions.  Section 8 describes the probability of
      collision along with a mechanism for resolving collisions and
      detecting RTP-level forwarding loops based on the uniqueness of
      the SSRC identifier.  If a source changes its source transport
      address, it must also choose a new SSRC identifier to avoid being
      interpreted as a looped source (see Section 8.2).

   CSRC list: 0 to 15 items, 32 bits each
      The CSRC list identifies the contributing sources for the payload
      contained in this packet.  The number of identifiers is given by
      the CC field.  If there are more than 15 contributing sources,
      only 15 can be identified.  CSRC identifiers are inserted by
      mixers (see Section 7.1), using the SSRC identifiers of
      contributing sources.  For example, for audio packets the SSRC
      identifiers of all sources that were mixed together to create a
      packet are listed, allowing correct talker indication at the
      receiver.
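As a rough illustration of the field layout quoted above, the Python sketch below pulls the fixed header, the CSRC list, the header extension length and any padding out of a raw packet. It assumes the input is exactly one complete RTP packet as a `bytes` object; the names `RtpHeader` and `parse_rtp` are hypothetical, and the extension body is skipped using its length word (Section 5.3.1) rather than decoded.

```python
import struct
from dataclasses import dataclass
from typing import List


@dataclass
class RtpHeader:
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrc_list: List[int]
    payload: bytes


def parse_rtp(packet: bytes) -> RtpHeader:
    """Hypothetical parser for the RFC 3550 fixed header, CSRCs and padding."""
    if len(packet) < 12:
        raise ValueError("shorter than the 12-octet fixed header")
    # Network byte order: two single octets, 16-bit seq, 32-bit ts, 32-bit SSRC
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    version = b0 >> 6
    if version != 2:
        raise ValueError(f"unsupported RTP version {version}")
    cc = b0 & 0x0F
    offset = 12 + 4 * cc
    if len(packet) < offset:
        raise ValueError("truncated CSRC list")
    csrcs = list(struct.unpack(f"!{cc}I", packet[12:offset]))

    if b0 & 0x10:  # X bit: skip exactly one header extension
        if len(packet) < offset + 4:
            raise ValueError("truncated header extension")
        # 16 profile-defined bits, then the length in 32-bit words
        # (not counting this 4-octet extension header itself)
        _profile, ext_words = struct.unpack("!HH", packet[offset:offset + 4])
        offset += 4 + 4 * ext_words

    end = len(packet)
    if b0 & 0x20:  # P bit: the last octet counts the padding, itself included
        end -= packet[-1]
    if end < offset:
        raise ValueError("inconsistent extension or padding length")

    return RtpHeader(
        version=version,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=cc,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrc_list=csrcs,
        payload=packet[offset:end],
    )
```

Following the RFC, a receiver would then discard any packet whose `payload_type` it does not understand.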

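  ㅇ Control bits : 9 bits (together with the 7-bit payload type they make up the first 16 bits of the header)
     - V (version)     : 2 bits
        . The current RTP version is 2 (RFC 3550)
     - P (padding)     : 1 bit
        . If 1, padding octets are appended after the end of the actual payload
        . Lets the payload be filled out to a multiple of a fixed unit such as 32 bits, e.g., for encryption algorithms with fixed block sizes
     - X (extension)   : 1 bit
        . If 1, exactly one variable-length header extension follows the fixed header
     - CC (CSRC count) : 4 bits
        . Number of CSRC (Contributing SouRCe) IDs that follow the fixed header
        . When several media streams are mixed, CC gives the number of contributing sources,
          while the mixer's own SSRC ID serves as the common synchronization reference for all of them
     - M (marker)      : 1 bit
        . Flags a significant event in the stream, such as the start of a talkspurt in audio or a frame boundary in video; the exact meaning is defined by the profile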
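The control bits and the payload type share the first two octets of the header, so building them is just a matter of shifting each field into its bit position. A minimal Python sketch, where the function name `pack_first_word` is hypothetical and the only validation is a range check on each field:

```python
def pack_first_word(version: int = 2, padding: bool = False, extension: bool = False,
                    csrc_count: int = 0, marker: bool = False, payload_type: int = 0) -> bytes:
    """Pack V, P, X, CC, M and PT into the first two RTP header octets."""
    if not 0 <= version <= 3:
        raise ValueError("V is a 2-bit field")
    if not 0 <= csrc_count <= 15:
        raise ValueError("CC is a 4-bit field")
    if not 0 <= payload_type <= 127:
        raise ValueError("PT is a 7-bit field")
    # First octet:  V(2) | P(1) | X(1) | CC(4)
    octet0 = (version << 6) | (int(padding) << 5) | (int(extension) << 4) | csrc_count
    # Second octet: M(1) | PT(7)
    octet1 = (int(marker) << 7) | payload_type
    return bytes([octet0, octet1])
```

For example, `pack_first_word(marker=True, payload_type=8)` yields the octets `0x80 0x88`: version 2 sits in the top two bits of the first octet, and the marker bit plus PT 8 fill the second.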

  ㅇ Payload type (PT) : (7 bits) the audio/video encoding (codec) in use
     - Audio payload type numbers, e.g.
        . 0  -> G.711 PCM (mu-law),  RTP clock rate 8000 Hz
        . 3  -> GSM,                 RTP clock rate 8000 Hz
        . 4  -> G.723,               RTP clock rate 8000 Hz
        . 6  -> DVI4 (ADPCM),        RTP clock rate 16000 Hz
        . 7  -> LPC,                 RTP clock rate 8000 Hz
        . 8  -> G.711 PCM (A-law),   RTP clock rate 8000 Hz
        . 9  -> G.722,               RTP clock rate 8000 Hz (the audio itself is sampled at 16000 Hz)
        . 14 -> MPEG audio,          RTP clock rate 90000 Hz
        . 15 -> G.728,               RTP clock rate 8000 Hz
     - Video payload type numbers, e.g.
        . 26 -> JPEG, 31 -> H.261, 32 -> MPEG-1 or MPEG-2 video,
          33 -> MPEG-2 TS, etc.
     - Other formats can be assigned freely (dynamic payload types) : 96~127

     * Official list of RTP payload types ☞ IANA RTP Parameters
        . RFC 3551 describes the audio/video encodings, their clock rates, and related details
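A receiver typically turns PT into an encoding name and an RTP clock rate with a simple lookup. The Python sketch below holds only the static assignments quoted in the list above, using the encoding names from RFC 3551 (PCMU, PCMA, MPA, MPV, MP2T and so on); the names `PayloadFormat`, `STATIC_PAYLOAD_TYPES` and `lookup_payload_type` are hypothetical, and the video clock rate of 90000 Hz is the one RFC 3551 assigns to those types.

```python
from typing import NamedTuple, Optional


class PayloadFormat(NamedTuple):
    encoding: str    # encoding name as listed in RFC 3551
    media: str       # "audio", "video" or "audio/video"
    clock_rate: int  # RTP timestamp clock rate in Hz


# Only the static assignments quoted in the list above.
STATIC_PAYLOAD_TYPES = {
    0: PayloadFormat("PCMU", "audio", 8000),    # G.711 mu-law
    3: PayloadFormat("GSM", "audio", 8000),
    4: PayloadFormat("G723", "audio", 8000),
    6: PayloadFormat("DVI4", "audio", 16000),
    7: PayloadFormat("LPC", "audio", 8000),
    8: PayloadFormat("PCMA", "audio", 8000),    # G.711 A-law
    9: PayloadFormat("G722", "audio", 8000),    # sampled at 16 kHz, 8 kHz RTP clock
    14: PayloadFormat("MPA", "audio", 90000),   # MPEG audio
    15: PayloadFormat("G728", "audio", 8000),
    26: PayloadFormat("JPEG", "video", 90000),
    31: PayloadFormat("H261", "video", 90000),
    32: PayloadFormat("MPV", "video", 90000),   # MPEG-1/MPEG-2 video
    33: PayloadFormat("MP2T", "audio/video", 90000),
}


def lookup_payload_type(pt: int) -> Optional[PayloadFormat]:
    """Return the static format for pt, or None if it must be learned elsewhere.

    None covers dynamic types (96-127, bound through signalling outside RTP
    such as SDP) and static numbers this table does not list; a receiver
    ignores packets whose payload type it does not understand.
    """
    if not 0 <= pt <= 127:
        raise ValueError("PT is a 7-bit field (0-127)")
    if 96 <= pt <= 127:
        return None
    return STATIC_PAYLOAD_TYPES.get(pt)
```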

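  ㅇ Sequence number : (16 bits)
     - Used to detect packet loss and to restore packet order
        . Starts from a random initial value and increases by 1 for every RTP data packet sent
           .. The receiver uses it to detect lost packets and to put reordered packets back in order, not to request retransmission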
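Because the field is only 16 bits wide, it wraps from 65535 back to 0, so loss detection has to compare sequence numbers modulo 2^16. RFC 3550 Appendix A.1 and A.3 give a fuller receiver algorithm (probation of new sources, large jumps, cycle counting); the Python below is a deliberately simplified sketch of the same idea with hypothetical function names. It estimates loss as packets expected minus distinct packets received, and unlike the RFC's counters it counts duplicates only once.

```python
from typing import Iterable


def seq_delta(new: int, old: int) -> int:
    """Signed distance from old to new in the 16-bit sequence-number space."""
    d = (new - old) & 0xFFFF
    return d - 0x10000 if d >= 0x8000 else d


def count_lost(seqs: Iterable[int]) -> int:
    """Packets expected minus distinct packets received.

    Tolerates wraparound and reordering; duplicates are counted once.
    """
    extended = set()  # sequence numbers "unwrapped" beyond 16 bits
    highest = None
    for s in seqs:
        ext = s if highest is None else highest + seq_delta(s, highest & 0xFFFF)
        extended.add(ext)
        highest = ext if highest is None else max(highest, ext)
    if not extended:
        return 0
    expected = max(extended) - min(extended) + 1
    return expected - len(extended)
```

Here `count_lost([65534, 65535, 0, 2])` returns 1: sequence number 1 is missing, while the wrap from 65535 to 0 is not mistaken for a gap, and a late packet as in `[1, 3, 2]` is not counted as lost.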

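  ㅇ Timestamp : (32 bits)
     - Gives the sampling instant of each RTP packet within an RTP stream
        . Starts from a random initial value and advances by one per tick of the media clock (for fixed-rate audio, one per sampling period), so from packet to packet it grows by the number of samples carried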

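     - The timestamp step follows the sampling clock defined for the payload type
        . For most audio RTP packets, the default packetization interval is 20 ms per packet
           .. e.g., G.711 (PCM A-law) audio payload size per packet
                    = (codec data rate) x (packet interval)
                    = (64 kbps G.711 codec) x (20 ms)
                    = (8000 samples x 8 bits)/sec x (0.02 sec)
                    = 1280 bits = 160 bytes
           .. The timestamp therefore advances by 8000 x 0.02 = 160 per packet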

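     - What the sequence of timestamp values tells you
        . e.g. 1) Consecutive packets carry the `same` timestamp
           .. They were sampled at the same instant, e.g., they belong to the same video frame
        . e.g. 2) Consecutive packets carry timestamps that are `not monotonically increasing`
           .. The data is sent out of sampling order, as with MPEG interpolated pictures predicted from earlier and later pictures
        . e.g. 3) Consecutive packets carry `steadily increasing` timestamps
           .. Typical of an audio packet stream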
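The G.711 arithmetic above and the 32-bit wraparound of the timestamp can be checked with a short Python sketch. The helper names are hypothetical, and `packet_timing` assumes a fixed-rate codec that carries one sample per clock tick at a fixed number of bits per sample, which holds for G.711 but not, for example, for MPEG audio with its 90000 Hz clock. As the RFC excerpt notes, `elapsed_seconds` is only meaningful between timestamps of the same stream.

```python
def packet_timing(clock_rate: int, ptime_ms: int, bits_per_sample: int):
    """Timestamp step and payload size for one packet of a fixed-rate codec."""
    ts_step = clock_rate * ptime_ms // 1000         # clock ticks per packet
    payload_bytes = ts_step * bits_per_sample // 8  # assumes one sample per tick
    return ts_step, payload_bytes


def next_timestamp(ts: int, ts_step: int) -> int:
    """Advance a 32-bit RTP timestamp, wrapping around at 2**32."""
    return (ts + ts_step) & 0xFFFFFFFF


def elapsed_seconds(new_ts: int, old_ts: int, clock_rate: int) -> float:
    """Signed time between two timestamps of the same stream, wrap-aware."""
    d = (new_ts - old_ts) & 0xFFFFFFFF
    if d >= 0x80000000:
        d -= 0x100000000
    return d / clock_rate
```

With the G.711 figures, `packet_timing(8000, 20, 8)` gives `(160, 160)`: the timestamp step and the payload size in bytes both come out to 160.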

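  ㅇ Synchronization source identifier (SSRC ID, Synchronization SouRCe ID) : (32 bits)
     - Identifies the originating stream
        . Within one RTP session,
           .. each source is identified by a randomly chosen SSRC ID, and
           .. that ID must not duplicate any other source's ID, so the sources can be told apart
        . When several media streams are mixed,
           .. the SSRC serves as the common synchronization reference for the mixed stream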

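     * What is an RTP session?
        . A logical association among participants communicating with each other over RTP
        . In an RTP session, the destination is identified by
           .. one IP address plus a pair of ports (one for RTP, one for RTCP)
           .. the IP address may also be a multicast address
        . If several media streams are combined in a mixer, the resulting stream carries the mixer's SSRC ID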
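RFC 3550 Appendix A.6 derives a random SSRC by hashing a mix of inputs with MD5. The Python sketch below swaps in the standard `secrets` module (the operating system's cryptographic random source) instead, and adds a birthday-bound estimate of how likely two participants are to pick the same value. The function names are hypothetical. Avoiding IDs already seen does not prevent collisions with participants not yet heard from, which is why the collision detection and resolution of RFC 3550 Section 8 are still required.

```python
import math
import secrets
from typing import Set


def new_ssrc(known: Set[int]) -> int:
    """Draw a random 32-bit SSRC that no participant seen so far is using."""
    while True:
        candidate = secrets.randbits(32)
        if candidate not in known:
            return candidate


def collision_probability(n: int) -> float:
    """Birthday-bound estimate that at least two of n random SSRCs coincide."""
    return -math.expm1(-n * (n - 1) / 2 / 2**32)
```

Even with 1000 participants the estimate stays around 1e-4, which matches the RFC's point that collisions are unlikely but must still be handled.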

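  ㅇ Contributing source identifier (CSRC ID, Contributing SouRCe ID) list : (32 bits each) 0 to 15 entries
     - When streams are combined into a single stream by a mixer, identifies each of the original sources
        . When several media are mixed, CC (CSRC Count, 4 bits) holds the number of entries, and
        . the list carries the SSRC IDs of the contributing sources, while the packet's own SSRC field holds the mixer's ID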

     * If there is only one media source (no mixer),
        . CC = 0 and the RTP header is 12 bytes long (the fixed header length)
        . The SSRC field holds that source's ID, and the CSRC list is empty
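The header length follows directly from CC: 12 fixed octets plus 4 per CSRC, plus the header extension when X is set. The Python sketch below computes that length and builds the header a mixer might emit, with its own SSRC in the fixed header and the contributors' SSRCs in the CSRC list. The names `rtp_header_length` and `mixer_header` are hypothetical, and the sketch always leaves P, X and M cleared.

```python
import struct
from typing import Iterable


def rtp_header_length(csrc_count: int, has_extension: bool = False, ext_words: int = 0) -> int:
    """Header size in octets: 12 fixed + 4 per CSRC (+ extension, if present)."""
    if not 0 <= csrc_count <= 15:
        raise ValueError("CC is a 4-bit field: at most 15 CSRCs")
    length = 12 + 4 * csrc_count
    if has_extension:
        length += 4 + 4 * ext_words  # 4-octet extension header + ext_words words
    return length


def mixer_header(seq: int, timestamp: int, mixer_ssrc: int,
                 contributors: Iterable[int], payload_type: int = 0) -> bytes:
    """Fixed header plus CSRC list for a packet produced by a mixer."""
    csrcs = list(contributors)[:15]  # only 15 contributors can be identified
    b0 = (2 << 6) | len(csrcs)       # V=2, P=0, X=0, CC=len(csrcs)
    b1 = payload_type & 0x7F         # M=0
    return struct.pack(f"!BBHII{len(csrcs)}I", b0, b1,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF,
                       mixer_ssrc, *csrcs)
```

With a single source and no mixer, `rtp_header_length(0)` is 12, matching the note above; a mixer listing three contributors produces a 24-octet header.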

  ※ [Reference_Web] ☞ Understanding RTP