Observation: ANAFI RTSP Stream not containing IDR packets

My suspicion is that this is the reason why a WebRTC client (a browser) can’t display the video.

The NAL units sent by the drone are STAP-A aggregated and contain SPS, PPS and non-IDR slices. In addition there are SEI packets.

But not a single IDR frame. In discuss-webrtc there is a rumour that browsers don’t display anything unless at least one IDR has been seen. Also, an “ordinary” RTSP display via VLC or other players shows a degraded picture during the very first seconds (not the usual instant “boom - here is the video”), and long-term observation of the video doesn’t show any key frame appearing either.

See also this thread: Anafi video via WebRTC (GStreamer webrtcbin)

You can also find a download of a Wireshark recording of 1 minute of running

gst-launch-1.0 rtspsrc location=rtsp:// ! fakesink

here: Dropbox - anafi-h264-rtp-nalu-dump.pcap


If you open this in Wireshark you will most likely just see RTP frames (or plain UDP in the worst case), unless you have the H.264 decoding filter enabled. If not, do this:

  1. Mark a UDP packet, right click, “Decode As… → RTP”, then save. The packets should now be shown as RTP
  2. The dynamic payload type is 96, so go to “Wireshark → Preferences → Protocols”, unfold the protocol list, select the H264 filter and enter the number 96 as “dynamic payload type”

Now you should see the packets as H.264.

I additionally tried to extract the H.264 NALUs using this Lua plugin (which usually works fine):

But the results were strange: I ran FFmpeg over the extracted H.264. FFmpeg produced a lot of errors (“Invalid NALU type 0” among others) and came up with a 39-second black MP4 video :slight_smile: So either the extraction result can’t be trusted or the input is already garbage.
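One way to sanity-check such an extraction is to let ffprobe count the frames it flags as key frames; with a pure intra-refresh stream (and no recovery points ffprobe understands) the count should stay very low. A minimal sketch, assuming the extracted Annex-B stream was saved as anafi.h264 (hypothetical filename):

```shell
# Count decoded frames that ffprobe marks as key frames in the
# extracted Annex-B byte stream. A stream without any IDR frames
# typically yields a count of 0 here.
ffprobe -v error -select_streams v:0 -show_frames anafi.h264 | grep -c 'key_frame=1'
```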

Is this a bug or a feature?

The complete absence of any key frame in the Anafi’s H.264 makes the stream not directly relayable via WebRTC to browsers, which REQUIRE key frames. I suppose the Parrot H.264 implementation is using the “intra-refresh” procedure (X264 Settings - MeWiki) - the way a new video stream display “develops” speaks for this.
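For comparison, this technique can be reproduced with the x264 CLI; this is only an illustration of intra-refresh encoding in general, not Parrot’s actual encoder configuration:

```shell
# Encode with periodic intra refresh instead of IDR key frames:
# a column of intra-coded macroblocks sweeps across the picture
# over each refresh period, so the bitrate stays roughly constant
# and no large IDR frames ever appear in the stream.
x264 --intra-refresh --slices 4 --keyint 30 -o out.264 input.y4m
```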

Unfortunately this modern approach excludes ALL current browser WebRTC decoder implementations, and it is not possible to use the H.264 directly. Transcoding is required, which drives up the hardware costs for SBCs and introduces additional latency and quality loss.


Quick answers about this:

  • We are indeed using intra-refresh (and multiple slices) instead of streaming IDR frames. This is done to increase network reliability, as it makes the bandwidth more linear and avoids the big key frames, which are the ones most likely to be hit by an error. A slight difference on Anafi Ai: we send a highly compressed IDR frame at the start of the stream, but otherwise the stream is the same.
  • In order to initialize the decoders, we still use an IDR frame, but a grey one generated locally (see vdec_ffmpeg.c for an example, or this part for when we inject this frame).
  • Our RTP demuxer is also doing some error concealment, replacing damaged slices by H.264 skip slices, so the input H.264 byte stream of the decoder is always syntactically valid. This helps with some decoder implementations that block/crash on an invalid byte stream.
  • Overall, this allows us to flag the decoded frames as either silent (the frame should not be displayed; this is the case for the grey IDR frame and all frames decoded before a full intra-refresh period), or as having visual errors (the frame either had a broken slice replaced by a skip slice, or has a slice that depends on such a slice, and is thus not suitable for image processing).

Hope that this answers at least part of your questions.

Hi Nicolas, thanks for your elaborate answer, very much appreciated. However, I fear it doesn’t help me much :slight_smile: The reason is that I was attempting to directly relay the H.264 frames from a GStreamer rtspsrc via rtph264depay and rtph264pay to a webrtcbin component in order to stream this to a WebRTC client in the browser. This would be the most straightforward way to make sub-second latency video from the Anafi available to browsers; even a Raspberry Pi Zero (with an additional network interface for the WebRTC stuff) could do that.
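The relay idea can be sketched as a pipeline like the following. Note that webrtcbin needs application-side signaling code, so this gst-launch line only illustrates the intended element chain; the RTSP URL is the address I assume for the drone, adjust as needed:

```shell
# Relay the drone's H.264 RTP payload to WebRTC without transcoding:
# depayload, parse, then re-payload with in-band SPS/PPS repetition
# (config-interval=-1) and hand the RTP packets to webrtcbin.
gst-launch-1.0 rtspsrc location=rtsp://192.168.42.1/live latency=0 \
  ! rtph264depay ! h264parse \
  ! rtph264pay config-interval=-1 pt=96 \
  ! webrtcbin
```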

My problem is that I have a lot of other RTSP reference devices (e.g. cheap Chinese surveillance cams) with which this works perfectly.

I have analyzed the case in detail and mostly discussed it in the “discuss-webrtc” group (https://groups.google.com/g/discuss-webrtc/c/rLnsr1dKLGE/m/a5Cw3BmPCQAJ?utm_medium=email&utm_source=footer). I started from a complete “WTF?” and finally came to the conclusion that all is fine with your stream; the browsers just can’t cope with it.

In fact, Chromium logs that very clearly:

[15032:48131:0406/202231.110771:WARNING:video_receive_stream2.cc(951)] No decodable frame in 200 ms, requesting keyframe.

You say you are sending an IDR - I traced a lot and never found one. And I never once had a “that works” moment in the browser. The browser just displays a spinning wheel.

In the forum above you will also find two videos which demonstrate what I have seen here.

I’m pretty sure transcoding could solve this, but again, this drives up the hardware costs for the necessary edge proxy computers by a factor of 4 to 8.

It would be really awesome if there were a way to switch off this intra-refresh and return to the old-style SPS+PPS+IDR behaviour, in order to have a turnkey solution for WebRTC. I could imagine a query-string parameter in the RTSP location string which switches this behaviour and would be available for any kind of tests, if you agree.


This “IDR when starting the encoder” is only sent on Anafi Ai, our newer drone. I was pointing this out as it is the only difference between the streaming on the “older” Anafi drones (Anafi, Anafi FPV, Anafi Work, Anafi USA …) and the new one.
It is normal that you never see an IDR frame on your model.

Our approach was selected for multiple reasons, and having the client transcode the stream for re-transmission is expected:

  • Transcoding allows you to adjust the target bitrate depending on the actual link you will use for re-transmission, while the live stream bitrate is adjusted based on the wifi link only. This might become an issue if the “upload” link is slower than the wifi link.
  • Transmitting IDR frames every GOP does not work properly for a drone application; it’s something we tested in the past, and as soon as the packet loss rate rises (typically due to distance), the error rate of IDR frames rises even more, since those are way bigger than the following P frames. The slices + intra-refresh approach we use yields a much cleaner stream for the same error rate, and we adopted this method for exactly this reason.

Regarding the requested change, it probably will never happen, as the Anafi (non-Ai) line of products is no longer in active development, and such a change would probably have a bigger impact on the drone than you can imagine.
However, having a way to transmit IDR frames on request for clients needing one is something we’re looking into (for Anafi Ai only), since the lack of IDR frames in the stream seems to be a problem for more use cases than we thought.


Hi Nicolas,

> This “IDR when starting the encoder” is only sent on Anafi Ai, our newer drone. I was pointing this as it is the only difference between the streaming on the “older” Anafi drones (Anafi, Anafi FPV, Anafi Work, Anafi USA …), and the new one.
> It is normal that you never see an IDR frame on your model.

Sorry, I read this comment later and then forgot to correct my post.

I also totally understand your reasons and you are completely right about them; I don’t want to argue against that.

Transcoding works, btw., with sub-second latency for two WebRTC stations on the same network. It just requires considerably more capable and expensive hardware than the Pi 3A+ that suffices for the relay solution.

I need to add that this statement is true for a Jetson Nano using NVIDIA-backed transcoding to H.264. The latency of an outgoing vp8enc is > 10 s, so practically unusable.
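For reference, the hardware-backed transcode on the Jetson Nano can be sketched roughly like this; the element and property names come from NVIDIA’s L4T GStreamer plugins and may differ between releases, the RTSP URL is assumed, and fakesink stands in for the webrtcbin leg, which needs signaling code:

```shell
# Decode on the GPU, re-encode with regular IDR frames plus in-band
# SPS/PPS so browsers can lock on, then re-payload as RTP.
gst-launch-1.0 rtspsrc location=rtsp://192.168.42.1/live latency=0 \
  ! rtph264depay ! h264parse ! nvv4l2decoder \
  ! nvv4l2h264enc bitrate=2000000 insert-sps-pps=true idrinterval=30 \
  ! h264parse ! rtph264pay config-interval=-1 pt=96 \
  ! fakesink
```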

To finalize this: GPU-supported edge devices (e.g. Jetson Nano, Raspberry Pi) can be used. The end-to-end latency is around 150-250 ms (estimated). I’m wondering if the RF controller provides a network interface to the Android device, so that GStreamer code could be used directly in an Android app.

This topic was automatically closed after 30 days. New replies are no longer allowed.