Having fun with the Jumping Sumo: some observations about the JPEG feed while developing a program from the protocols doc

Product: [Jumping Sumo]
Product version: [N/A]
SDK version: [N/A]
Use of libARController: [NO]
SDK platform: [N/A]
Reproducible with the official app: [Not tried]

Hi, I’m in NYC and have been having some fun attempting to write, from scratch, a small program that helps the little Jumping Sumo navigate based on its camera. The program initiates a connection, does some setup, and then pulls JPEG images over UDP; the program will ultimately attempt to make navigation decisions from those images (based on where certain hue (HSV) values sit in its view).
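For anyone following along doing the same thing by hand: the frames my program pulls apart look roughly like the sketch below. The 7-byte header layout is from my reading of the protocols doc, and buffer id 125 is the one my device used for video fragments, so treat those values as assumptions about my setup rather than gospel.

```python
import struct

def parse_frames(datagram):
    """Split one UDP datagram into its network frames.
    Each frame: type (u8), buffer id (u8), sequence number (u8),
    total size (u32 little-endian, header included), then payload."""
    frames, off = [], 0
    while off < len(datagram):
        ftype, buf_id, seq, size = struct.unpack_from('<BBBI', datagram, off)
        frames.append((ftype, buf_id, seq, datagram[off + 7:off + size]))
        off += size
    return frames

# A synthetic frame just to show the layout (buffer 125 was video for me):
payload = b'\xff\xd8jpeg-bytes'
frame = struct.pack('<BBBI', 2, 125, 0, 7 + len(payload)) + payload
print(parse_frames(frame))
```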

I struggled for a while with some of the earlier phases of the program. Programming on Linux, I started with an event-based joystick driver, and, after sending move packets to the Jumping Sumo, the ~16Hz JPEG frame feed would stop unless the device was in motion or detected motion; until the device was first moved, the frame feed was consistent. I couldn’t figure this out for a bit, thinking that I had something wrong in my protocol implementation.

I switched over to a different joystick method that polls for joystick values, which I’ve also set to 16Hz, and sends 0,0 move commands when the device is not in motion, which has the effect of keeping the JPEG feed running. But this seems like a lot of packet traffic.
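For the curious, my 0,0 keepalive amounts to building and sending a packet like this a few times per second. The buffer id 10 and the project/class/command ids 3/0/0 are what I took from the protocols doc for the Jumping Sumo’s PCMD; double-check them against your own copy, as they’re assumptions on my part:

```python
import struct

def pcmd_frame(seq, flag=0, speed=0, turn=0):
    """Non-acknowledged data frame (type 2) on buffer 10, carrying a
    Jumping Sumo PCMD: project 3, class 0 (Piloting), command 0 (PCMD),
    then args flag (u8), speed (i8), turn (i8)."""
    payload = struct.pack('<BBHBbb', 3, 0, 0, flag, speed, turn)
    return struct.pack('<BBBI', 2, 10, seq & 0xFF, 7 + len(payload)) + payload

# The 0,0 "keepalive" move my controller sends while the joystick is idle:
keepalive = pcmd_frame(seq=1)
print(len(keepalive), keepalive.hex())
```

In my loop this goes out over the same UDP socket as real moves, at the same ~16Hz as the joystick poll.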

Has anyone else experienced this? My controller was responding to ping commands, and I’d have thought that ought to be enough to keep the JPEG feed coming.

If you’re OK with using the ARDrone SDK, you should check out the ROS driver I wrote: https://github.com/arnaud-ramey/rossumo
A sample with joystick:

Basically, in ROS, all applications are parallel executables communicating through sockets.
This way, the video stream is independent from the motion orders, and the motion orders can be supplied by a joystick, a keyboard, or an AI.

Hope it helps!

hi arnaud, thanks for the notice about ROS … I’m relatively new to this and so didn’t know anything about ROS … I’ll definitely look into it for future versions …

the goal of my project, in that phase, was not necessarily the actions or output, per se, but rather to play w/, from the ground up, all the protocols, socket transactions, multi-threading, message queues and all that, so that I could get a fairly good sense of what the basics of communicating with the device are actually like.

but, in the end, what I wound up doing to drive my device was, instead of using the ~16Hz JPEG feed, to take JPEG snapshots, yank them off the device via the FTP interface, and then chop them up w/ OpenCV to make decisions from. I didn’t find that I could reliably drive the “vertical” or “horizontal” motion from the video feed, given that I couldn’t always fetch the photos in a time-consistent way, and, further, the images I did fetch I couldn’t process quickly enough …

again, this was w/ me doing all this by hand, using simple Perl, Linux, and perhaps “simple” OpenCV routines to make decisions from.

after I got all this to a point I was pretty happy w/, I went on vacation in early September. Now I’m back and intend to write all this up into an article and post it somewhere, and, once I do, I’ll post here and to you, so that you can have a deeper look into what I was doing; it should take about a week or so to get something together.

also, let me state, going back and re-reading the original post and response: arnaud’s response really doesn’t address the issue. At least in my experience, continuous “movement” commands are needed to have the device send further image data. Upon starting, and with no moves, the video feed seems intact, but once movement starts, without further movement, the images shut off. I think a notice of this is in the protocols sheet, but the device still sends acks instead of shutting down entirely, so I don’t see why images aren’t also sent …

again, I could be doing something wrong in my handling of the protocols (particularly the acks), but I don’t think so …

Yes, I guess it is a very instructive process to understand all the lower level inner mechanisms…

Concerning the (non) continuous stream, it might be that the ARDrone SDK sends periodic requests to the robot. At the upper level at least, no such command is sent.

Concerning the real-time processing, what you aim at doing is called visual odometry. It should be doable in real time, as the resolution (VGA) and the frequency (15Hz, if I remember correctly) are fairly low. Using JPG captures and the FTP does not sound very efficient though, as you create a lot of flash-disk writing and reading. Better to keep everything in RAM. You might consider cv::imencode/cv::imdecode for encoding/decoding an image as JPG without writing to disk.

Hope it helps!

hi arnaud, again, thanks … I do use the imdecode routine from OpenCV to decode the JPG stream into a matrix. That matrix is chopped in half, then converted from BGR to HSV space … as I said, sooner or later, probably either this weekend or the next, I’ll create a post detailing all I discovered with the device, and then others can critique and comment, given the actualities of the code …

it’s a bit complicated, as you discuss, this notion of using FTP and the file system to interface with OpenCV; other options include some other form of IPC, including a sockets process … I found, in the end, that the slower file-system approach worked extremely reliably, albeit more slowly than would have been nice for a demo …

in regards to visual odometry, I wasn’t doing anything so interesting, I don’t think. Simply put: I was trying, in a “real-time” fashion, to use image data taken from the Jumping Sumo’s camera to drive the device’s navigation, rather than trying to determine the absolute position of the device within a defined space. In my case, I just put some colored tape on the floor, as a track so to speak, and wanted the device to follow along the twists and curves of the tape.
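stripped of OpenCV, the steering decision itself is simple enough to sketch in a few lines. The mask below stands in for the output of an HSV threshold on the bottom half of a frame; the sizes and values are illustrative, not my actual tuning:

```python
def steer_from_mask(mask):
    """mask: 2D list of 0/1, 1 where a pixel matched the tape's hue.
    Return a turn value in [-1, 1]: negative means the tape sits left
    of center (steer left), positive means right, None means tape lost."""
    width = len(mask[0])
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not xs:
        return None  # tape lost: caller should stop or start searching
    centroid = sum(xs) / len(xs)
    half = (width - 1) / 2
    return (centroid - half) / half

# Tape entirely in the left quarter of an 8-pixel-wide view:
mask = [[1, 1, 0, 0, 0, 0, 0, 0]] * 3
print(steer_from_mask(mask))  # negative: steer left toward the tape
```

The turn value maps more or less directly onto the PCMD turn argument, scaled and clamped to whatever range the device accepts.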

as I said, I found that, if I waited for image data to be sent before driving the device forward, lags in the receipt of the camera data would stop the image data from being sent entirely, which was the original purpose of the post … The only thing that would keep the image data coming (after a single movement command had been sent) was having my controller code send 0,0 (that is: no-op) movement commands at some interval (a few per second).


This behavior is normal.

Hope this helps,
Best regards,

hi djavan, yes, well, thanks for the notice …

since I was trying to “roll my own” app, after many hours of triple-checking that I wasn’t doing something wrong, I determined that this “feature” must be by design. What I don’t understand, unless I am mistaken somehow, is that both acks and pings are sent throughout the same time period. So: if these packets are both being sent and received, then what is it specifically about the PCMDs that is somehow linked to the “video” feed? Other special commands, such as the Jumping Sumo’s tap and others, have no effect on this, so it is just the movements.

finally, I sent a note about this because I didn’t want anyone else to be stumped by this as well, and so I think, if it is possible, a notice of this should go into the ARSDK protocols document, which I think is the primary document used by those designing their own app. There’s something in there that might be construed to lead to this, but nothing as direct as the above, and it should, I think, exist somewhere …

also, djavan, let me also state: I think that the “side effect” of adjusting the video rate under certain conditions makes the goal of “automatically” piloting the device from those images a bit more difficult …

I think that there should be, on certain devices, a control to set the rate per second of the video feed, and this rate should remain, as best as possible, constant.

finally, it might also be a good idea to have all video data on a separate socket …