Notes on Tele-Presence in the CAVE

The CAVE is a tool used to create immersive virtual reality applications. We are in the process of building applications that will enable tele-operations, first of simple video playback robots, and later of more complex machinery or robotic hardware.

Robo-Cam

The idea of the robo-cam project is to enable a person in a CAVE to drive a robot using visual feedback from cameras on the robot. The goal of this experiment is to find ways to take traditional video sources and create the stereo effect that the CAVE uses to generate immersive VR. There are three pieces of hardware in this application: the robot, the CAVE, and a workstation that will serve as the CAVE's interface to the robot.

CAVE

The CAVE will run an application that presents the video transmitted from the robot in such a way as to make the user feel like they are the robot. I am assuming a three-wall CAVE, but the implementation should be usable on an ImmersaDesk, viewable from the PowerWall, and usable in CAVEs with a variable number of walls.

At first, I plan to use the wand as follows.

             ===========
             \  1 2 3  /
              \       /
               | ( ) |
               |     |
               =======
  1. Toggle Tilt Head Mode.
  2. Push to talk.
  3. Toggle Turn Head Mode.
( ) is the joystick used to control motion for each mode.
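
The wand mapping can be sketched as a small state machine. This is only a minimal sketch: the button numbers follow the diagram above, but the class name, the mode names, and the assumption that the joystick drives the robot when neither head mode is toggled on are all mine and not yet decided.

```python
from dataclasses import dataclass

# Button numbers follow the wand diagram; all other names are hypothetical.
BTN_TILT, BTN_TALK, BTN_TURN = 1, 2, 3


@dataclass
class WandState:
    mode: str = "drive"    # "drive", "tilt", or "turn" (default mode is an assumption)
    talking: bool = False  # True while button 2 is held down

    def press(self, button: int) -> None:
        """Buttons 1 and 3 toggle a head mode; button 2 starts push-to-talk."""
        if button == BTN_TILT:
            self.mode = "drive" if self.mode == "tilt" else "tilt"
        elif button == BTN_TURN:
            self.mode = "drive" if self.mode == "turn" else "turn"
        elif button == BTN_TALK:
            self.talking = True

    def release(self, button: int) -> None:
        """Only push-to-talk cares about button release."""
        if button == BTN_TALK:
            self.talking = False

    def joystick(self, x: float, y: float) -> tuple:
        """Route a joystick deflection (-1..1 per axis) to the active mode."""
        if self.mode == "tilt":
            return ("tilt_head", y)
        if self.mode == "turn":
            return ("turn_head", x)
        return ("drive", x, y)
```

Keeping the mode in one place should make it easy to swap the joystick for head tracking data later, since only the joystick routing would need to change.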

Once I have gotten this working, I plan to see how well I can do head turning automatically from the head tracking information. This is phase 3 (at least) of the project.

The Robot

I have an idea that the robot will be roughly human size and shape, without arms, probably with a single cylindrical body extending from the base to the "head". This part of the robot should telescope, to change the height of the "head".

The Base of the Robot will be some vehicular-type thing that has a reasonable wheelbase, as well as fine-grain maneuverability. I've been thinking a hacked r/c car could do the trick, with a bigger wheelbase and enough heft for the weight it has to carry.

The "Head" of the robot will house between 1 and 7 small camera/computer modules, either one full-duplex or two one-directional audio/computer modules, and a small screen to show the video from the person in the CAVE. This part of the robot is also what will be directly manipulable, from the control data received via the CAVE. The details of that manipulation have yet to be worked out.

The video/computer modules can take various forms, depending on how we need to deal with throughput. The number of (computer, network connection) pairs we need will depend on the throughput of each connection. Most wireless network solutions that I've seen so far have an upper limit of 1 Mbps. This means that we'll be able to carry one, possibly two, video streams per network connection. That makes our 1 to 7 cameras translate into about 1 to 4 network connections.
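
The arithmetic behind that estimate is worth writing down. The sketch below assumes two streams per link, which is the optimistic case the 1 to 4 figure relies on; the function names are hypothetical, and the 1000 kbps figure is the nominal link limit, not a measured throughput.

```python
import math


def connections_needed(cameras: int, streams_per_link: int) -> int:
    """Number of network connections needed to carry one stream per camera."""
    return math.ceil(cameras / streams_per_link)


def per_stream_kbps(link_kbps: float, streams_per_link: int) -> float:
    """Upper bound on each stream's bit rate when a link is shared evenly."""
    return link_kbps / streams_per_link


# Two streams per 1 Mbps link: 7 cameras need 4 links at up to 500 kbps each.
links_for_seven = connections_needed(7, 2)
budget_shared = per_stream_kbps(1000, 2)
```

If a link can only carry one stream, the same 7 cameras would need 7 connections, so the camera count depends heavily on how well the video compresses.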

To deal with the full-duplex (or two half-duplex) audio channels, we'll need another network connection. This connection should also be able to carry the control information for the Robot. I'd like to start with all the channels separate, until we understand what happens with the traffic that goes between the workstation and the robot.

The Workstation

The workstation will be used to pull the various video/audio streams out of the robot, and feed them to the CAVE. It will also take the commands from the CAVE application, convert them into the robot's input control streams, and send them to the robot.

This machine will operate by providing an RTP reflector that the CAVE application can connect to, and pull the video streams from. The workstation will not serve any audio or video, but will simply collect the streams from the robot, consolidate them, and provide them through a consistent protocol/interface to the CAVE application.
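
To consolidate streams, the reflector has to tell them apart. A minimal sketch of one way to do that is below: it reads the 12-byte fixed RTP header and groups packets by their SSRC (the field RTP uses to identify a stream's source). Grouping by SSRC is my assumption about how the workstation would separate the cameras, and the function and class names are hypothetical.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the 12-byte fixed RTP header at the start of a packet."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=b0 & 0x0F,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


def group_by_source(packets):
    """Group raw packets by SSRC so each camera's stream stays together."""
    streams = {}
    for packet in packets:
        streams.setdefault(parse_rtp_header(packet).ssrc, []).append(packet)
    return streams
```

The reflector itself would sit around this, receiving datagrams from the robot's connections and forwarding each group to the CAVE application; that socket handling is left out here.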

The workstation's reciprocal responsibility is to take the control streams from the CAVE application and translate them into the type of data stream the robot expects in order to move, manipulate the cameras' focus, etc.
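
Since the robot's control protocol has not been designed yet, the following is purely an illustrative sketch of what the translation step might look like. The opcodes, the command names, and the 3-byte frame layout are all hypothetical placeholders, not the robot's actual format.

```python
import struct

# Hypothetical opcodes; the robot's real command set has not been defined.
OPCODES = {"drive": 0x01, "tilt_head": 0x02, "turn_head": 0x03, "focus": 0x04}


def encode_command(name: str, value: float) -> bytes:
    """Pack a command as an opcode byte plus a signed 16-bit value (big-endian)."""
    if name not in OPCODES:
        raise ValueError(f"unknown command: {name}")
    clamped = max(-1.0, min(1.0, value))  # keep the value inside -1..1
    return struct.pack("!Bh", OPCODES[name], round(clamped * 32767))


def decode_command(frame: bytes) -> tuple:
    """Inverse of encode_command, as the robot side might apply it."""
    opcode, raw = struct.unpack("!Bh", frame)
    name = next(key for key, code in OPCODES.items() if code == opcode)
    return name, raw / 32767
```

A fixed-size frame like this keeps the control traffic small and predictable, which matters if it ends up sharing a wireless connection with the audio channels.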

The functionality of the workstation can be broken into two parts: the parts that are bandwidth-constrained to be on a traditional wire (or fiber) network, and the parts that may be able to move to the robot. The Audio/Video streams need to go to a workstation on a traditional network that can feed them across a high-throughput network directly to the CAVE.

Schedule

  1. Get a CAVE simulator going with movement that translates into control streams from the program.
  2. Get two-camera stereo video in the CAVE, syncing with the walls.
  3. Do tests on generating artificial focal disparity from a single video source.
  4. Test breaking a single pair of stereo streams among the walls.
  5. Test multiple pairs of stereo streams on each wall.
  6. Set up robot/workstation pair, figure out control system for robot.
  7. Start testing tethered robot for latency information.
  8. Work out control interface for robot (or write application side interface to existing control software).
  9. Test out controlling tethered robot with Simulator application.
  10. Test out robot fed video into the CAVE (once control issues have been resolved).
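
For step 3, the simplest test I can think of is to show each eye the same frame shifted horizontally in opposite directions. The sketch below does that on a frame stored as a list of pixel rows. The function names are hypothetical, and a uniform shift like this only places the whole image at one apparent depth, so it is a starting point for the tests rather than real per-object depth.

```python
def shift_row(row, offset, fill=0):
    """Shift one row of pixels right (offset > 0) or left (offset < 0)."""
    n = len(row)
    if offset == 0:
        return list(row)
    if abs(offset) >= n:
        return [fill] * n
    if offset > 0:
        return [fill] * offset + list(row[:n - offset])
    return list(row[-offset:]) + [fill] * (-offset)


def stereo_pair(frame, disparity, fill=0):
    """Return (left, right) views of one frame, `disparity` pixels apart in total."""
    half = disparity // 2
    left = [shift_row(row, half, fill) for row in frame]
    right = [shift_row(row, -(disparity - half), fill) for row in frame]
    return left, right


# One 4-pixel row, 2 pixels of total disparity between the two eyes.
left_view, right_view = stereo_pair([[1, 2, 3, 4]], 2)
```

Flipping the sign of the disparity swaps the direction of the two shifts, which should move the image to the other side of the wall's surface.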

Parts List

  1. CAVE Application.

    This Application will expand from the current application we have for viewing video in the CAVE. The first steps will involve making it understand RTP, a reasonably excellent protocol for shipping audio and video data (as well as other kinds) around. Then we'll move into simulator mode, where we start experimenting with the right way to present one or more stereo video streams to the user. Next, we'll move into how to get reasonable control data out of the application, experimenting with ways to build the control interface. Finally, we'll attempt to cement the different parts of the application together with a working robot. We'll need 3-5 weeks of solid testing time to ensure this is working well. This puts a target date of October 31 in front of us.

  2. Robot.

    The Robot will be as described above. I have an idea that we could use very small 486s (the cigarette pack machines come to mind), with PCMCIA wireless/video, running some off-the-shelf video delivery software. I'd prefer if these ran under some BSD system, since then I could write the video delivery system, but I'll settle for off-the-shelf parts. We'll have to manufacture the control system; Dan Sandin of EVL has ideas about how we can do the video/control subsystem of the robot. I have ideas of using a reinforced r/c automaton for the locomotion, which I think should be wheel/track based movement.

  3. Workstation.

    The workstation I'd like will need high-speed connectivity to the CAVE (I'm thinking ATM makes sense) and a wireless subnet on the other side of the workstation. The reason I say other side is because I don't think that it's important for the robot to be "internet connected", as long as it can get data to the workstation, which can then relay the information to the CAVE. This is contrary to the idea that we put the control system directly on the robot, instead of going through the workstation. I'd like to see how this works out over time. I'd be in favor of this being an IBM RS/6000 with OpenGL extensions and ATM. Conceivably we could get away with a 41P (PCI testbed machine).

Problems to Be Solved

Vision and Perception of stereo imagery.