Tivo uses a hardware MPEG encoder for both video and audio. For an excellent introduction and technical explanation of the MPEG video standards, you should check out this link from the Tektronix website:

Audio is recorded using MPEG1 layer 2, at 192 kbits/second. The video is encoded using the MPEG2 standard. The resolution and bitrate of the stream depend on the recording quality you choose in the Tivo software. The basic setting on my 1.3.0 Tivo uses a 352x480 video stream encoded at 1500 kbits/second.

(Note this is slightly higher than the bitrate used in VCD, which uses an MPEG1 video stream. This is inconvenient, as it makes one-hour recordings about 790 MB, which is just too large to fit on a CDR. This means that to archive to CDR, you have to re-encode the stream in software.)
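As a rough check on the size problem, the bitrates quoted above give the payload size of an hour of recording directly. The sketch below is plain arithmetic. The 700 MB disc capacity is an assumption (a common CDR size, not something stated here), and the result ignores all packet and chunk header overhead, so it comes out somewhat below the 790 MB figure.

```python
# Rough payload size of a recording from the bitrates quoted in the text.
VIDEO_KBITS_PER_S = 1500   # basic-quality video bitrate, from the text
AUDIO_KBITS_PER_S = 192    # MPEG1 layer 2 audio bitrate, from the text
CDR_CAPACITY_MB = 700      # assumption: a typical CDR, not from the text


def recording_size_mb(seconds, video_kbits=VIDEO_KBITS_PER_S,
                      audio_kbits=AUDIO_KBITS_PER_S):
    """Payload size in MB (10**6 bytes), ignoring all header overhead."""
    total_bits = (video_kbits + audio_kbits) * 1000 * seconds
    return total_bits / 8 / 1e6


one_hour = recording_size_mb(3600)
fits = one_hour <= CDR_CAPACITY_MB
print(f"one hour: {one_hour:.1f} MB, fits on CDR: {fits}")
```

Even without any overhead, the audio and video payload alone is already larger than the assumed disc.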

Inside the Tivo, the raw audio and video streams are first run through the hardware audio and video MPEG encoders, giving two elementary streams (ES). Each ES is then broken into packets, and headers are added to the packets that include presentation time stamps (PTS), which allow the two streams to be played in sync when they are later decoded. The two resulting packetized elementary streams (PES) are then stored on the hard disk, in the proprietary MFS partitions.

For actual storage of the PES streams on the hard drive, packets from the two streams are interleaved and the resulting data is broken up into chunks of 131072 bytes. Headers are added to these "chunks" to tell the Tivo where in the chunk the data from each packet is, and how to reconstruct the packets from the chunk. Note that a single PES packet can span more than one chunk.
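The fixed chunk size makes it easy to walk a raw stream chunk by chunk. This is a minimal sketch, assuming the stream has already been copied off the Tivo into an ordinary file or buffer. It only slices the data at 131072-byte boundaries and makes no attempt to interpret the chunk headers, whose layout is not described here. The function name is made up for illustration.

```python
import io

CHUNK_SIZE = 131072  # bytes per chunk, from the text (128 * 1024)


def iter_chunks(stream, chunk_size=CHUNK_SIZE):
    """Yield (index, data) for each chunk read from a binary file object."""
    index = 0
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        yield index, data
        index += 1


# Demo on in-memory dummy data: two full chunks plus a short tail.
dummy = io.BytesIO(bytes(2 * CHUNK_SIZE + 100))
sizes = [len(data) for _, data in iter_chunks(dummy)]
print(sizes)
```

On a real recording you would pass a file opened in binary mode instead of the dummy buffer. A short final chunk, as in the demo, would suggest a truncated copy.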

This can be seen in the output of ConvertStream when the -v option is given. An example is here. The records in the chunk labelled by type c0 contain the audio stream data and those of type e0 contain video stream data. The records that have the comment "Looks like a bad record, rejecting" are the Tivo PES headers.

One detail is that while the elementary streams (ES) that are contained in the packets are in a standard format (and thus decodable by any MPEG decoder), the PES streams that the Tivo uses are not quite the same as the PES streams that are contained in an MPEG program stream. In particular the header format is different: it is a different length, and does not contain the time stamp info in the same place. This means that an ordinary MPEG-2 decoder will not be able to play a Tivo stream as a program stream.

The way that ConvertStream and ExtractStream currently handle this problem (as of 21 Oct 01) is to run through the Tivo stream and separate out the audio and video data as Elementary Streams (ES). This basically involves copying out the contents of the PES packets and throwing away the packet headers. The two resulting ES can be multiplexed back together using tools such as mplex.
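To illustrate what "copying out the contents and throwing away the headers" means, here is a minimal sketch that does it for standard MPEG-2 PES packets. It is not the Tivo header format (which differs, as described above) and it is not the code used by ConvertStream or ExtractStream. The function names are made up for illustration. It only handles audio/video packets with a nonzero length field, laid back to back in one buffer.

```python
def build_pes(stream_id, payload):
    """Build a minimal standard MPEG-2 PES packet with no optional fields."""
    # Two flag bytes (0x80 = '10' marker bits) and PES_header_data_length = 0.
    body = bytes([0x80, 0x00, 0x00]) + payload
    return (b"\x00\x00\x01" + bytes([stream_id])
            + len(body).to_bytes(2, "big") + body)


def split_elementary_streams(data):
    """Concatenate PES payloads per stream_id, discarding packet headers."""
    streams = {}
    pos = 0
    while pos + 9 <= len(data):  # trailing bytes shorter than a header are ignored
        if data[pos:pos + 3] != b"\x00\x00\x01":
            raise ValueError(f"no PES start code at offset {pos}")
        stream_id = data[pos + 3]
        packet_len = int.from_bytes(data[pos + 4:pos + 6], "big")
        if packet_len < 3:
            raise ValueError("unbounded or truncated packet not supported")
        header_len = data[pos + 8]        # PES_header_data_length
        end = pos + 6 + packet_len        # length counts bytes after the field
        payload = data[pos + 9 + header_len:end]
        streams.setdefault(stream_id, bytearray()).extend(payload)
        pos = end
    return {sid: bytes(buf) for sid, buf in streams.items()}


# Demo: interleaved video (0xE0) and audio (0xC0) packets.
muxed = (build_pes(0xE0, b"VID1") + build_pes(0xC0, b"AUD1")
         + build_pes(0xE0, b"VID2"))
es = split_elementary_streams(muxed)
print(es[0xE0], es[0xC0])
```

The stream ids 0xE0 and 0xC0 are the same values as the e0 and c0 record types mentioned earlier. Note that everything in the header, including any time stamp, is simply dropped.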

An example of the contents of an elementary video stream that has been processed this way can be found here. It was parsed into a somewhat human-readable format using parse_stream.c. It contains all of the familiar start codes found in an MPEG2 elementary stream.
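For readers who have not looked inside a video ES before, the start codes are easy to find: each is the byte sequence 00 00 01 followed by one code byte. The following is a small sketch of such a scan, not a port of parse_stream.c. The function names are made up for illustration, and the table lists only the most common MPEG2 video start codes.

```python
START_CODE_NAMES = {
    0x00: "picture",
    0xB2: "user_data",
    0xB3: "sequence_header",
    0xB5: "extension",
    0xB7: "sequence_end",
    0xB8: "group_of_pictures",
}


def name_start_code(code):
    """Map a start code byte to a readable name."""
    if 0x01 <= code <= 0xAF:
        return "slice"
    return START_CODE_NAMES.get(code, f"other_0x{code:02x}")


def find_start_codes(es):
    """Return (offset, name) for every 00 00 01 xx start code in the buffer."""
    found = []
    pos = es.find(b"\x00\x00\x01")
    while pos != -1 and pos + 3 < len(es):
        found.append((pos, name_start_code(es[pos + 3])))
        pos = es.find(b"\x00\x00\x01", pos + 3)
    return found


# Demo on a hand-made buffer, not real video data.
sample = (b"\x00\x00\x01\xb3" + b"\x12\x34" + b"\x00\x00\x01\xb8"
          + b"\x00\x00\x01\x00" + b"\xff" + b"\x00\x00\x01\x01")
codes = find_start_codes(sample)
for offset, name in codes:
    print(offset, name)
```

A real stream normally begins with a sequence header, followed by groups of pictures, each picture being made up of slices.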

The only problem with this is that you have lost the precious PTS timestamps that give you high quality audio/video synchronization. In order to avoid problems with lip-sync, the two streams must stay synchronized to within a few tens of milliseconds. (As an aside, I find that the Linux mplex doesn't do a very good job even considering this. My streams end up seconds out of sync by the end of a 1 hour program, which I think is partly an mplex flakiness issue as well. I haven't yet tried netmplex.)

A better way to handle this would be to use these headers somehow, either by writing a multiplexer that is aware of them and uses them, or by writing an input plugin for mplayer to directly play a Tivo stream. Writing a multiplexer that produces a program stream from two PES is not that hard: you basically just have to add headers in the correct way. (Writing a multiplexer that starts from ES is much harder, as you have to decide how to packetize the stream and have to determine the time stamps.)
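As one concrete piece of "adding headers in the correct way", here is a sketch of how a 33-bit PTS is packed into the 5-byte field of a standard MPEG-2 PES header, with the inverse for checking. This is the standard layout that a multiplexer would have to write, not the Tivo bitfield, which is not decoded here. The function names are made up for illustration.

```python
PTS_CLOCK_HZ = 90_000  # standard PTS units: ticks of a 90 kHz clock


def encode_pts(pts, prefix=0b0010):
    """Pack a 33-bit PTS into the 5-byte field of a standard PES header.

    prefix 0b0010 marks a field that carries a PTS only (no DTS).
    The low bit of bytes 0, 2 and 4 is a marker bit that is always 1.
    """
    if not 0 <= pts < (1 << 33):
        raise ValueError("PTS must fit in 33 bits")
    return bytes([
        (prefix << 4) | (((pts >> 30) & 0x07) << 1) | 1,  # bits 32..30
        (pts >> 22) & 0xFF,                               # bits 29..22
        (((pts >> 15) & 0x7F) << 1) | 1,                  # bits 21..15
        (pts >> 7) & 0xFF,                                # bits 14..7
        ((pts & 0x7F) << 1) | 1,                          # bits 6..0
    ])


def decode_pts(field):
    """Recover the 33-bit PTS from a 5-byte standard PTS field."""
    return ((((field[0] >> 1) & 0x07) << 30)
            | (field[1] << 22)
            | ((field[2] >> 1) << 15)
            | (field[3] << 7)
            | (field[4] >> 1))


# Demo: the time stamp one hour into a program.
one_hour_ticks = 3600 * PTS_CLOCK_HZ
packed = encode_pts(one_hour_ticks)
print(packed.hex(), decode_pts(packed) / PTS_CLOCK_HZ)
```

With a 90 kHz clock, one tick is about 11 microseconds, so the time stamps carry far more precision than lip-sync actually needs.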

Here's a list of the first 200 tivo PES headers from a video stream and the corresponding audio stream, off of my tivo. Looking at them, it's pretty obvious which parts of the bitfield are the timestamps. I'm just not sure what to do with them right now.